Match Assembly Layer
This page explains how gem turns extractor state into final outputs:
- ParsedMatch (structured object)
- per-player combat aggregates
- tabular DataFrame projections (parse_to_dataframe)
Modules covered:
- src/gem/combat_aggregator.py
- src/gem/match_builder.py
- src/gem/dataframes.py
Prerequisites:
- Bits & Bytes Primer
- Stream Layer (stream.py)
- Parser Layer (parser.py)
- SendTable Layer (sendtable.py)
- State Reconstruction Layer (string_table.py + entities.py)
- Event Normalization Layer (game_events.py + combatlog.py)
- Extractors Layer
Why this is the next layer
After extractors collect raw timelines, this layer:
- merges everything into one coherent match object
- computes derived metrics (lane stats, advantages, teamfights)
- exposes table outputs for analytics pipelines
combat_aggregator.py: per-player combat counters
_CombatAggregator consumes normalized CombatLogEntry events during parse and accumulates per-player buckets.
Core structures
| Type | Role |
|---|---|
| _ParsedPlayerAgg | Mutable per-player accumulator (damage/heal/uses/reasons/logs/stuns) |
| _CombatAggregator.players | player_id -> _ParsedPlayerAgg map |
Key methods
| Method | What it does |
|---|---|
| _hero_to_pid | Resolve hero NPC name to player slot (0-9) |
| _summon_to_pid | Resolve summon unit to owner hero slot via m_hOwnerEntity |
| on_entry | Routes each combat-log type to the right per-player bucket |
| _dedup_purchase_log | Removes duplicate starting-window purchase entries |
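To make the starting-window deduplication concrete, here is a minimal sketch. It is not the actual _dedup_purchase_log implementation: the entry shape (dicts with time and key), the window_end boundary, and the rule that a duplicate is a repeated (time, key) pair are all assumptions made for illustration.

```python
def dedup_starting_purchases(log, window_end):
    """Drop repeated (time, key) pairs, but only inside the starting window.

    Assumptions (hypothetical, not taken from gem): each entry is a dict with
    "time" and "key", and the starting window is every entry with
    time <= window_end.
    """
    seen = set()
    out = []
    for entry in log:
        in_window = entry["time"] <= window_end
        ident = (entry["time"], entry["key"])
        if in_window and ident in seen:
            continue  # duplicate inside the starting window: drop it
        if in_window:
            seen.add(ident)
        out.append(entry)  # entries after the window are always kept
    return out


purchases = [
    {"time": -90, "key": "tango"},
    {"time": -90, "key": "tango"},    # duplicate of the starting snapshot
    {"time": 300, "key": "clarity"},
    {"time": 300, "key": "clarity"},  # legitimate repeated buy, kept
]
cleaned = dedup_starting_purchases(purchases, window_end=0)
```

Limiting the check to the window is the point of the sketch: the two later clarity purchases survive, while the repeated starting entry does not.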
on_entry routing summary
| log_type | Aggregation effect |
|---|---|
| DAMAGE | update attacker damage + target damage_taken (+ type splits) |
| HEAL | update attacker healing |
| ABILITY / ITEM | increment usage counts |
| GOLD / XP | add reason-based totals on target |
| DEATH | append kills log for attacker |
| PURCHASE | append purchase log (attacker/target fallback) |
| PICKUP_RUNE | append rune event for player slot in entry.value |
| BUYBACK | populated later in match_builder post-pass |
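The routing table can be sketched as a small dispatch function. This is a simplified illustration rather than gem's code: Entry and PlayerAgg are hypothetical stand-ins for CombatLogEntry and _ParsedPlayerAgg, player slots are assumed to be resolved already, and only four of the log types are handled.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    """Hypothetical stand-in for a normalized combat-log entry."""
    log_type: str
    attacker: Optional[int] = None  # player slot, assumed already resolved
    target: Optional[int] = None
    name: str = ""                  # ability, item, or unit name (assumed field)
    value: int = 0
    tick: int = 0


@dataclass
class PlayerAgg:
    """Hypothetical, trimmed-down per-player accumulator."""
    damage: int = 0
    damage_taken: int = 0
    healing: int = 0
    uses: dict = field(default_factory=lambda: defaultdict(int))
    kills_log: list = field(default_factory=list)


def on_entry(players, e):
    """Route one entry to the per-player bucket the table describes."""
    if e.log_type == "DAMAGE":
        if e.attacker is not None:
            players[e.attacker].damage += e.value
        if e.target is not None:
            players[e.target].damage_taken += e.value
    elif e.log_type == "HEAL" and e.attacker is not None:
        players[e.attacker].healing += e.value
    elif e.log_type in ("ABILITY", "ITEM") and e.attacker is not None:
        players[e.attacker].uses[e.name] += 1
    elif e.log_type == "DEATH" and e.attacker is not None:
        players[e.attacker].kills_log.append((e.tick, e.name))
    # Other log types are ignored in this sketch.


players = defaultdict(PlayerAgg)
for entry in [
    Entry("DAMAGE", attacker=0, target=5, value=120),
    Entry("DAMAGE", attacker=0, target=5, value=30),
    Entry("HEAL", attacker=5, value=40),
    Entry("ITEM", attacker=0, name="item_blink"),
    Entry("DEATH", attacker=0, name="npc_dota_hero_axe", tick=9000),
]:
    on_entry(players, entry)
```

Keying the map by player slot mirrors the player_id -> accumulator layout described for _CombatAggregator.players.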
match_builder.py: assemble final ParsedMatch
build_parsed_match(...) is the main output assembly function.
Key constants
| Constant | Value | Meaning |
|---|---|---|
| _LANE_GRID | 64 | Lane heatmap cell size (world units) |
| _LANE_WINDOW | 600 * 30 = 18000 | First 10 game-minutes for lane analysis, in ticks (600 s at 30 ticks/s) |
| _STEAM_ID_BASE | 76561197960265728 | SteamID64 -> account_id offset |
Build flow (high level)
- Resolve radiant_win fallback from ancient death if parser metadata is missing.
- Create base ParsedMatch with extractor outputs (towers, roshans, wards, draft, etc.).
- Post-pass BUYBACK entries into combat aggregates.
- For each player slot 0..9:
  - wire regular + minute time series from PlayerExtractor
  - attach combat aggregates (damage, uses, logs, stuns)
  - attach scoreboard K/D/A
  - build lane heatmap and lane-role/lane-10m metrics
- Populate player names/steam ids from CDOTA_PlayerResource.
- Populate team metadata from CDOTATeam entities.
- Attach observer/sentry ward logs per player.
- Build radiant_gold_adv and radiant_xp_adv arrays from minute series.
- Run detect_teamfights(...) with hero-slot/team/snapshot context.
- Build _ability_snapshots for ability_level_at_tick() lookup.
Important correctness rules in this layer
- Advantage curves must use cumulative totals (total_earned_*), not spendable gold or level-local XP.
- Lane role is derived from first 10 minutes only.
- Purchase logs are deduplicated only in the starting snapshot window.
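The cumulative-totals rule can be illustrated with a minimal sketch. The function name and the input layout (one list of per-minute cumulative totals per player) are assumptions for illustration; gem's own series and field names may differ.

```python
def advantage_curve(radiant_totals, dire_totals):
    """Per-minute Radiant minus Dire advantage from cumulative totals.

    Each argument is a list of per-player series, where series[m] is that
    player's cumulative total (gold or XP earned) at minute m. The curve is
    truncated to the shortest series so every minute has data for all players.
    """
    all_series = radiant_totals + dire_totals
    minutes = min((len(s) for s in all_series), default=0)
    return [
        sum(s[m] for s in radiant_totals) - sum(s[m] for s in dire_totals)
        for m in range(minutes)
    ]


radiant = [[0, 100, 250], [0, 120, 300]]
dire = [[0, 90, 200], [0, 100, 260, 400]]
gold_adv = advantage_curve(radiant, dire)
```

Feeding spendable gold into the same function would make the curve dip every time a player buys an item, which is why cumulative totals are required.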
dataframes.py: tabular projection layer
build_dataframes(match) converts ParsedMatch into analytics-friendly DataFrames.
Exported tables
| Key | Content |
|---|---|
| players | per-sample player rows (tick-level) |
| players_minute | minute-boundary player rows |
| positions | (tick, x, y) rows per player |
| combat_log | normalized combat entries |
| wards | ward events |
| objectives | towers/barracks/roshan/tormentor/shrine/aegis |
| chat | chat messages |
| match | match-level metadata |
| radiant_advantage | minute-by-minute radiant gold/xp advantage |
| draft | draft events |
| teamfights | teamfight windows |
| smoke_events | smoke activations |
| courier_snapshots | courier state samples |
| player_kills_log | per-player kill entries |
| player_purchase_log | per-player purchases |
| player_runes_log | per-player runes |
| player_buyback_log | per-player buybacks |
gem.parse_to_dataframe(path) is a convenience wrapper:
- parse(path) -> ParsedMatch
- build_dataframes(match) -> dict[str, DataFrame]
Real snapshot from fixture (truncated)
From parse_to_dataframe("tests/fixtures/8520014563.dem"):
```text
table_count 17
chat (2, 4)
combat_log (108594, 16)
courier_snapshots (6701, 6)
draft (0, 0)
match (1, 6)
objectives (27, 9)
player_buyback_log (5, 17)
player_kills_log (3393, 17)
player_purchase_log (661, 17)
player_runes_log (67, 17)
players (34941, 36)
players_minute (550, 12)
positions (34941, 6)
radiant_advantage (55, 3)
smoke_events (15, 6)
teamfights (42, 10)
wards (119, 10)
...
```

Interpretation:
- Empty draft table is valid for some replay paths.
- players and positions are large because they sample over many ticks.
- players_minute and radiant_advantage are compact minute-bucket tables.
Common failure modes in this layer
- Using non-cumulative fields for advantage calculations causes drift.
- Missing player-name/team resolution when CDOTA_PlayerResource path changes.
- Over-deduplicating purchases outside the starting window removes valid repeated buys.
- Assuming all tables are non-empty (draft, chat, etc.).
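One way to guard against the empty-table assumption is to check row counts before running downstream analytics. The helper below is a hypothetical sketch, not part of gem. It relies only on the .shape attribute, which pandas DataFrames provide, and the demo uses stub objects so that it runs without a replay file.

```python
from types import SimpleNamespace


def empty_tables(tables):
    """Return the sorted names of tables that have zero rows.

    `tables` is a mapping of name -> object exposing `.shape`, such as the
    dict[str, DataFrame] that build_dataframes(match) returns.
    """
    return sorted(name for name, df in tables.items() if df.shape[0] == 0)


# Stub objects standing in for DataFrames, using shapes from the fixture snapshot.
demo_tables = {
    "chat": SimpleNamespace(shape=(2, 4)),
    "draft": SimpleNamespace(shape=(0, 0)),
    "match": SimpleNamespace(shape=(1, 6)),
}
missing = empty_tables(demo_tables)
```

A pipeline can then skip or default the tables listed in missing instead of failing on an unexpected empty frame.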