Match Assembly Layer

This page explains how gem turns extractor state into final outputs:

- `ParsedMatch` (structured object)
- per-player combat aggregates
- tabular DataFrame projections (`parse_to_dataframe`)

Modules covered:

- `src/gem/combat_aggregator.py`
- `src/gem/match_builder.py`
- `src/gem/dataframes.py`

Prerequisites:

Why this is the next layer

After extractors collect raw timelines, this layer:

- merges everything into one coherent match object
- computes derived metrics (lane stats, advantages, teamfights)
- exposes table outputs for analytics pipelines

`combat_aggregator.py`: per-player combat counters

`_CombatAggregator` consumes normalized `CombatLogEntry` events during the parse and accumulates them into per-player buckets.

Core structures

| Type | Role |
| --- | --- |
| `_ParsedPlayerAgg` | Mutable per-player accumulator (damage/heal/uses/reasons/logs/stuns) |
| `_CombatAggregator.players` | `player_id -> _ParsedPlayerAgg` map |

Key methods

| Method | What it does |
| --- | --- |
| `_hero_to_pid` | Resolve hero NPC name to player slot (0-9) |
| `_summon_to_pid` | Resolve summon unit to owner hero slot via `m_hOwnerEntity` |
| `on_entry` | Routes each combat-log type to the right per-player bucket |
| `_dedup_purchase_log` | Removes duplicate starting-window purchase entries |

`on_entry` routing summary

| `log_type` | Aggregation effect |
| --- | --- |
| `DAMAGE` | update attacker damage + target damage_taken (+ type splits) |
| `HEAL` | update attacker healing |
| `ABILITY` / `ITEM` | increment usage counts |
| `GOLD` / `XP` | add reason-based totals on target |
| `DEATH` | append kills log for attacker |
| `PURCHASE` | append purchase log (attacker/target fallback) |
| `PICKUP_RUNE` | append rune event for player slot in `entry.value` |
| `BUYBACK` | populated later in `match_builder` post-pass |
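
The routing table above can be sketched as a single dispatch method. This is a minimal illustration, not gem's actual code: `Entry`, `PlayerAgg`, and `Aggregator` are hypothetical stand-ins for `CombatLogEntry`, `_ParsedPlayerAgg`, and `_CombatAggregator`, their field names are assumptions, and slot resolution (`_hero_to_pid`, `_summon_to_pid`) is assumed to have happened already. Damage-type splits, `PURCHASE`, and `PICKUP_RUNE` are left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    """Hypothetical stand-in for CombatLogEntry; field names are assumptions."""
    log_type: str
    attacker: Optional[int] = None  # player slot 0-9, already resolved
    target: Optional[int] = None    # player slot 0-9, already resolved
    value: int = 0
    name: str = ""                  # ability, item, or reason label


@dataclass
class PlayerAgg:
    """Hypothetical, trimmed-down analogue of _ParsedPlayerAgg."""
    damage: int = 0
    damage_taken: int = 0
    healing: int = 0
    uses: dict = field(default_factory=lambda: defaultdict(int))
    reasons: dict = field(default_factory=lambda: defaultdict(int))
    kills_log: list = field(default_factory=list)


class Aggregator:
    def __init__(self) -> None:
        # player slot -> accumulator, created on first use
        self.players = defaultdict(PlayerAgg)

    def on_entry(self, e: Entry) -> None:
        if e.log_type == "DAMAGE":
            # Damage is credited to the attacker and debited from the target.
            if e.attacker is not None:
                self.players[e.attacker].damage += e.value
            if e.target is not None:
                self.players[e.target].damage_taken += e.value
        elif e.log_type == "HEAL" and e.attacker is not None:
            self.players[e.attacker].healing += e.value
        elif e.log_type in ("ABILITY", "ITEM") and e.attacker is not None:
            self.players[e.attacker].uses[e.name] += 1
        elif e.log_type in ("GOLD", "XP") and e.target is not None:
            # Reason-based totals are keyed on the receiving player.
            self.players[e.target].reasons[(e.log_type, e.name)] += e.value
        elif e.log_type == "DEATH" and e.attacker is not None:
            self.players[e.attacker].kills_log.append(e.target)
```

Entries whose attacker or target cannot be resolved to a player slot are simply skipped in this sketch; whether gem does the same is not stated on this page.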

`match_builder.py`: assemble final `ParsedMatch`

`build_parsed_match(...)` is the main output assembly function.

Key constants

| Constant | Value | Meaning |
| --- | --- | --- |
| `_LANE_GRID` | `64` | Lane heatmap cell size (world units) |
| `_LANE_WINDOW` | `600 * 30 = 18000` | First 10 game minutes in ticks (600 s × 30 ticks/s), used for lane analysis |
| `_STEAM_ID_BASE` | `76561197960265728` | SteamID64 -> account_id offset |

Build flow (high level)

1. Resolve `radiant_win` fallback from ancient death if parser metadata is missing.
2. Create base `ParsedMatch` with extractor outputs (towers, roshans, wards, draft, etc.).
3. Post-pass `BUYBACK` entries into combat aggregates.
4. For each player slot 0..9:
   - wire regular + minute time series from `PlayerExtractor`
   - attach combat aggregates (damage, uses, logs, stuns)
   - attach scoreboard K/D/A
   - build lane heatmap and lane-role/lane-10m metrics
5. Populate player names/steam ids from `CDOTA_PlayerResource`.
6. Populate team metadata from `CDOTATeam` entities.
7. Attach observer/sentry ward logs per player.
8. Build `radiant_gold_adv` and `radiant_xp_adv` arrays from minute series.
9. Run `detect_teamfights(...)` with hero-slot/team/snapshot context.
10. Build `_ability_snapshots` for `ability_level_at_tick()` lookup.
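
The first step of the flow above, the `radiant_win` fallback, could look roughly like the following. This is a sketch under assumptions: `resolve_radiant_win` and its parameters are hypothetical, and it assumes the ancients show up in death events under the unit names `npc_dota_goodguys_fort` (Radiant) and `npc_dota_badguys_fort` (Dire).

```python
from typing import Iterable, Optional

RADIANT_ANCIENT = "npc_dota_goodguys_fort"  # assumed unit name
DIRE_ANCIENT = "npc_dota_badguys_fort"      # assumed unit name


def resolve_radiant_win(
    metadata_value: Optional[bool],
    dead_unit_names: Iterable[str],
) -> Optional[bool]:
    """Prefer parser metadata; otherwise infer the winner from which ancient died."""
    if metadata_value is not None:
        return metadata_value
    for name in dead_unit_names:
        if name == DIRE_ANCIENT:
            return True   # Dire ancient fell, so Radiant won
        if name == RADIANT_ANCIENT:
            return False  # Radiant ancient fell, so Dire won
    return None           # no metadata and no ancient death observed
```

Returning `None` when neither source is available keeps an unknown result distinguishable from a Dire win.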

Important correctness rules in this layer

- Advantage curves must use cumulative totals (`total_earned_*`), not spendable gold or level-local XP.
- Lane role is derived from first 10 minutes only.
- Purchase logs are deduplicated only in the starting snapshot window.
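
To illustrate the first rule, here is a minimal sketch of an advantage curve built from cumulative per-minute totals. The function name and input layout are hypothetical, and it assumes slots 0-4 are Radiant and slots 5-9 are Dire. Because the inputs only ever grow, the curve reflects what each side has earned; spendable gold would drop whenever a player buys an item and make the curve drift.

```python
def radiant_advantage(minute_totals: list[list[int]]) -> list[int]:
    """minute_totals[slot][minute] is a cumulative total earned by that slot.

    Assumes slots 0-4 are Radiant and 5-9 are Dire.
    """
    if len(minute_totals) != 10:
        raise ValueError("expected one series per player slot 0..9")
    # Only compare minutes that every player has a sample for.
    minutes = min(len(series) for series in minute_totals)
    advantage = []
    for m in range(minutes):
        radiant = sum(minute_totals[slot][m] for slot in range(5))
        dire = sum(minute_totals[slot][m] for slot in range(5, 10))
        advantage.append(radiant - dire)
    return advantage


radiant_side = [[0, 100, 250]] * 5
dire_side = [[0, 80, 200]] * 5
print(radiant_advantage(radiant_side + dire_side))  # [0, 100, 250]
```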

`dataframes.py`: tabular projection layer

`build_dataframes(match)` converts a `ParsedMatch` into analytics-friendly DataFrames.

Exported tables

| Key | Content |
| --- | --- |
| `players` | per-sample player rows (tick-level) |
| `players_minute` | minute-boundary player rows |
| `positions` | `(tick, x, y)` rows per player |
| `combat_log` | normalized combat entries |
| `wards` | ward events |
| `objectives` | towers/barracks/roshan/tormentor/shrine/aegis |
| `chat` | chat messages |
| `match` | match-level metadata |
| `radiant_advantage` | minute-by-minute radiant gold/xp advantage |
| `draft` | draft events |
| `teamfights` | teamfight windows |
| `smoke_events` | smoke activations |
| `courier_snapshots` | courier state samples |
| `player_kills_log` | per-player kill entries |
| `player_purchase_log` | per-player purchases |
| `player_runes_log` | per-player runes |
| `player_buyback_log` | per-player buybacks |

`gem.parse_to_dataframe(path)` is a convenience wrapper that chains two calls:

1. `parse(path)` -> `ParsedMatch`
2. `build_dataframes(match)` -> `dict[str, DataFrame]`

Real snapshot from fixture (truncated)

From parse_to_dataframe("tests/fixtures/8520014563.dem"):

table_count 17
chat (2, 4)
combat_log (108594, 16)
courier_snapshots (6701, 6)
draft (0, 0)
match (1, 6)
objectives (27, 9)
player_buyback_log (5, 17)
player_kills_log (3393, 17)
player_purchase_log (661, 17)
player_runes_log (67, 17)
players (34941, 36)
players_minute (550, 12)
positions (34941, 6)
radiant_advantage (55, 3)
smoke_events (15, 6)
teamfights (42, 10)
wards (119, 10)
...

Interpretation:

- Empty `draft` table is valid for some replay paths.
- `players` and `positions` are large because they sample over many ticks.
- `players_minute` and `radiant_advantage` are compact minute-bucket tables.
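
A listing in the same shape as the snapshot above can be produced from any mapping of table name to an object with a `.shape` attribute, which is what the `dict[str, DataFrame]` returned by `parse_to_dataframe` provides. The helpers below are hypothetical and are not part of gem; the demo uses `SimpleNamespace` stand-ins so that it runs without gem or pandas installed.

```python
from types import SimpleNamespace


def summarize_tables(tables) -> list[str]:
    """Return 'name (rows, cols)' lines, sorted by table name."""
    lines = [f"table_count {len(tables)}"]
    for name in sorted(tables):
        lines.append(f"{name} {tables[name].shape}")
    return lines


def empty_tables(tables) -> list[str]:
    """Names of tables with zero rows, e.g. 'draft' in the fixture above."""
    return sorted(name for name, table in tables.items() if table.shape[0] == 0)


# Stand-ins for DataFrames; real use would pass gem.parse_to_dataframe(path).
demo = {
    "draft": SimpleNamespace(shape=(0, 0)),
    "chat": SimpleNamespace(shape=(2, 4)),
    "match": SimpleNamespace(shape=(1, 6)),
}
print("\n".join(summarize_tables(demo)))
print(empty_tables(demo))  # ['draft']
```

Checking `empty_tables` before joining or plotting is one way to avoid assuming that every table has rows.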

Common failure modes in this layer

- Using non-cumulative fields for advantage calculations causes drift.
- Player-name/team resolution fails when the `CDOTA_PlayerResource` path changes.
- Over-deduplicating purchases outside the starting window removes valid repeated buys.
- Assuming all tables are non-empty (`draft`, `chat`, etc.) breaks downstream code on replays where they are empty.
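
The purchase-deduplication failure mode can be illustrated with a window-limited pass. This is a sketch under loud assumptions, not `_dedup_purchase_log` itself: the entry layout (`time`, `key`), the definition of a duplicate as an identical `(time, key)` pair, and the window ending at game time 0 are all hypothetical.

```python
def dedup_starting_purchases(log: list[dict], window_end: int = 0) -> list[dict]:
    """Drop repeated (time, key) entries at or before window_end; keep later ones as-is."""
    seen = set()
    result = []
    for entry in log:
        if entry["time"] <= window_end:
            ident = (entry["time"], entry["key"])
            if ident in seen:
                continue  # duplicate inside the starting window
            seen.add(ident)
        # Entries after the window are never deduplicated: repeated buys are valid.
        result.append(entry)
    return result


log = [
    {"time": -60, "key": "tango"},
    {"time": -60, "key": "tango"},     # duplicate in the starting window, dropped
    {"time": 300, "key": "clarity"},
    {"time": 300, "key": "clarity"},   # repeated buy outside the window, kept
]
print(len(dedup_starting_purchases(log)))  # 3
```

Applying the same `seen` check to every entry, rather than only inside the window, would also drop the second `clarity` purchase, which is the over-deduplication described above.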
