
Match Assembly Layer

This page explains how gem turns extractor state into final outputs:

  1. ParsedMatch (structured object)
  2. per-player combat aggregates
  3. tabular DataFrame projections (parse_to_dataframe)

Modules covered:

  1. src/gem/combat_aggregator.py
  2. src/gem/match_builder.py
  3. src/gem/dataframes.py

Prerequisites:

  1. Bits & Bytes Primer
  2. Stream Layer (stream.py)
  3. Parser Layer (parser.py)
  4. SendTable Layer (sendtable.py)
  5. State Reconstruction Layer (string_table.py + entities.py)
  6. Event Normalization Layer (game_events.py + combatlog.py)
  7. Extractors Layer

Why this is the next layer

After extractors collect raw timelines, this layer:

  1. merges everything into one coherent match object
  2. computes derived metrics (lane stats, advantages, teamfights)
  3. exposes table outputs for analytics pipelines

combat_aggregator.py: per-player combat counters

_CombatAggregator consumes normalized CombatLogEntry events during parse and accumulates per-player buckets.

Core structures

| Type | Role |
| --- | --- |
| _ParsedPlayerAgg | Mutable per-player accumulator (damage/heal/uses/reasons/logs/stuns) |
| _CombatAggregator.players | player_id -> _ParsedPlayerAgg map |

Key methods

| Method | What it does |
| --- | --- |
| _hero_to_pid | Resolve hero NPC name to player slot (0-9) |
| _summon_to_pid | Resolve summon unit to owner hero slot via m_hOwnerEntity |
| on_entry | Routes each combat-log type to the right per-player bucket |
| _dedup_purchase_log | Removes duplicate starting-window purchase entries |

on_entry routing summary

| log_type | Aggregation effect |
| --- | --- |
| DAMAGE | update attacker damage + target damage_taken (+ type splits) |
| HEAL | update attacker healing |
| ABILITY / ITEM | increment usage counts |
| GOLD / XP | add reason-based totals on target |
| DEATH | append kills log for attacker |
| PURCHASE | append purchase log (attacker/target fallback) |
| PICKUP_RUNE | append rune event for player slot in entry.value |
| BUYBACK | populated later in match_builder post-pass |
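
The routing above can be pictured as a small dispatch on log_type. The following is a minimal sketch, not gem's actual code: `Entry` and `PlayerAgg` are hypothetical stand-ins for CombatLogEntry and _ParsedPlayerAgg, the field names are assumptions, and only three of the log types are shown. It also assumes attacker and target have already been resolved to player slots.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    """Hypothetical stand-in for a normalized CombatLogEntry."""
    log_type: str
    attacker: Optional[int] = None  # player slot, assumed already resolved
    target: Optional[int] = None    # player slot, assumed already resolved
    value: int = 0
    tick: int = 0


@dataclass
class PlayerAgg:
    """Hypothetical stand-in for the per-player accumulator."""
    damage: int = 0
    damage_taken: int = 0
    healing: int = 0
    kills_log: list = field(default_factory=list)


def on_entry(players: dict, e: Entry) -> None:
    """Route one entry to the matching per-player bucket."""
    if e.log_type == "DAMAGE":
        # Damage is credited to the attacker and debited to the target.
        if e.attacker is not None:
            players[e.attacker].damage += e.value
        if e.target is not None:
            players[e.target].damage_taken += e.value
    elif e.log_type == "HEAL":
        if e.attacker is not None:
            players[e.attacker].healing += e.value
    elif e.log_type == "DEATH":
        # A death is recorded in the attacker's kills log.
        if e.attacker is not None:
            players[e.attacker].kills_log.append(e.tick)
    # Other log types are omitted from this sketch.


players = defaultdict(PlayerAgg)
on_entry(players, Entry("DAMAGE", attacker=0, target=5, value=120))
on_entry(players, Entry("DEATH", attacker=0, target=5, tick=900))
```

Entries whose attacker or target does not map to a player slot are simply skipped here; how gem treats them is not stated on this page.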

match_builder.py: assemble final ParsedMatch

build_parsed_match(...) is the main output assembly function.

Key constants

| Constant | Value | Meaning |
| --- | --- | --- |
| _LANE_GRID | 64 | Lane heatmap cell size (world units) |
| _LANE_WINDOW | 600 * 30 = 18000 | First 10 game-minutes for lane analysis |
| _STEAM_ID_BASE | 76561197960265728 | SteamID64 -> account_id offset |
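
As a rough illustration of what two of these constants encode, the sketch below mirrors their values under local names. The helper functions are hypothetical and not part of gem. Reading 600 * 30 as 600 seconds at 30 ticks per second is an assumption drawn from the expression itself.

```python
STEAM_ID_BASE = 76561197960265728  # mirrors _STEAM_ID_BASE from the table
TICKS_PER_SECOND = 30              # assumption: the "30" in 600 * 30
LANE_WINDOW = 600 * TICKS_PER_SECOND  # mirrors _LANE_WINDOW (18000)


def steam64_to_account_id(steam_id64: int) -> int:
    """Convert a SteamID64 to a 32-bit account_id by subtracting the base."""
    return steam_id64 - STEAM_ID_BASE


def minutes_to_ticks(minutes: int) -> int:
    """Convert game minutes to ticks under the assumed tick rate."""
    return minutes * 60 * TICKS_PER_SECOND


account_id = steam64_to_account_id(76561197960265728 + 12345)
lane_window_ticks = minutes_to_ticks(10)
```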

Build flow (high level)

  1. Resolve radiant_win fallback from ancient death if parser metadata is missing.
  2. Create base ParsedMatch with extractor outputs (towers, roshans, wards, draft, etc.).
  3. Post-pass BUYBACK entries into combat aggregates.
  4. For each player slot 0..9:
    • wire regular + minute time series from PlayerExtractor
    • attach combat aggregates (damage, uses, logs, stuns)
    • attach scoreboard K/D/A
    • build lane heatmap and lane-role/lane-10m metrics
  5. Populate player names/steam ids from CDOTA_PlayerResource.
  6. Populate team metadata from CDOTATeam entities.
  7. Attach observer/sentry ward logs per player.
  8. Build radiant_gold_adv and radiant_xp_adv arrays from minute series.
  9. Run detect_teamfights(...) with hero-slot/team/snapshot context.
  10. Build _ability_snapshots for ability_level_at_tick() lookup.
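
To make the lane heatmap part of step 4 concrete, here is a minimal sketch of binning position samples into grid cells over the lane window. The function name, the (tick, x, y) sample shape, and the start_tick parameter are assumptions for illustration; gem's real implementation may bin or offset coordinates differently.

```python
from collections import Counter


def lane_heatmap(samples, start_tick, grid=64, window=600 * 30):
    """Count position samples per grid cell during the lane window.

    samples: iterable of (tick, x, y) tuples (hypothetical shape).
    start_tick: tick at which the lane window begins (assumed input).
    grid / window: defaults mirror _LANE_GRID and _LANE_WINDOW.
    """
    heat = Counter()
    for tick, x, y in samples:
        # Only samples inside [start_tick, start_tick + window) count.
        if start_tick <= tick < start_tick + window:
            cell = (int(x // grid), int(y // grid))
            heat[cell] += 1
    return heat


samples = [
    (100, 10.0, 10.0),
    (200, 70.0, 10.0),
    (300, 75.0, 20.0),
    (20000, 10.0, 10.0),  # outside the window, ignored
]
heat = lane_heatmap(samples, start_tick=0)
```

A lane role could then be derived from whichever cells dominate the counts, which matches the rule that lane role only looks at the first 10 minutes.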

Important correctness rules in this layer

  1. Advantage curves must use cumulative totals (total_earned_*), not spendable gold or level-local XP.
  2. Lane role is derived from first 10 minutes only.
  3. Purchase logs are deduplicated only in the starting snapshot window.
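
Rule 1 can be illustrated with a small sketch of an advantage curve built from cumulative per-minute totals. The function and its input shape are hypothetical, and the split of slots 0-4 as Radiant and 5-9 as Dire is an assumption not stated on this page.

```python
def radiant_advantage(totals_by_slot):
    """Per-minute Radiant minus Dire difference of cumulative totals.

    totals_by_slot: dict mapping slot (0..9) to a list of cumulative
    totals, one value per minute (hypothetical input shape).
    Assumes slots 0-4 are Radiant and slots 5-9 are Dire.
    """
    # Use the shortest series so every slot has a value for each minute.
    minutes = min(len(series) for series in totals_by_slot.values())
    advantage = []
    for m in range(minutes):
        radiant = sum(totals_by_slot[s][m] for s in range(5))
        dire = sum(totals_by_slot[s][m] for s in range(5, 10))
        advantage.append(radiant - dire)
    return advantage


totals = {s: [0, 100, 250] for s in range(5)}
totals.update({s: [0, 90, 260] for s in range(5, 10)})
adv = radiant_advantage(totals)
```

Because the inputs are cumulative, a purchase or a level-up on either side does not change the curve. Feeding spendable gold into the same function would make the curve dip every time a player buys an item.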

dataframes.py: tabular projection layer

build_dataframes(match) converts ParsedMatch into analytics-friendly DataFrames.

Exported tables

| Key | Content |
| --- | --- |
| players | per-sample player rows (tick-level) |
| players_minute | minute-boundary player rows |
| positions | (tick, x, y) rows per player |
| combat_log | normalized combat entries |
| wards | ward events |
| objectives | towers/barracks/roshan/tormentor/shrine/aegis |
| chat | chat messages |
| match | match-level metadata |
| radiant_advantage | minute-by-minute radiant gold/xp advantage |
| draft | draft events |
| teamfights | teamfight windows |
| smoke_events | smoke activations |
| courier_snapshots | courier state samples |
| player_kills_log | per-player kill entries |
| player_purchase_log | per-player purchases |
| player_runes_log | per-player runes |
| player_buyback_log | per-player buybacks |

gem.parse_to_dataframe(path) is a convenience wrapper:

  1. parse(path) -> ParsedMatch
  2. build_dataframes(match) -> dict[str, DataFrame]

Real snapshot from fixture (truncated)

From parse_to_dataframe("tests/fixtures/8520014563.dem"):

table_count 17
chat (2, 4)
combat_log (108594, 16)
courier_snapshots (6701, 6)
draft (0, 0)
match (1, 6)
objectives (27, 9)
player_buyback_log (5, 17)
player_kills_log (3393, 17)
player_purchase_log (661, 17)
player_runes_log (67, 17)
players (34941, 36)
players_minute (550, 12)
positions (34941, 6)
radiant_advantage (55, 3)
smoke_events (15, 6)
teamfights (42, 10)
wards (119, 10)
...

Interpretation:

  1. Empty draft table is valid for some replay paths.
  2. players and positions are large because they sample over many ticks.
  3. players_minute and radiant_advantage are compact minute-bucket tables.
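
A listing like the snapshot above can be produced with a few lines, and the same pattern helps guard against empty tables such as draft. This is a sketch with hypothetical helper names. It relies only on each table exposing a `.shape` tuple, as pandas DataFrames do, so it makes no further assumptions about gem's API.

```python
def summarize_tables(tables):
    """Return one 'name (rows, cols)' line per table, sorted by name."""
    lines = [f"table_count {len(tables)}"]
    for name in sorted(tables):
        lines.append(f"{name} {tables[name].shape}")
    return lines


def non_empty(tables, name):
    """Return the table if it exists and has rows, otherwise None."""
    table = tables.get(name)
    if table is None or table.shape[0] == 0:
        return None
    return table


# Usage, assuming `tables` is the dict returned by parse_to_dataframe:
#   print("\n".join(summarize_tables(tables)))
#   draft = non_empty(tables, "draft")
#   if draft is not None:
#       ...  # safe to read draft rows here
```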

Common failure modes in this layer

  1. Using non-cumulative fields (spendable gold, level-local XP) for advantage calculations makes the curves drift, because those values drop on purchases and level-ups.
  2. Missing player-name/team resolution when CDOTA_PlayerResource path changes.
  3. Over-deduplicating purchases outside the starting window removes valid repeated buys.
  4. Assuming all tables are non-empty (draft, chat, etc.).

Next pages

  1. Time-Series and DataFrames
  2. Full Match Data
  3. Annotated JSON Output