Extractors Layer
This page explains how gem converts reconstructed game state + normalized events into analysis-ready records.
Modules covered:
- src/gem/extractors/_snapshots.py
- src/gem/extractors/players.py
- src/gem/extractors/objectives.py
- src/gem/extractors/wards.py
- src/gem/extractors/courier.py
- src/gem/extractors/draft.py
- src/gem/extractors/teamfights.py
- src/gem/extractors/lane.py
Prerequisites:
- Bits & Bytes Primer
- Stream Layer (stream.py)
- Parser Layer (parser.py)
- SendTable Layer (sendtable.py)
- State Reconstruction Layer (string_table.py + entities.py)
- Event Normalization Layer (game_events.py + combatlog.py)
Why this is the next layer
After the parser can produce stable entity updates and normalized combat/game events, extractors turn those low-level signals into match analytics objects:
- player time series
- objectives timeline
- ward placement/kill/expiry timeline
- courier state snapshots
- draft picks/bans
- teamfights and lane roles
Extractor pattern used across modules
All extractor classes follow the same shape:
- attach(parser) registers callback hooks.
- Callbacks consume events incrementally during parse.
- Extractor keeps mutable state in memory.
- Results are exposed as dataclass lists / time series after parse.
Core parser hooks used in this layer:
| Hook | Payload | Used by |
|---|---|---|
| on_entity | (entity, op) | players, objectives, wards, courier, draft |
| on_combat_log_entry | CombatLogEntry | players, objectives, wards, combat aggregation |
| on_chat_event | user-message chat events | objectives |
| on_game_start | game_start_tick | players (minute alignment) |
| on_game_end | final tick | players (final snapshot + scoreboard) |
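
The attach/callback/state/results shape described above can be sketched as follows. This is a toy illustration, not gem's actual API: `FakeParser`, its `register`/`emit` methods, the `DeathCounter` extractor, and the dict-shaped combat-log entry are all hypothetical stand-ins. Only the hook name `on_combat_log_entry` comes from the table.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


class FakeParser:
    """Hypothetical stand-in for the real parser: stores callbacks per hook name."""

    def __init__(self) -> None:
        self._hooks: dict[str, list[Callable[..., None]]] = {}

    def register(self, hook: str, callback: Callable[..., None]) -> None:
        self._hooks.setdefault(hook, []).append(callback)

    def emit(self, hook: str, *payload: Any) -> None:
        for callback in self._hooks.get(hook, []):
            callback(*payload)


@dataclass
class DeathRecord:
    tick: int
    target: str


@dataclass
class DeathCounter:
    """Toy extractor: keeps one record per DEATH combat-log entry."""

    records: list[DeathRecord] = field(default_factory=list)

    def attach(self, parser: FakeParser) -> None:
        # Step 1: register callbacks on the parser.
        parser.register("on_combat_log_entry", self._on_combat_log_entry)

    def _on_combat_log_entry(self, entry: dict) -> None:
        # Steps 2 and 3: consume events incrementally and update in-memory state.
        if entry.get("type") == "DEATH":
            self.records.append(DeathRecord(entry["tick"], entry["target"]))


parser = FakeParser()
extractor = DeathCounter()
extractor.attach(parser)
parser.emit("on_combat_log_entry", {"type": "DEATH", "tick": 900, "target": "npc_dota_hero_pugna"})
parser.emit("on_combat_log_entry", {"type": "DAMAGE", "tick": 901, "target": "npc_dota_hero_pugna"})
# Step 4: results are read from the extractor after the parse.
```

After the two emitted entries, `extractor.records` holds a single `DeathRecord` for tick 900.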
_snapshots.py: shared player snapshot model
This helper file defines the canonical dataclasses used by players.py.
Core constants
| Constant | Value | Meaning |
|---|---|---|
| _CELL_SIZE | 128 | World units per grid cell for cell + vec coordinate encoding |
| _HERO_CLASS_PREFIX | CDOTA_Unit_Hero_ | Hero entity-class prefix for filtering/resolution |
Key helpers
- _pos(entity) reads CBodyComponent.m_cellX/Y + m_vecX/Y and converts them to world (x, y).
- _snapshot_hero(entity, tick) builds one PlayerStateSnapshot with core hero fields.
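
As a rough sketch of the cell + vec encoding, the function below combines a cell index and an in-cell offset using `_CELL_SIZE`. The flat `fields` dict and the `origin_offset` parameter are assumptions for illustration: this page does not say how the real `_pos` reads entity fields or whether it shifts the grid origin, so the offset defaults to zero here.

```python
from __future__ import annotations

_CELL_SIZE = 128  # world units per grid cell (value from the constants table)


def world_coord(cell: int, vec: float, origin_offset: float = 0.0) -> float:
    """Combine a grid cell index and an in-cell offset into one world coordinate.

    origin_offset is a hypothetical parameter; it defaults to no shift because
    the real helper's origin handling is not documented on this page.
    """
    return cell * _CELL_SIZE + vec - origin_offset


def pos(fields: dict[str, float]) -> tuple[float, float]:
    """Read cell/vec pairs from a flat field mapping and return world (x, y)."""
    x = world_coord(int(fields["CBodyComponent.m_cellX"]), fields["CBodyComponent.m_vecX"])
    y = world_coord(int(fields["CBodyComponent.m_cellY"]), fields["CBodyComponent.m_vecY"])
    return x, y


example = pos({
    "CBodyComponent.m_cellX": 130, "CBodyComponent.m_vecX": 64.0,
    "CBodyComponent.m_cellY": 120, "CBodyComponent.m_vecY": 0.5,
})
```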
players.py: per-player time series extractor
PlayerExtractor is the heaviest extractor. It samples hero state over time and overlays combat-log cumulative totals.
Important constants
| Constant | Value | Meaning |
|---|---|---|
| _ITEM_SLOTS | 17 | m_hItems.0000-0016 (main + backpack + stash) |
| _ABILITY_SLOTS | 32 | m_hAbilities.0000-0031 scan range |
| _NULL_HANDLE | 0xFFFFFF | Empty handle sentinel |
| _TEAM_RADIANT / _TEAM_DIRE | 2 / 3 | Team ids used in replay state |
Main methods
| Method | What it does |
|---|---|
| attach | Registers on_entity, on_combat_log_entry, on_game_start, on_game_end |
| _on_entity | Tracks heroes/controllers/team data entities and triggers sampling |
| _maybe_sample | Enforces interval + minute-boundary sampling policy |
| _sample | Builds snapshots, overlays controller/data-team stats, reads abilities, diffs inventory |
| _on_combat_log_entry | Maintains running totals (damage/healing/deaths/stuns) per player |
| _on_game_end | Forces final snapshot and captures authoritative K/D/A scoreboard |
| time_series / minute_time_series | Materializes arrays per player from stored snapshots |
Sampling model
Two snapshot streams are produced:
- regular snapshots every sample_interval ticks (default 30 = ~1 second)
- minute snapshots every 1800 ticks from game start (gold_t_min, xp_t_min, etc.)
Minute snapshots are deduplicated by minute index so repeated same-minute events do not create duplicate buckets.
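
A minimal sketch of such a two-stream gate, assuming the interval is measured from the last regular sample and that a minute bucket is emitted on the first tick seen inside it. `SampleGate` and its `check` method are hypothetical names; the real `_maybe_sample` may differ in both policy and structure.

```python
from __future__ import annotations

from dataclasses import dataclass, field

TICKS_PER_MINUTE = 1800  # minute-snapshot spacing stated above


@dataclass
class SampleGate:
    """Decides, per tick, whether a regular and/or minute snapshot is due."""

    sample_interval: int = 30
    game_start_tick: int | None = None
    _last_sample_tick: int | None = None
    _seen_minutes: set[int] = field(default_factory=set)

    def check(self, tick: int) -> tuple[bool, int | None]:
        """Return (take_regular_snapshot, minute_index_or_None) for this tick."""
        regular = (
            self._last_sample_tick is None
            or tick - self._last_sample_tick >= self.sample_interval
        )
        if regular:
            self._last_sample_tick = tick

        minute = None
        if self.game_start_tick is not None and tick >= self.game_start_tick:
            index = (tick - self.game_start_tick) // TICKS_PER_MINUTE
            # Deduplicate: each minute index produces at most one bucket.
            if index not in self._seen_minutes:
                self._seen_minutes.add(index)
                minute = index
        return regular, minute


gate = SampleGate(sample_interval=30, game_start_tick=100)
results = [gate.check(t) for t in (100, 110, 130, 1900, 1905)]
```

Keeping the seen minute indices in a set is what prevents repeated entity updates within the same minute from creating duplicate `*_t_min` entries.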
Canonical Hero Entity Matters
Do not assume "all CDOTA_Unit_Hero_* entities for this player" means "the player's real hero". Illusion-like and clone-like entities can create duplicate same-tick positions and corrupt movement trails. gem samples the player's selected hero handle first for this reason. See Replay Edge Cases.
Special handling worth knowing
- Gold/XP source selection is intentional:
  - spendable gold from CDOTAPlayerController.m_iGold
  - cumulative totals from CDOTADataRadiant/Dire.m_vecDataTeam.*
- Initial inventory is emitted as synthetic PURCHASE combat-log entries once per player.
- Ability levels are resolved via ability handles + EntityNames string table.
objectives.py: objective timeline extractor
ObjectivesExtractor emits timeline events for towers, barracks, Roshan, Tormentor, shrine kills, and Aegis interactions.
Key constants
| Constant | Value | Meaning |
|---|---|---|
| _CHAT_MSG_AEGIS | 8 | Aegis pickup chat event |
| _CHAT_MSG_AEGIS_STOLEN | 53 | Aegis stolen chat event |
| _CHAT_MSG_DENIED_AEGIS | 51 | Aegis denied chat event |
| _CHAT_MSG_SHRINE_KILLED | 101 | Shrine of Wisdom destroyed |
| _CHAT_MSG_MINIBOSS_KILL | 117 | Tormentor kill chat event |
Main methods
| Method | What it does |
|---|---|
| attach | Registers on_combat_log_entry, on_chat_event, on_entity |
| _on_entity | Tracks alive Roshan drop-item entities (Aegis/Cheese/Shard/Banner) |
| _on_chat_event | Emits AegisEvent / ShrineKill and patches latest tormentor killer player id |
| _on_combat_log | Converts DEATH events into TowerKill, BarracksKill, RoshanKill, TormentorKill |
Roshan drops are reconstructed from currently alive Roshan item entities at kill tick.
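
To illustrate how the chat-message constants might drive dispatch in `_on_chat_event`, here is a small lookup sketch. The constant values are taken from the table above; the string labels and the `classify_chat_event` function are hypothetical and do not reflect the real dataclass names or fields.

```python
from __future__ import annotations

_CHAT_MSG_AEGIS = 8
_CHAT_MSG_DENIED_AEGIS = 51
_CHAT_MSG_AEGIS_STOLEN = 53
_CHAT_MSG_SHRINE_KILLED = 101
_CHAT_MSG_MINIBOSS_KILL = 117

# Illustrative labels only; the real extractor emits dataclasses instead.
_CHAT_KIND = {
    _CHAT_MSG_AEGIS: "aegis_pickup",
    _CHAT_MSG_DENIED_AEGIS: "aegis_denied",
    _CHAT_MSG_AEGIS_STOLEN: "aegis_stolen",
    _CHAT_MSG_SHRINE_KILLED: "shrine_killed",
    _CHAT_MSG_MINIBOSS_KILL: "tormentor_kill",
}


def classify_chat_event(msg_type: int) -> str | None:
    """Return an objective label for a chat message type, or None if unrelated."""
    return _CHAT_KIND.get(msg_type)
```

Unrecognized message types return `None`, so unrelated chat events can be skipped cheaply.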
wards.py: ward placement, expiry, and kill attribution
WardsExtractor uses m_lifeState transitions as the primary ward lifecycle signal.
Key constants
| Constant | Value | Meaning |
|---|---|---|
| _WARD_CLASSES | observer + sentry classes | Tracked entity classes |
| _WARD_TARGET_NAMES | observer/sentry combat-log names | Targets for killer queue |
| _OBSERVER_LIFESPAN_TICKS | 720 | ~6 minutes |
| _SENTRY_LIFESPAN_TICKS | 360 | ~3 minutes |
| _EXPIRY_TOLERANCE_TICKS | 30 | Grace window for expiry classification |
Lifecycle logic
- lifeState transitions to 0 (alive) => ward placement event.
- lifeState transitions 0 -> 1 (dying) => killed/expired classification.
- Killer attribution comes from queued ward DEATH combat-log events.
- If no killer appears and age is near lifespan => classify as natural expiry.
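
The killed-versus-expired decision above could look roughly like the sketch below. `WardOutcome` and `classify_ward_end` are hypothetical names, and "near lifespan" is interpreted here as "within the tolerance on either side of the lifespan", which is an assumption. Lifespan and tolerance are passed in as parameters rather than hard-coded.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class WardOutcome:
    killed_tick: int | None = None
    expires_tick: int | None = None
    killer: str | None = None


def classify_ward_end(
    placed_tick: int,
    end_tick: int,
    lifespan_ticks: int,
    tolerance_ticks: int,
    killer_queue: deque[str],
) -> WardOutcome:
    """Classify a 0 -> 1 lifeState transition as a kill or a natural expiry."""
    if killer_queue:
        # A queued DEATH combat-log entry names the killer.
        return WardOutcome(killed_tick=end_tick, killer=killer_queue.popleft())
    age = end_tick - placed_tick
    if abs(age - lifespan_ticks) <= tolerance_ticks:
        return WardOutcome(expires_tick=end_tick)
    # Died early with no killer queued yet: record the kill and leave the
    # killer empty so a later combat-log entry can back-fill it.
    return WardOutcome(killed_tick=end_tick)
```

The final branch corresponds to the same-tick ordering case, where the entity update arrives before its combat-log entry.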
Main methods
| Method | What it does |
|---|---|
| attach | Registers on_entity, on_combat_log_entry |
| _on_entity | Tracks hero map and ward lifecycle transitions |
| _on_ward_placed | Captures team, placer (m_hOwnerEntity), coordinates |
| _on_ward_left | Resolves killed_tick vs expires_tick and killer |
| _on_combat_log | Feeds killer queues and same-tick backfill |
| finalize | Back-fills missing placer NPC names from player-id map |
courier.py: courier polling extractor
CourierExtractor tracks CDOTA_Unit_Courier* entities and snapshots them at a fixed tick interval.
Main fields sampled
- m_iTeamNum
- m_iCourierState
- m_bFlyingCourier
- world position via _pos(entity)
Main methods
| Method | What it does |
|---|---|
| attach | Registers on_entity |
| _on_entity | Adds/removes courier entities and triggers sampling |
| _maybe_sample | Interval gate (sample_interval, default 150) |
| _sample | Appends CourierSnapshot records |
draft.py: pick/ban extractor
DraftExtractor polls CDOTAGamerulesProxy draft fields and resolves hero names.
Key constants
| Constant | Value | Meaning |
|---|---|---|
| _BAN_SLOTS | 14 | m_BannedHeroes.0000-0013 |
| _PICK_SLOTS | 10 | m_SelectedHeroes.0000-0009 |
Resolution logic
- Capture raw hero ids from gamerules proxy arrays.
- De-duplicate with _seen keys (is_pick, slot_index, hero_id).
- Resolve hero names using live hero_id -> npc_name map first.
- Fall back to bundled heroes.json mapping (hero_id // 2 preferred).
- finalize() re-resolves names after parse when hero entities are fully known.
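
The de-duplication and name-resolution steps above can be sketched as two small functions. Both function names, the plain-dict map shapes, and the exact fallback order (`hero_id // 2` first, then the raw id) are assumptions drawn from the "hero_id // 2 preferred" wording, not confirmed behavior.

```python
from __future__ import annotations


def resolve_hero_name(
    hero_id: int,
    live_map: dict[int, str],
    bundled: dict[int, str],
) -> str:
    """Resolve a raw draft hero id to an npc name, or "" if unknown."""
    if hero_id in live_map:
        return live_map[hero_id]
    # Assumed fallback order: halved id first, then the raw id.
    for candidate in (hero_id // 2, hero_id):
        if candidate in bundled:
            return bundled[candidate]
    return ""


def dedupe_draft(raw: list[tuple[bool, int, int]]) -> list[tuple[bool, int, int]]:
    """Keep the first occurrence of each (is_pick, slot_index, hero_id) key."""
    seen: set[tuple[bool, int, int]] = set()
    unique = []
    for key in raw:
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

Because the live map is only complete once all hero entities have appeared, calling `resolve_hero_name` again after the parse (as `finalize()` does) can replace a fallback name with the live one.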
Main methods
| Method | What it does |
|---|---|
| attach | Registers on_entity |
| _update_live_map | Reads selected hero handles from CDOTA_PlayerResource |
| _check_draft | Emits DraftEvent for bans and picks |
| finalize | Rewrites names using full live map for correctness |
resolve_pick_team(event, players) is a helper that prefers post-game roster team mapping over draft-time active-team field.
teamfights.py + lane.py: post-parse derived extractors
These modules live in extractors/ but do not attach to the parser; they run as derived computations during match assembly.
teamfights.py constants
| Constant | Value | Meaning |
|---|---|---|
| _COOLDOWN_TICKS | 15 * 30 = 450 | Death-window merge cooldown |
| _FIGHT_RADIUS | 3000.0 | Spatial clustering radius between deaths/fights |
detect_teamfights(...) pass structure
- Pass 1: open/merge/close fight windows from hero deaths.
- Pass 2: aggregate per-player damage/heal/death/buyback/gold/use stats inside windows.
- Pass 3: compute XP deltas via nearest player snapshots.
- Pass 4: assign winner (radiant/dire/draw) from kill counts.
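
Pass 1 can be approximated by a single sweep over hero deaths sorted by tick. The sketch below is a simplification under stated assumptions: it keeps only one open window at a time, measures distance to the window's death centroid, and uses hypothetical `Death` and `FightWindow` types. The real `detect_teamfights` may track several windows and apply extra thresholds.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field

_COOLDOWN_TICKS = 15 * 30  # 450, from the constants table
_FIGHT_RADIUS = 3000.0


@dataclass
class Death:
    tick: int
    x: float
    y: float


@dataclass
class FightWindow:
    start_tick: int
    last_death_tick: int
    deaths: list[Death] = field(default_factory=list)

    def centroid(self) -> tuple[float, float]:
        n = len(self.deaths)
        return (sum(d.x for d in self.deaths) / n, sum(d.y for d in self.deaths) / n)


def group_deaths(deaths: list[Death]) -> list[FightWindow]:
    """Merge deaths that are close in both time and space into windows."""
    windows: list[FightWindow] = []
    current: FightWindow | None = None
    for death in sorted(deaths, key=lambda d: d.tick):
        if current is not None:
            cx, cy = current.centroid()
            in_time = death.tick - current.last_death_tick <= _COOLDOWN_TICKS
            in_range = math.hypot(death.x - cx, death.y - cy) <= _FIGHT_RADIUS
            if in_time and in_range:
                current.deaths.append(death)
                current.last_death_tick = death.tick
                continue
        # Too late or too far away: close the old window and open a new one.
        current = FightWindow(death.tick, death.tick, [death])
        windows.append(current)
    return windows
```

Later passes would then aggregate per-player stats over each window's tick range.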
lane.py
classify_lane(lane_pos, team) maps a player's first-10-minute position heatmap to a lane role:
- safe lane (1)
- mid (2)
- offlane (3)
- jungle (4)
- roaming (5)
How this layer is wired into gem.parse
gem.parse(...) attaches extractors before parsing:
p = ReplayParser(path)
player_ext = PlayerExtractor()
obj_ext = ObjectivesExtractor()
ward_ext = WardsExtractor()
courier_ext = CourierExtractor()
draft_ext = DraftExtractor()
player_ext.attach(p)
obj_ext.attach(p)
ward_ext.attach(p)
courier_ext.attach(p)
draft_ext.attach(p)
After parse:
- draft_ext.finalize() fixes draft hero names with complete live map.
- ward_ext.finalize() back-fills unresolved placer names.
- build_parsed_match(...) consumes all extractor outputs and runs teamfight/lane derivations.
Real snapshot from fixture (truncated)
From tests/fixtures/8520014563.dem:
match_id 8520014563
ticks 6318 104180
players 10
towers 16
barracks 6
roshans 2
aegis_events 2
tormentors 1
shrines 0
wards 119
courier_snapshots 6701
draft_events 0
teamfights 42
combat_log 108594
chat 2
player 0 npc_dota_hero_muerta 2 3819 55 0 0
player 1 npc_dota_hero_queenofpain 2 3497 55 3 1
player 2 npc_dota_hero_pugna 2 3339 55 10 16
...
How to read one player line:
- player_id
- hero_name
- team
- regular snapshot count (times)
- minute snapshot count (times_min)
- observer ward events for that player
- sentry ward events for that player
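
For quick checks against this kind of dump, a player line can be split into named fields following the column order listed above. `PlayerLine` and `parse_player_line` are hypothetical helpers written for this page; they are not part of gem.

```python
from dataclasses import dataclass


@dataclass
class PlayerLine:
    player_id: int
    hero_name: str
    team: int
    snapshots: int          # regular snapshot count (times)
    minute_snapshots: int   # minute snapshot count (times_min)
    observer_events: int
    sentry_events: int


def parse_player_line(line: str) -> PlayerLine:
    """Parse one 'player ...' line from the fixture summary."""
    parts = line.split()
    if len(parts) != 8 or parts[0] != "player":
        raise ValueError(f"not a player line: {line!r}")
    # parts[1] is the id, parts[2] the hero name, the rest are integer columns.
    return PlayerLine(int(parts[1]), parts[2], *map(int, parts[3:]))


pugna = parse_player_line("player 2 npc_dota_hero_pugna 2 3339 55 10 16")
```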
Common failure modes in this layer
- Missing EntityNames table entries cause empty/incorrect hero/item names.
- Wrong player-id normalization (forgetting / 2) breaks attribution.
- Minute-boundary logic drift causes mismatched *_t_min arrays.
- Ward same-tick ordering issues can leave temporary empty killer fields.
- Draft ids interpreted without hero_id // 2 fallback can mislabel heroes.