
Players Extractor

Per-player state snapshots and time-series data.

gem.extractors.players

Per-tick player statistics extractor for Dota 2 replays.

Polls hero entity state at configurable tick intervals and accumulates snapshots for time-series analysis.

Reference: examples/extraction_demo.py, refs/parser/src/main/java/opendota/Parse.java

PlayerStateSnapshot dataclass

A single per-player state sample taken at one tick.

Attributes:

| Name | Type | Description |
|---|---|---|
| tick | int | Game tick of this sample. |
| player_id | int | Player slot (0-9). |
| npc_name | str | Hero NPC name, e.g. "npc_dota_hero_axe". |
| team | int | Team number (2=Radiant, 3=Dire). |
| level | int | Hero level (1-30). |
| xp | int | Cumulative XP total. |
| gold | int | Spendable gold from CDOTAPlayerController, or 0 if not read. Falls back to total_earned_gold when this value is 0 and a positive total is available. |
| net_worth | int | Net worth from CDOTAPlayerController, overridden by CDOTA_DataRadiant / CDOTA_DataDire when that entity reports a positive value; 0 if not read. |
| total_earned_gold | int | Cumulative gold earned (m_iTotalEarnedGold), or 0 if not read. |
| total_earned_xp | int | Cumulative XP earned (m_iTotalEarnedXP), or 0 if not read. |
| lh | int | Last-hit count. |
| dn | int | Deny count. |
| hp | int | Current hit points. |
| max_hp | int | Maximum hit points. |
| mana | float | Current mana. |
| max_mana | float | Maximum mana. |
| x | float \| None | World x coordinate, or None if unavailable. |
| y | float \| None | World y coordinate, or None if unavailable. |
| ability_levels | dict[str, int] | Ability name → level mapping for learned abilities. |

Source code in src/gem/extractors/_snapshots.py
@dataclass
class PlayerStateSnapshot:
    """A single per-player state sample taken at one tick.

    Attributes:
        tick: Game tick of this sample.
        player_id: Player slot (0-9).
        npc_name: Hero NPC name, e.g. ``"npc_dota_hero_axe"``.
        team: Team number (2=Radiant, 3=Dire).
        level: Hero level (1-30).
        xp: Cumulative XP total.
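        gold: Spendable gold from ``CDOTAPlayerController``, or 0 if not read.
            Falls back to ``total_earned_gold`` when 0 and a positive total is available.
        net_worth: Net worth from ``CDOTAPlayerController``, overridden by
            ``CDOTA_DataRadiant`` / ``CDOTA_DataDire`` when positive; 0 if not read.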
        total_earned_gold: Cumulative gold earned (``m_iTotalEarnedGold``), or 0 if not read.
        total_earned_xp: Cumulative XP earned (``m_iTotalEarnedXP``), or 0 if not read.
        lh: Last-hit count.
        dn: Deny count.
        hp: Current hit points.
        max_hp: Maximum hit points.
        mana: Current mana.
        max_mana: Maximum mana.
        x: World x coordinate, or ``None`` if unavailable.
        y: World y coordinate, or ``None`` if unavailable.
        ability_levels: Ability name → level mapping for learned abilities.
    """

    tick: int
    player_id: int
    npc_name: str
    team: int
    level: int
    xp: int
    gold: int
    net_worth: int
    lh: int
    dn: int
    hp: int
    max_hp: int
    mana: float
    max_mana: float
    x: float | None
    y: float | None
    total_earned_gold: int = 0
    total_earned_xp: int = 0
    ability_levels: dict[str, int] = field(default_factory=dict)
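The gold-related fields are filled from two entities in a fixed order inside PlayerExtractor._sample: controller values are copied whenever present, team-data values override only when positive, and spendable gold falls back to the cumulative total when it is still 0. The sketch below mirrors only that precedence, with plain dicts standing in for entities and every field starting at 0 as the table describes for unread values. `GoldFields` and `resolve_gold_fields` are hypothetical names, not part of gem.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GoldFields:
    """Hypothetical container for the three gold-related snapshot fields."""

    gold: int = 0
    net_worth: int = 0
    total_earned_gold: int = 0


def resolve_gold_fields(
    controller: dict[str, int] | None,
    team_data: dict[str, int] | None,
) -> GoldFields:
    """Apply the overlay order used in PlayerExtractor._sample.

    `controller` stands in for CDOTAPlayerController properties; `team_data`
    stands in for this player's m_vecDataTeam.NNNN.* properties on
    CDOTA_DataRadiant / CDOTA_DataDire, with the prefix already stripped.
    """
    out = GoldFields()
    # 1. Controller values are copied whenever present, even if 0.
    if controller is not None:
        if controller.get("m_iGold") is not None:
            out.gold = controller["m_iGold"]
        if controller.get("m_iNetWorth") is not None:
            out.net_worth = controller["m_iNetWorth"]
    # 2. Team-data values override only when positive.
    if team_data is not None:
        nw = team_data.get("m_iNetWorth")
        if nw is not None and nw > 0:
            out.net_worth = nw
        teg = team_data.get("m_iTotalEarnedGold")
        if teg is not None and teg > 0:
            out.total_earned_gold = teg
            # With no spendable gold read, gold takes the cumulative total.
            if out.gold == 0:
                out.gold = teg
    return out


print(resolve_gold_fields({"m_iGold": 600, "m_iNetWorth": 1500},
                          {"m_iNetWorth": 1550, "m_iTotalEarnedGold": 2100}))
# GoldFields(gold=600, net_worth=1550, total_earned_gold=2100)
```

The fallback means gold can hold cumulative earnings rather than cash on hand when no controller value was read, so it is worth checking total_earned_gold before treating gold as spendable.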

PlayerTimeSeries dataclass

Time-series data for one player, aggregated from snapshots.

Attributes:

| Name | Type | Description |
|---|---|---|
| player_id | int | Player slot (0-9). |
| ticks | list[int] | Tick values for each sample. |
| gold_t | list[int] | Spendable gold at each sample tick. |
| total_earned_gold_t | list[int] | Cumulative total earned gold at each sample tick. |
| total_earned_xp_t | list[int] | Cumulative total earned XP at each sample tick. |
| net_worth_t | list[int] | Net worth at each sample tick. |
| lh_t | list[int] | Last-hit count at each sample tick. |
| dn_t | list[int] | Deny count at each sample tick. |
| xp_t | list[int] | Cumulative XP at each sample tick. |
| hp_t | list[int] | Current hit points at each sample tick. |
| mana_t | list[float] | Current mana at each sample tick. |
| x_t | list[float \| None] | World x coordinate at each sample tick (None if unavailable). |
| y_t | list[float \| None] | World y coordinate at each sample tick (None if unavailable). |

Source code in src/gem/extractors/_snapshots.py
@dataclass
class PlayerTimeSeries:
    """Time-series data for one player, aggregated from snapshots.

    Attributes:
        player_id: Player slot (0-9).
        ticks: Tick values for each sample.
        gold_t: Spendable gold at each sample tick.
        total_earned_gold_t: Cumulative total earned gold at each sample tick.
        total_earned_xp_t: Cumulative total earned XP at each sample tick.
        net_worth_t: Net worth at each sample tick.
        lh_t: Last-hit count at each sample tick.
        dn_t: Deny count at each sample tick.
        xp_t: Cumulative XP at each sample tick.
        hp_t: Current hit points at each sample tick.
        mana_t: Current mana at each sample tick.
        x_t: World x coordinate at each sample tick (``None`` if unavailable).
        y_t: World y coordinate at each sample tick (``None`` if unavailable).
    """

    player_id: int
    ticks: list[int] = field(default_factory=list)
    gold_t: list[int] = field(default_factory=list)
    total_earned_gold_t: list[int] = field(default_factory=list)
    total_earned_xp_t: list[int] = field(default_factory=list)
    net_worth_t: list[int] = field(default_factory=list)
    lh_t: list[int] = field(default_factory=list)
    dn_t: list[int] = field(default_factory=list)
    xp_t: list[int] = field(default_factory=list)
    hp_t: list[int] = field(default_factory=list)
    mana_t: list[float] = field(default_factory=list)
    x_t: list[float | None] = field(default_factory=list)
    y_t: list[float | None] = field(default_factory=list)
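Because total_earned_gold_t is cumulative and does not drop when items are bought, it is the series to difference for a team gold-advantage curve (the class source makes the same point about m_iTotalEarnedGold versus m_iGold). A minimal sketch, using a trimmed stand-in dataclass and a hypothetical `team_of` mapping; PlayerTimeSeries carries no team field, so in practice the team would come from the snapshots' team attribute. Heroes can appear at different ticks, so the sketch keeps only ticks sampled for every player.

```python
from collections import defaultdict
from dataclasses import dataclass, field

RADIANT, DIRE = 2, 3  # team numbers as documented for PlayerStateSnapshot.team


@dataclass
class Series:
    """Trimmed stand-in for PlayerTimeSeries (only the fields used here)."""

    player_id: int
    ticks: list[int] = field(default_factory=list)
    total_earned_gold_t: list[int] = field(default_factory=list)


def gold_advantage(series: list[Series], team_of: dict[int, int]) -> list[tuple[int, int]]:
    """Return (tick, radiant_total - dire_total) for ticks every player was sampled at."""
    per_tick: dict[int, int] = defaultdict(int)
    seen: dict[int, int] = defaultdict(int)
    for s in series:
        sign = 1 if team_of[s.player_id] == RADIANT else -1
        for tick, teg in zip(s.ticks, s.total_earned_gold_t):
            per_tick[tick] += sign * teg
            seen[tick] += 1
    # Drop ticks where some player has no sample, so the curve compares like with like.
    return [(t, per_tick[t]) for t in sorted(per_tick) if seen[t] == len(series)]


a = Series(0, [30, 60], [600, 700])
b = Series(5, [30, 60, 90], [650, 640, 800])
print(gold_advantage([a, b], {0: RADIANT, 5: DIRE}))
# [(30, -50), (60, 60)]
```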

PlayerExtractor

Samples hero entity state every sample_interval ticks (and, optionally, at each game-minute boundary) and accumulates player snapshots.

Attach to a ReplayParser before calling parse():

Example

extractor = PlayerExtractor(sample_interval=30)
extractor.attach(parser)
parser.parse()
ts = extractor.time_series(player_id=0)

Attributes:

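| Name | Type | Description |
|---|---|---|
| snapshots | list[PlayerStateSnapshot] | All collected PlayerStateSnapshot objects in chronological order. |
| scoreboard | dict[int, tuple[int, int, int]] | player_id → (kills, deaths, assists), read from CDOTA_PlayerResource when the game ends. |
| first_snapshot_tick | dict[int, int] | player_id → tick of the player's first snapshot, at which starting-inventory PURCHASE entries are emitted. |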

Source code in src/gem/extractors/players.py
class PlayerExtractor:
    """Samples hero entity state at a fixed tick interval and accumulates player snapshots.

    Attach to a ``ReplayParser`` before calling ``parse()``:

    Example:
        >>> extractor = PlayerExtractor(sample_interval=30)
        >>> extractor.attach(parser)
        >>> parser.parse()
        >>> ts = extractor.time_series(player_id=0)

    Attributes:
        snapshots: All collected ``PlayerStateSnapshot`` objects in
            chronological order.
    """

    snapshots: list[PlayerStateSnapshot]

    def __init__(self, sample_interval: int = 30, minute_snapshots: bool = True) -> None:
        """Initialise the extractor.

        Args:
            sample_interval: Minimum tick gap between successive snapshots.
                Default 30 ticks = 1 second at 30 ticks/sec, producing a
                dense per-second series suitable for smooth time-series and
                ML features. Pass a larger value (e.g. 150) for sparser
                sampling. The separate ``_min`` arrays always sample at exact
                60-second game-time boundaries regardless of this setting.
            minute_snapshots: If True, also record a snapshot at each
                game-minute boundary (every 1800 ticks from game start),
                matching OpenDota's ``gold_t`` / ``lh_t`` / ``xp_t`` sampling.
                Requires the parser to fire ``on_game_start``. Default True.
        """
        self._sample_interval = sample_interval
        self._minute_snapshots = minute_snapshots
        self._parser: ReplayParser | None = None
        self._last_sample: int = -sample_interval
        self._game_start_tick: int | None = None
        self._last_minute: int = -1  # last game-minute index sampled
        # entity index → Entity (mutable reference; entity is updated in place)
        self._heroes: dict[int, Entity] = {}
        # npc_name → Entity (for external position lookups)
        self._heroes_by_npc: dict[str, Entity] = {}
        # player_id → Entity (CDOTAPlayerController)
        self._controllers: dict[int, Entity] = {}
        # CDOTADataRadiant / CDOTADataDire entities (authoritative gold/LH/DN per team)
        self._data_radiant: Entity | None = None
        self._data_dire: Entity | None = None
        # player_id (0-9) → team slot (0-4) within CDOTADataRadiant/Dire
        # Read from CDOTA_PlayerResource.m_vecPlayerTeamData.%04d.m_iTeamSlot
        self._player_team_slot: dict[int, int] = {}
        # CDOTA_PlayerResource entity for slot lookups
        self._player_resource: Entity | None = None
        self.snapshots: list[PlayerStateSnapshot] = []
        self._minute_snaps: list[PlayerStateSnapshot] = []
        # player_id → (kills, deaths, assists) from server scoreboard at game end
        self.scoreboard: dict[int, tuple[int, int, int]] = {}
        # set of player_ids whose starting inventory has been emitted
        self._inventory_initialized: set[int] = set()
        # player_id → tick of first inventory snapshot (used to suppress
        # duplicate combat log PURCHASE events for the same window)
        self.first_snapshot_tick: dict[int, int] = {}

    def attach(self, parser: ReplayParser) -> None:
        """Register callbacks with the parser.

        Args:
            parser: The ``ReplayParser`` instance to attach to.
        """
        self._parser = parser
        parser.on_entity(self._on_entity)
        if self._minute_snapshots:
            parser.on_game_start(self._on_game_start)
        parser.on_game_end(self._on_game_end)

    def _on_game_start(self, game_start_tick: int) -> None:
        self._game_start_tick = game_start_tick
        # Align the interval sampler to game start so both series share an origin
        self._last_sample = game_start_tick

    def _on_game_end(self, tick: int) -> None:
        # Force a final snapshot at the exact game-end tick so lh/nw/gold
        # match OpenDota's end-of-game values (sampled at postGame boundary).
        self._sample(tick, minute=False)
        # Read authoritative kills/deaths/assists from the server scoreboard.
        # Reference: refs/parser/src/main/java/opendota/Parse.java lines 666-668
        # m_vecPlayerTeamData.%04d.m_iKills/Deaths/Assists on CDOTA_PlayerResource.
        pr = self._player_resource
        if pr is not None:
            for i in range(10):
                prefix = f"m_vecPlayerTeamData.{i:04d}"
                k = pr.get_int32(f"{prefix}.m_iKills")
                d = pr.get_int32(f"{prefix}.m_iDeaths")
                a = pr.get_int32(f"{prefix}.m_iAssists")
                if k is not None or d is not None or a is not None:
                    self.scoreboard[i] = (k or 0, d or 0, a or 0)

    def hero_pos(self, npc_name: str) -> tuple[float, float] | None:
        """Return the current world position of a hero by NPC name.

        Args:
            npc_name: NPC hero name, e.g. ``"npc_dota_hero_axe"``.

        Returns:
            ``(x, y)`` world coordinates, or ``None`` if the hero is not tracked.
        """
        entity = self._heroes_by_npc.get(npc_name.lower())
        return _pos(entity) if entity is not None else None

    def time_series(self, player_id: int) -> PlayerTimeSeries:
        """Aggregate snapshots for one player into time-series lists.

        Args:
            player_id: Player slot (0-9).

        Returns:
            A ``PlayerTimeSeries`` with parallel lists indexed by sample number.
        """
        ts = PlayerTimeSeries(player_id=player_id)
        for snap in self.snapshots:
            if snap.player_id != player_id:
                continue
            ts.ticks.append(snap.tick)
            ts.gold_t.append(snap.gold)
            ts.total_earned_gold_t.append(snap.total_earned_gold)
            ts.total_earned_xp_t.append(snap.total_earned_xp)
            ts.net_worth_t.append(snap.net_worth)
            ts.lh_t.append(snap.lh)
            ts.dn_t.append(snap.dn)
            ts.xp_t.append(snap.xp)
            ts.hp_t.append(snap.hp)
            ts.mana_t.append(snap.mana)
            ts.x_t.append(snap.x)
            ts.y_t.append(snap.y)
        return ts

    def minute_time_series(self, player_id: int) -> PlayerTimeSeries:
        """Aggregate per-minute snapshots for one player into time-series lists.

        Returns a ``PlayerTimeSeries`` sampled at each game-minute boundary
        (every 1800 ticks from game start), matching OpenDota's ``gold_t``,
        ``lh_t``, and ``xp_t`` arrays exactly.

        Only populated when ``minute_snapshots=True`` (the default) and the
        parser fires the game-start event.

        Args:
            player_id: Player slot (0-9).

        Returns:
            A ``PlayerTimeSeries`` with one entry per game minute.
        """
        # Deduplicate by game minute, keeping the last snap per minute index.
        # _maybe_sample normally records one minute snapshot per hero entity per
        # boundary, but keying by minute guarantees at most one entry per minute
        # for this player even if several snapshots land in the same minute
        # (the latest one wins).
        seen: dict[int, PlayerStateSnapshot] = {}  # minute_index → snap
        if self._game_start_tick is not None:
            for snap in self._minute_snaps:
                if snap.player_id != player_id:
                    continue
                minute = (snap.tick - self._game_start_tick) // 1800
                seen[minute] = snap
        else:
            for i, snap in enumerate(s for s in self._minute_snaps if s.player_id == player_id):
                seen[i] = snap

        ts = PlayerTimeSeries(player_id=player_id)
        for snap in (seen[k] for k in sorted(seen)):
            ts.ticks.append(snap.tick)
            ts.gold_t.append(snap.gold)
            ts.total_earned_gold_t.append(snap.total_earned_gold)
            ts.total_earned_xp_t.append(snap.total_earned_xp)
            ts.net_worth_t.append(snap.net_worth)
            ts.lh_t.append(snap.lh)
            ts.dn_t.append(snap.dn)
            ts.xp_t.append(snap.xp)
            ts.hp_t.append(snap.hp)
            ts.mana_t.append(snap.mana)
            ts.x_t.append(snap.x)
            ts.y_t.append(snap.y)
        return ts

    def _on_entity(self, entity: Entity, op: EntityOp) -> None:
        cls = entity.get_class_name()

        if cls.startswith(_HERO_CLASS_PREFIX):
            idx = entity.get_index()
            ending = cls[len(_HERO_CLASS_PREFIX) :]
            # Register two name forms to cover inconsistent combat log names.
            # "combatLogName":  simple lowercase ("npc_dota_hero_templarassassin")
            # "combatLogName2": insert _ before each capital ("npc_dota_hero_templar_assassin")
            # Reference: refs/parser/src/main/java/opendota/Parse.java
            npc1 = "npc_dota_hero_" + ending.lower()
            npc2 = "npc_dota_hero" + re.sub(r"([A-Z])", r"_\1", ending.replace("_", "")).lower()
            if op.has(EntityOp.DELETED):
                self._heroes.pop(idx, None)
                self._heroes_by_npc.pop(npc1, None)
                self._heroes_by_npc.pop(npc2, None)
            else:
                self._heroes[idx] = entity
                self._heroes_by_npc[npc1] = entity
                self._heroes_by_npc[npc2] = entity
                self._maybe_sample()

        elif cls == "CDOTAPlayerController":
            pid = entity.get_int32("m_nPlayerID")
            if pid is None:
                pid = entity.get_int32("m_iPlayerID")
            if pid is not None and pid >= 0:
                pid //= 2
                if op.has(EntityOp.DELETED):
                    self._controllers.pop(pid, None)
                else:
                    self._controllers[pid] = entity

        elif cls in ("CDOTADataRadiant", "CDOTA_DataRadiant"):
            self._data_radiant = None if op.has(EntityOp.DELETED) else entity

        elif cls in ("CDOTADataDire", "CDOTA_DataDire"):
            self._data_dire = None if op.has(EntityOp.DELETED) else entity

        elif cls == "CDOTA_PlayerResource":
            if op.has(EntityOp.DELETED):
                self._player_resource = None
            else:
                self._player_resource = entity
                self._refresh_team_slots()

    def _refresh_team_slots(self) -> None:
        """Read m_iTeamSlot for each player from CDOTA_PlayerResource."""
        pr = self._player_resource
        if pr is None:
            return
        for i in range(10):
            slot = pr.get_int32(f"m_vecPlayerTeamData.{i:04d}.m_iTeamSlot")
            if slot is not None and slot >= 0:
                self._player_team_slot[i] = slot

    def _maybe_sample(self) -> None:
        if self._parser is None:
            return
        tick = self._parser.tick

        # Minute-boundary sampling (OpenDota-aligned)
        minute_fired = False
        if self._minute_snapshots and self._game_start_tick is not None:
            elapsed = tick - self._game_start_tick
            if elapsed >= 0:
                current_minute = elapsed // 1800
                if current_minute > self._last_minute:
                    self._last_minute = current_minute
                    self._sample(tick, minute=True)
                    minute_fired = True

        # Regular interval sampling — skip if minute boundary just fired at same tick
        if tick - self._last_sample >= self._sample_interval:
            self._last_sample = tick
            if not minute_fired:
                self._sample(tick, minute=False)

    def _sample(self, tick: int, minute: bool = False) -> None:
        entity_names = (
            self._parser.string_tables.get_by_name("EntityNames")
            if self._parser is not None and self._parser.string_tables is not None
            else None
        )
        for entity in self._heroes.values():
            snap = _snapshot_hero(entity, tick)
            if snap is None:
                continue
            # Resolve canonical NPC name from the EntityNames string table so that
            # heroes like QueenOfPain (class "CDOTA_Unit_Hero_QueenOfPain") map to
            # "npc_dota_hero_queenofpain" rather than "npc_dota_hero_queen_of_pain".
            # The camelCase→snake_case conversion in _snapshot_hero inserts word
            # boundaries at every capital letter, which is wrong for compound names.
            if entity_names is not None:
                name_idx = entity.get_int32("m_pEntity.m_nameStringableIndex")
                if name_idx is not None and name_idx >= 0:
                    item = entity_names.items.get(name_idx)
                    if item is not None:
                        snap.npc_name = item[0] if isinstance(item, tuple) else str(item)
            # Overlay spendable gold + net_worth from CDOTAPlayerController.
            # m_iGold = current cash on hand (goes up/down as player earns/spends).
            # m_iNetWorth = gold + item value (also on controller for convenience).
            ctrl = self._controllers.get(snap.player_id)
            if ctrl is not None:
                gold = ctrl.get_int32("m_iGold")
                nw = ctrl.get_int32("m_iNetWorth")
                if gold is not None:
                    snap.gold = gold
                if nw is not None:
                    snap.net_worth = nw
            # Overlay authoritative cumulative stats from CDOTA_DataRadiant/Dire.
            # These are the canonical sources for advantage curves — they differ
            # from the hero/controller fields in important ways:
            #
            #   m_iTotalEarnedGold — monotonically increasing gold earned across
            #     the whole game. Use this for radiant_gold_adv, NOT m_iGold
            #     (spendable cash) which resets when items are purchased.
            #
            #   m_iTotalEarnedXP — monotonically increasing XP earned across the
            #     whole game. Use this for radiant_xp_adv, NOT m_iCurrentXP from
            #     the hero entity which resets to 0 on each level-up.
            #
            # Reference: refs/parser/Parse.java — getEntityProperty(dataTeam,
            #   "m_vecDataTeam.%i.m_iTotalEarnedGold/XP", teamSlot)
            data_entity = self._data_radiant if snap.team == _TEAM_RADIANT else self._data_dire
            if data_entity is not None:
                # Prefer authoritative team slot; fall back to pid % 5
                team_slot = self._player_team_slot.get(snap.player_id, snap.player_id % 5)
                prefix = f"m_vecDataTeam.{team_slot:04d}"
                nw = data_entity.get_int32(f"{prefix}.m_iNetWorth")
                if nw is not None and nw > 0:
                    snap.net_worth = nw
                teg = data_entity.get_int32(f"{prefix}.m_iTotalEarnedGold")
                if teg is not None and teg > 0:
                    snap.total_earned_gold = teg
                    if snap.gold == 0:
                        snap.gold = teg
                tex = data_entity.get_int32(f"{prefix}.m_iTotalEarnedXP")
                if tex is not None and tex > 0:
                    snap.total_earned_xp = tex
                lh = data_entity.get_int32(f"{prefix}.m_iLastHitCount")
                if lh is not None and lh > 0:
                    snap.lh = lh
                dn = data_entity.get_int32(f"{prefix}.m_iDenyCount")
                if dn is not None and dn > 0:
                    snap.dn = dn
            snap.ability_levels = self._read_abilities(entity)
            if minute:
                self._minute_snaps.append(snap)
            else:
                self.snapshots.append(snap)
                self._diff_inventory(entity, snap.player_id, snap.npc_name, tick)

    def _read_abilities(self, hero: Entity) -> dict[str, int]:
        """Read current ability names and levels from a hero entity.

        Iterates ``m_hAbilities.0000``–``m_hAbilities.0031``, falling back to
        ``m_vecAbilities.*`` for older replays. Resolves each handle to an
        ability entity and reads ``m_iLevel`` and the name from the
        ``EntityNames`` string table.

        Args:
            hero: The hero entity to read from.

        Returns:
            Mapping of ability name → level for all abilities with level > 0.
        """
        if self._parser is None:
            return {}
        em = self._parser.entity_manager
        if em is None:
            return {}
        string_tables = self._parser.string_tables
        entity_names = string_tables.get_by_name("EntityNames") if string_tables is not None else None
        if entity_names is None:
            return {}

        result: dict[str, int] = {}
        for slot in range(_ABILITY_SLOTS):
            handle = hero.get_uint32(f"m_hAbilities.{slot:04d}")
            if handle is None:
                handle = hero.get_uint32(f"m_vecAbilities.{slot:04d}")
            if handle is None or handle == _NULL_HANDLE:
                continue
            ability_entity = em.find_by_handle(handle)
            if ability_entity is None:
                continue
            name_idx = ability_entity.get_int32("m_pEntity.m_nameStringableIndex")
            if name_idx is None or name_idx < 0:
                continue
            item = entity_names.items.get(name_idx)
            if item is None:
                continue
            name = item[0] if isinstance(item, tuple) else str(item)
            if not name:
                continue
            level = ability_entity.get_int32("m_iLevel") or 0
            if level > 0:
                result[name] = level
        return result

    def _read_inventory(self, hero: Entity) -> dict[int, str]:
        """Read current item names from a hero entity's item slots.

        Reads ``m_hItems.0000``–``m_hItems.{_ITEM_SLOTS-1:04d}``, resolving each
        handle via the entity manager and looking up the item name from the
        ``EntityNames`` string table.

        Args:
            hero: The hero entity to read from.

        Returns:
            Mapping of slot index → item name for all occupied slots.
        """
        if self._parser is None:
            return {}
        em = self._parser.entity_manager
        if em is None:
            return {}
        string_tables = self._parser.string_tables
        entity_names = string_tables.get_by_name("EntityNames") if string_tables is not None else None
        if entity_names is None:
            return {}

        result: dict[int, str] = {}
        for slot in range(_ITEM_SLOTS):
            handle = hero.get_uint32(f"m_hItems.{slot:04d}")
            if handle is None or handle == _NULL_HANDLE:
                continue
            item_entity = em.find_by_handle(handle)
            if item_entity is None:
                continue
            name_idx = item_entity.get_int32("m_pEntity.m_nameStringableIndex")
            if name_idx is None or name_idx < 0:
                continue
            # EntityNames items are stored as (key_str, value_bytes); key_str is the name
            item = entity_names.items.get(name_idx)
            if item is None:
                continue
            name = item[0] if isinstance(item, tuple) else str(item)
            if name:
                result[slot] = name
        return result

    def _diff_inventory(self, hero: Entity, player_id: int, npc_name: str, tick: int) -> None:
        """Emit synthetic PURCHASE entries for a player's starting inventory.

        Called on the first snapshot per player. Reads all occupied item slots
        and emits a ``PURCHASE`` ``CombatLogEntry`` for each, filling the gap
        before the combat log stream begins recording.

        Args:
            hero: The hero entity.
            player_id: Player slot (0-9).
            npc_name: Hero NPC name for the combat log entry.
            tick: Current game tick.
        """
        if self._parser is None or player_id in self._inventory_initialized:
            return
        current = self._read_inventory(hero)

        from gem.combatlog import CombatLogEntry

        # First snapshot for this player: emit all current items as starting
        # inventory. Subsequent purchases are covered by DOTA_COMBATLOG_PURCHASE
        # events.
        # Reference: refs/parser/Parse.java isPlayerStartingItemsWritten pattern
        self._inventory_initialized.add(player_id)
        self.first_snapshot_tick[player_id] = tick
        for item_name in current.values():
            if item_name and not item_name.startswith("item_recipe"):
                entry = CombatLogEntry(
                    tick=tick,
                    log_type="PURCHASE",
                    target_name=npc_name,
                    value_name=item_name,
                )
                self._parser.combat_log._emit(entry)
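minute_time_series() buckets each per-minute snapshot by (tick - game_start_tick) // 1800 and keeps the last snapshot in each bucket. The sketch below reproduces just that bucketing on bare ticks; `minute_buckets` is a hypothetical helper, and the 1800 constant assumes 30 ticks per second, as the __init__ docstring states.

```python
TICKS_PER_MINUTE = 1800  # 60 s x 30 ticks/s, the constant PlayerExtractor uses


def minute_buckets(ticks: list[int], game_start_tick: int) -> dict[int, int]:
    """Map game-minute index -> last tick seen in that minute.

    `ticks` must be in chronological order, like PlayerExtractor's snapshot lists.
    """
    seen: dict[int, int] = {}
    for tick in ticks:
        minute = (tick - game_start_tick) // TICKS_PER_MINUTE
        seen[minute] = tick  # a later sample in the same minute overwrites the earlier one
    return dict(sorted(seen.items()))


print(minute_buckets([5000, 6800, 6805, 8650], game_start_tick=5000))
# {0: 5000, 1: 6805, 2: 8650}
```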

__init__(sample_interval: int = 30, minute_snapshots: bool = True) -> None

Initialise the extractor.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sample_interval | int | Minimum tick gap between successive snapshots. Default 30 ticks = 1 second at 30 ticks/sec, producing a dense per-second series suitable for smooth time series and ML features. Pass a larger value (e.g. 150) for sparser sampling. The separate _min arrays always sample at exact 60-second game-time boundaries regardless of this setting. | 30 |
| minute_snapshots | bool | If True, also record a snapshot at each game-minute boundary (every 1800 ticks from game start), matching OpenDota's gold_t / lh_t / xp_t sampling. Requires the parser to fire on_game_start. | True |
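The two parameters drive two samplers that share one entity-update hook (_maybe_sample). The sketch below replays that decision logic over a list of hero-update ticks without a parser. It assumes minute_snapshots=True and that on_game_start has already fired, which in the class also resets the interval sampler to the game-start tick. `plan_samples` is a hypothetical helper, not part of gem.

```python
def plan_samples(
    update_ticks: list[int], game_start_tick: int, sample_interval: int = 30
) -> list[tuple[int, str]]:
    """Return (tick, kind) pairs, kind being "minute" or "interval".

    Mirrors PlayerExtractor._maybe_sample after _on_game_start has run.
    """
    last_sample = game_start_tick  # _on_game_start aligns the interval sampler
    last_minute = -1
    plan: list[tuple[int, str]] = []
    for tick in update_ticks:
        minute_fired = False
        elapsed = tick - game_start_tick
        # Minute sampler: first update at or after each 1800-tick boundary.
        if elapsed >= 0 and elapsed // 1800 > last_minute:
            last_minute = elapsed // 1800
            plan.append((tick, "minute"))
            minute_fired = True
        # Interval sampler: its clock advances even when the minute sample took this tick.
        if tick - last_sample >= sample_interval:
            last_sample = tick
            if not minute_fired:
                plan.append((tick, "interval"))
    return plan


print(plan_samples([1000, 1015, 1030, 1060, 2800, 2830], game_start_tick=1000))
# [(1000, 'minute'), (1030, 'interval'), (1060, 'interval'), (2800, 'minute'), (2830, 'interval')]
```

Because the interval clock also resets on a shared tick, a tick never produces both kinds of sample.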

attach(parser: ReplayParser) -> None

Register callbacks with the parser.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| parser | ReplayParser | The ReplayParser instance to attach to. | required |
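attach() only registers callbacks; nothing is read until the parser starts dispatching events. The sketch below uses a hypothetical StubParser that records registrations, and a TinyExtractor that copies attach()'s registration logic. It shows that the game-start hook is registered only when minute_snapshots is enabled. Neither class is part of gem.

```python
from __future__ import annotations

from typing import Any, Callable


class StubParser:
    """Hypothetical stand-in that records which hooks get registered."""

    def __init__(self) -> None:
        self.hooks: dict[str, list[Callable[..., Any]]] = {}

    def _register(self, name: str, fn: Callable[..., Any]) -> None:
        self.hooks.setdefault(name, []).append(fn)

    def on_entity(self, fn: Callable[..., Any]) -> None:
        self._register("entity", fn)

    def on_game_start(self, fn: Callable[..., Any]) -> None:
        self._register("game_start", fn)

    def on_game_end(self, fn: Callable[..., Any]) -> None:
        self._register("game_end", fn)


class TinyExtractor:
    """Copies the registration logic of PlayerExtractor.attach (handlers are no-ops)."""

    def __init__(self, minute_snapshots: bool = True) -> None:
        self._minute_snapshots = minute_snapshots
        self._parser: StubParser | None = None

    def attach(self, parser: StubParser) -> None:
        self._parser = parser
        parser.on_entity(self._on_entity)
        if self._minute_snapshots:
            parser.on_game_start(self._on_game_start)
        parser.on_game_end(self._on_game_end)

    def _on_entity(self, entity: Any, op: Any) -> None: ...

    def _on_game_start(self, tick: int) -> None: ...

    def _on_game_end(self, tick: int) -> None: ...


p = StubParser()
TinyExtractor(minute_snapshots=False).attach(p)
print(sorted(p.hooks))
# ['entity', 'game_end']
```

In the real class, skipping the game-start hook also means _game_start_tick stays None, so minute_time_series() returns empty series and the interval sampler is not aligned to game start.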
Source code in src/gem/extractors/players.py
def attach(self, parser: ReplayParser) -> None:
    """Register callbacks with the parser.

    Args:
        parser: The ``ReplayParser`` instance to attach to.
    """
    self._parser = parser
    parser.on_entity(self._on_entity)
    if self._minute_snapshots:
        parser.on_game_start(self._on_game_start)
    parser.on_game_end(self._on_game_end)
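
attach always subscribes to entity updates and to game end. It subscribes to game start only when minute_snapshots is enabled. The sketch below checks that conditional wiring against RecordingParser, a hypothetical stand-in that implements just the three on_* hooks attach calls. It is not the real ReplayParser, and MiniExtractor is a trimmed illustration rather than the gem class.

```python
from __future__ import annotations

from typing import Callable


class RecordingParser:
    """Hypothetical stand-in exposing only the hooks that attach() calls."""

    def __init__(self) -> None:
        self.registered: list[str] = []

    def on_entity(self, cb: Callable) -> None:
        self.registered.append("entity")

    def on_game_start(self, cb: Callable) -> None:
        self.registered.append("game_start")

    def on_game_end(self, cb: Callable) -> None:
        self.registered.append("game_end")


class MiniExtractor:
    """Trimmed illustration of the attach() wiring, not the real extractor."""

    def __init__(self, minute_snapshots: bool = True) -> None:
        self._minute_snapshots = minute_snapshots
        self._parser: RecordingParser | None = None

    def _on_entity(self, *args) -> None: ...
    def _on_game_start(self, *args) -> None: ...
    def _on_game_end(self, *args) -> None: ...

    def attach(self, parser: RecordingParser) -> None:
        self._parser = parser
        parser.on_entity(self._on_entity)
        if self._minute_snapshots:  # game-start hook only for minute sampling
            parser.on_game_start(self._on_game_start)
        parser.on_game_end(self._on_game_end)


with_minutes, without_minutes = RecordingParser(), RecordingParser()
MiniExtractor(minute_snapshots=True).attach(with_minutes)
MiniExtractor(minute_snapshots=False).attach(without_minutes)
print(with_minutes.registered)     # ['entity', 'game_start', 'game_end']
print(without_minutes.registered)  # ['entity', 'game_end']
```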

hero_pos(npc_name: str) -> tuple[float, float] | None

Return the current world position of a hero by NPC name.

Parameters:

Name      Type  Description                               Default
npc_name  str   NPC hero name, e.g. "npc_dota_hero_axe".  required

Returns:

Type                        Description
tuple[float, float] | None  (x, y) world coordinates, or None if the hero is not tracked.

Source code in src/gem/extractors/players.py
def hero_pos(self, npc_name: str) -> tuple[float, float] | None:
    """Return the current world position of a hero by NPC name.

    Args:
        npc_name: NPC hero name, e.g. ``"npc_dota_hero_axe"``.

    Returns:
        ``(x, y)`` world coordinates, or ``None`` if the hero is not tracked.
    """
    entity = self._heroes_by_npc.get(npc_name.lower())
    return _pos(entity) if entity is not None else None
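
Because hero_pos lowercases its argument before the lookup, the lookup is case-insensitive, provided _heroes_by_npc is keyed by lowercased names, as the .lower() call implies. A minimal sketch under that assumption follows. Entities are modelled as plain dicts, and _pos is a hypothetical helper that reads "x"/"y" keys. The real entity type and how the real _pos derives world coordinates are not shown in this section. The coordinates are illustrative values, not taken from a replay.

```python
from __future__ import annotations

from typing import Any


def _pos(entity: dict[str, Any]) -> tuple[float, float] | None:
    """Hypothetical position reader; the real helper works on gem Entity objects."""
    x, y = entity.get("x"), entity.get("y")
    return (x, y) if x is not None and y is not None else None


class HeroIndex:
    def __init__(self) -> None:
        # lowercased npc_name -> entity reference, mirroring _heroes_by_npc
        self._heroes_by_npc: dict[str, dict[str, Any]] = {}

    def track(self, npc_name: str, entity: dict[str, Any]) -> None:
        # Normalise on insert so lookups can simply lowercase the query.
        self._heroes_by_npc[npc_name.lower()] = entity

    def hero_pos(self, npc_name: str) -> tuple[float, float] | None:
        entity = self._heroes_by_npc.get(npc_name.lower())
        return _pos(entity) if entity is not None else None


index = HeroIndex()
axe = {"x": -6700.0, "y": -6200.0}
index.track("npc_dota_hero_axe", axe)
print(index.hero_pos("NPC_DOTA_HERO_AXE"))   # (-6700.0, -6200.0)
axe["x"] = -6500.0  # entity mutated in place; the index sees the new value
print(index.hero_pos("npc_dota_hero_axe"))   # (-6500.0, -6200.0)
print(index.hero_pos("npc_dota_hero_lina"))  # None
```

Storing references rather than copies is what lets the lookup return the current position, matching the "entity is updated in place" comment in __init__.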

time_series(player_id: int) -> PlayerTimeSeries

Aggregate snapshots for one player into time-series lists.

Parameters:

Name       Type  Description         Default
player_id  int   Player slot (0-9).  required

Returns:

Type              Description
PlayerTimeSeries  A PlayerTimeSeries with parallel lists indexed by sample number.

Source code in src/gem/extractors/players.py
def time_series(self, player_id: int) -> PlayerTimeSeries:
    """Aggregate snapshots for one player into time-series lists.

    Args:
        player_id: Player slot (0-9).

    Returns:
        A ``PlayerTimeSeries`` with parallel lists indexed by sample number.
    """
    ts = PlayerTimeSeries(player_id=player_id)
    for snap in self.snapshots:
        if snap.player_id != player_id:
            continue
        ts.ticks.append(snap.tick)
        ts.gold_t.append(snap.gold)
        ts.total_earned_gold_t.append(snap.total_earned_gold)
        ts.total_earned_xp_t.append(snap.total_earned_xp)
        ts.net_worth_t.append(snap.net_worth)
        ts.lh_t.append(snap.lh)
        ts.dn_t.append(snap.dn)
        ts.xp_t.append(snap.xp)
        ts.hp_t.append(snap.hp)
        ts.mana_t.append(snap.mana)
        ts.x_t.append(snap.x)
        ts.y_t.append(snap.y)
    return ts
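
Every field is appended in the same loop iteration, so index i of each list refers to the same snapshot. The lists can therefore be zipped back into rows or differenced. The following reduced sketch shows that behaviour using Snap and Series, hypothetical trimmed stand-ins for PlayerStateSnapshot and PlayerTimeSeries that keep only a few fields. The values are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Snap:
    """Trimmed stand-in for PlayerStateSnapshot."""
    tick: int
    player_id: int
    lh: int


@dataclass
class Series:
    """Trimmed stand-in for PlayerTimeSeries."""
    player_id: int
    ticks: list[int] = field(default_factory=list)
    lh_t: list[int] = field(default_factory=list)


def time_series(snapshots: list[Snap], player_id: int) -> Series:
    ts = Series(player_id=player_id)
    for snap in snapshots:
        if snap.player_id != player_id:
            continue
        # Appending both fields in one iteration keeps the lists index-aligned.
        ts.ticks.append(snap.tick)
        ts.lh_t.append(snap.lh)
    return ts


snaps = [Snap(0, 0, 0), Snap(0, 5, 0), Snap(900, 0, 4), Snap(900, 5, 2), Snap(1800, 0, 9)]
ts = time_series(snaps, player_id=0)
print(list(zip(ts.ticks, ts.lh_t)))  # [(0, 0), (900, 4), (1800, 9)]

# Last hits gained between consecutive samples.
deltas = [b - a for a, b in zip(ts.lh_t, ts.lh_t[1:])]
print(deltas)  # [4, 5]
```

Each call scans the full snapshots list. When all ten players are needed, grouping the snapshots by player_id once would avoid ten separate passes.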

minute_time_series(player_id: int) -> PlayerTimeSeries

Aggregate per-minute snapshots for one player into time-series lists.

Returns a PlayerTimeSeries sampled at each game-minute boundary (every 1800 ticks from game start), matching OpenDota's gold_t, lh_t, and xp_t arrays exactly.

Only populated when minute_snapshots=True (the default) and the parser fires the game-start event.

Parameters:

Name       Type  Description         Default
player_id  int   Player slot (0-9).  required

Returns:

Type              Description
PlayerTimeSeries  A PlayerTimeSeries with one entry per game minute.

Source code in src/gem/extractors/players.py
def minute_time_series(self, player_id: int) -> PlayerTimeSeries:
    """Aggregate per-minute snapshots for one player into time-series lists.

    Returns a ``PlayerTimeSeries`` sampled at each game-minute boundary
    (every 1800 ticks from game start), matching OpenDota's ``gold_t``,
    ``lh_t``, and ``xp_t`` arrays exactly.

    Only populated when ``minute_snapshots=True`` (the default) and the
    parser fires the game-start event.

    Args:
        player_id: Player slot (0-9).

    Returns:
        A ``PlayerTimeSeries`` with one entry per game minute.
    """
    # Deduplicate by game minute — keep the last snap per minute index.
    # Duplicates arise when on_game_end fires within the same minute as a
    # regular boundary sample, or when entity callbacks fire multiple times
    # at the same tick. Using a dict keyed by minute ensures one entry per
    # minute, with the latest (most accurate) value winning.
    seen: dict[int, PlayerStateSnapshot] = {}  # minute_index → snap
    if self._game_start_tick is not None:
        for snap in self._minute_snaps:
            if snap.player_id != player_id:
                continue
            minute = (snap.tick - self._game_start_tick) // 1800
            seen[minute] = snap
    else:
        for i, snap in enumerate(s for s in self._minute_snaps if s.player_id == player_id):
            seen[i] = snap

    ts = PlayerTimeSeries(player_id=player_id)
    for snap in (seen[k] for k in sorted(seen)):
        ts.ticks.append(snap.tick)
        ts.gold_t.append(snap.gold)
        ts.total_earned_gold_t.append(snap.total_earned_gold)
        ts.total_earned_xp_t.append(snap.total_earned_xp)
        ts.net_worth_t.append(snap.net_worth)
        ts.lh_t.append(snap.lh)
        ts.dn_t.append(snap.dn)
        ts.xp_t.append(snap.xp)
        ts.hp_t.append(snap.hp)
        ts.mana_t.append(snap.mana)
        ts.x_t.append(snap.x)
        ts.y_t.append(snap.y)
    return ts
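
The deduplication rule keeps the last snapshot for each minute index, (tick - game_start_tick) // 1800. When no start tick is known, it falls back to one slot per snapshot in arrival order. The standalone sketch below uses a hypothetical trimmed Snap type and made-up ticks. It shows two cases that collapse to the later value: a duplicate callback at one tick, and a game-end sample inside an already-sampled minute.

```python
from __future__ import annotations

from dataclasses import dataclass

TICKS_PER_MINUTE = 1800  # minute length used by minute_time_series


@dataclass
class Snap:
    """Trimmed stand-in for PlayerStateSnapshot."""
    tick: int
    player_id: int
    gold: int


def dedupe_by_minute(snaps: list[Snap], player_id: int, game_start_tick: int | None) -> list[Snap]:
    seen: dict[int, Snap] = {}
    player_snaps = [s for s in snaps if s.player_id == player_id]
    if game_start_tick is not None:
        for snap in player_snaps:
            # A later snap in the same minute overwrites the earlier one.
            seen[(snap.tick - game_start_tick) // TICKS_PER_MINUTE] = snap
    else:
        # No start tick: keep every snapshot, ordered by arrival.
        for i, snap in enumerate(player_snaps):
            seen[i] = snap
    return [seen[k] for k in sorted(seen)]


start = 1000
snaps = [
    Snap(1000, 0, 600),   # minute 0 boundary
    Snap(2800, 0, 850),   # minute 1 boundary
    Snap(2800, 0, 860),   # second entity callback at the same tick
    Snap(2800, 5, 700),   # another player, filtered out
    Snap(4600, 0, 1400),  # minute 2 boundary
    Snap(5200, 0, 1550),  # game-end sample, still in minute 2
]
result = dedupe_by_minute(snaps, player_id=0, game_start_tick=start)
print([(s.tick, s.gold) for s in result])  # [(1000, 600), (2800, 860), (5200, 1550)]
print(len(dedupe_by_minute(snaps, player_id=0, game_start_tick=None)))  # 5
```

The final entry carries the game-end tick rather than the minute-2 boundary tick. Consumers should therefore read times from ticks instead of inferring them from list positions.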