Replay Edge Cases

Real replay parsing is not just "read bytes, decode protobufs, done". Source 2 replays contain duplicated entities, build-specific field changes, truncated packets, and signals that look authoritative but are only heuristics.

This page collects the cases that most often confuse parser users and library contributors.

Why this page exists

If a parsed output looks strange, first work out which of these questions applies:

Is the replay data itself ambiguous?
Did we choose the wrong source of truth?
Are we inferring something that the replay never directly stores?

Those are different problems, and they need different fixes.

IMPORTANT

Many "parser bugs" are really "picked the wrong authoritative field or entity" bugs.

Canonical hero entity vs duplicate hero entities

The clean mental model is:

one player
one hero
one movement trail

Real replay state is messier than that.

You can have multiple live hero-class entities that all look like the same hero:

the real hero entity
illusion-like entities
clone/replicate-style entities
temporary duplicated hero representations that still expose hero fields

If you sample every CDOTA_Unit_Hero_* entity and then group by player_id, you can merge multiple physical entities into one trail.

Common symptom:

the same player has two coordinates at the same tick
rendered movement suddenly zig-zags between two distant points
farming or movement maps look impossible

The correct source of truth

For movement and live hero state, the canonical choice is the player's selected hero handle:

CDOTAPlayerController.m_hAssignedHero
fallback: CDOTA_PlayerResource.m_vecPlayerTeamData.%04d.m_hSelectedHero

That handle resolves to one live entity. Sampling that entity avoids most illusion/clone contamination.

Simplified logic:

python

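# Prefer the controller's assigned hero; fall back to the player-resource handle.
# Assumes an unset handle is falsy (None or 0) rather than a sentinel value.
handle = controller.m_hAssignedHero or player_resource.m_hSelectedHero
hero_entity = entity_manager.find_by_handle(handle)
if hero_entity is not None:  # the handle may not resolve to a live entity
    snapshot(hero_entity)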

Why matching by `player_id` is not enough

player_id tells you ownership, not uniqueness.

Two hero-like entities can share the same player slot. That is enough to corrupt:

position_log
movement heatmaps
farming route visualizations
any nearest-position lookup done from the wrong hero entity

TIP

If a movement path looks wrong, inspect for duplicate same-tick positions before changing the renderer.
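The duplicate check in the tip above can be sketched in a few lines. This is only an illustration: the sample layout (dicts with `player_id`, `tick`, `x`, `y`) and the function name `find_duplicate_samples` are assumptions made for this example, not the actual shape of `position_log`.

```python
from collections import defaultdict


def find_duplicate_samples(samples):
    """Return {(player_id, tick): [positions]} for every key that has
    more than one distinct position, which suggests that several
    hero-like entities were merged into one trail."""
    positions_by_key = defaultdict(set)
    for sample in samples:
        key = (sample["player_id"], sample["tick"])
        positions_by_key[key].add((sample["x"], sample["y"]))
    return {
        key: sorted(positions)
        for key, positions in positions_by_key.items()
        if len(positions) > 1
    }


# Hypothetical samples: player 3 appears at two distant points on tick 1200.
samples = [
    {"player_id": 3, "tick": 1200, "x": 100.0, "y": 200.0},
    {"player_id": 3, "tick": 1200, "x": 5400.0, "y": -3100.0},
    {"player_id": 3, "tick": 1230, "x": 110.0, "y": 205.0},
]
duplicates = find_duplicate_samples(samples)
```

If `duplicates` is non-empty, the problem is most likely in entity selection rather than in the renderer.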

Sampling inside a tick

Even when you pick the right entity, timing still matters.

Replay packets do not arrive as a perfect "end of tick snapshot". Entity updates are processed incrementally. If sampling is triggered from arbitrary entity callbacks, you can observe:

partially updated state within the current tick
one entity already updated while another is still stale
small jitter or timing skew in sampled series

This usually matters less than the canonical-entity problem, but it still matters for:

movement trails
fast state transitions
exact per-tick reconstruction
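One way to reduce mid-tick skew is to buffer updates and only emit samples once the tick changes. The sketch below assumes a hypothetical callback, `on_entity_update(tick, entity_index, x, y)`, that the parser calls for each incremental update. It is not gem's actual API, and it only records entities that changed during a tick.

```python
class EndOfTickSampler:
    """Keep the last position seen per entity within a tick and emit
    samples only when the tick advances (last write wins)."""

    def __init__(self):
        self.current_tick = None
        self.pending = {}  # entity_index -> (x, y) seen so far in this tick
        self.samples = []  # emitted (tick, entity_index, x, y) tuples

    def on_entity_update(self, tick, entity_index, x, y):
        # A new tick means the previous tick's state is complete.
        if self.current_tick is not None and tick != self.current_tick:
            self.flush()
        self.current_tick = tick
        self.pending[entity_index] = (x, y)

    def flush(self):
        """Emit one sample per buffered entity, then reset the buffer."""
        for entity_index, (x, y) in sorted(self.pending.items()):
            self.samples.append((self.current_tick, entity_index, x, y))
        self.pending.clear()


sampler = EndOfTickSampler()
sampler.on_entity_update(10, 7, 0.0, 0.0)  # partial update inside tick 10
sampler.on_entity_update(10, 7, 1.0, 0.0)  # final state for tick 10
sampler.on_entity_update(11, 7, 2.0, 0.0)
sampler.flush()  # emit the last tick explicitly at end of replay
```

The trade-off is one tick of latency in exchange for samples that reflect a settled state.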

Incomplete or truncated replays

Not every replay ends cleanly.

Common cases:

final compressed block is truncated
CDemoFileInfo metadata is missing or incomplete
some late-game state never arrives

gem intentionally falls back to live game-rule entities when possible, for example:

match id from CDOTAGamerulesProxy
winner from m_nGameWinner
late-game scoreboard values from authoritative player-resource fields

This is why "missing metadata" does not always mean "parse failed".
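The fallback idea can be expressed as an ordered list of candidate sources. The following is a minimal sketch under stated assumptions: `file_info` and `game_rules` are plain dicts standing in for decoded `CDemoFileInfo` metadata and live game-rule fields, and the keys `match_id` and `winner` are hypothetical. Only `m_nGameWinner` is a name taken from this page.

```python
def first_available(*candidates):
    """Return (value, source_name) for the first candidate whose value
    is not None, or (None, None) if every source is empty."""
    for source_name, value in candidates:
        if value is not None:
            return value, source_name
    return None, None


def resolve_match_summary(file_info, game_rules):
    """Prefer file-level metadata, then fall back to live game-rule state."""
    file_info = file_info or {}  # metadata may be missing entirely
    return {
        "match_id": first_available(
            ("file_info", file_info.get("match_id")),
            ("game_rules", game_rules.get("match_id")),
        ),
        "winner": first_available(
            ("file_info", file_info.get("winner")),
            ("game_rules", game_rules.get("m_nGameWinner")),
        ),
    }


# Truncated replay with made-up values: no file info, live state still usable.
truncated = resolve_match_summary(None, {"match_id": 7000000001, "m_nGameWinner": 2})
```

Returning the source name alongside the value makes it easy to see later which outputs came from a fallback.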

Build-specific field and schema differences

The replay format is not stable across Dota builds.

Typical examples:

m_nPlayerID vs m_iPlayerID
renamed or moved sendtable fields
draft/facet field layout changes
different serializer behavior across builds

That is why gem patches sendtable decoding and often reads fallback fields instead of assuming one permanent schema.

If a field suddenly becomes None on a newer replay, first check whether that build renamed or moved the field before suspecting the decoder.
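Reading from a list of candidate names is one simple way to tolerate renames. The sketch below assumes entity state is available as a plain dict of field name to value, and `read_first_field` is a hypothetical helper written for this example, not part of gem.

```python
# Candidate names for the same concept across builds, tried in order.
PLAYER_ID_FIELDS = ("m_nPlayerID", "m_iPlayerID")


def read_first_field(entity_fields, candidates):
    """Return (field_name, value) for the first candidate that is present
    and not None, or (None, None) if no candidate matches."""
    for name in candidates:
        value = entity_fields.get(name)
        if value is not None:
            return name, value
    return None, None


# Hypothetical entities from two different builds.
entity_a = {"m_nPlayerID": 4}
entity_b = {"m_iPlayerID": 4}

result_a = read_first_field(entity_a, PLAYER_ID_FIELDS)
result_b = read_first_field(entity_b, PLAYER_ID_FIELDS)
result_missing = read_first_field({}, PLAYER_ID_FIELDS)
```

Returning the matched name as well as the value helps when diagnosing which schema variant a given replay uses.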

Inference limits are not parser bugs

Some outputs are inferred because the replay does not directly store the desired concept.

Examples:

estimated ward vision impact
farming context labels like safe_home_farm or high_risk_invade
map control proxies
teamfight clustering windows

These are analytics heuristics, not raw replay facts.

So the right question is not:

"Did the parser read this field wrong?"

It is:

"Is this inference calibrated well enough for the use case?"

Practical debugging checklist

When something looks wrong, check in this order:

Are there duplicate samples for the same player and tick?
Are we sampling the canonical entity handle, or just matching by class/player id?
Is the replay truncated or missing metadata?
Did the build move or rename the field?
Is this output inferred rather than directly stored?
