Bits & Bytes Primer (Dota Replay Context)
This page is a practical prerequisite for parser deep dives.
If you are comfortable with Python but less familiar with binary formats, read this first. It explains exactly how replay bytes are laid out before stream.py and parser.py decode them.
Why this matters
A .dem replay is not JSON and not plain protobuf from byte 0.
It is a binary container:
- file header bytes,
- repeated framed records,
- protobuf payloads inside those records,
- and sometimes bit-packed payloads inside protobuf fields.
Understanding these layers makes stream.py and parser.py much easier to follow.
Bits vs bytes (quick refresh)
- 1 byte = 8 bits.
- A byte is usually shown in hex (0x00 to 0xFF).
- Python bytes is an immutable sequence of byte values.
Example:
b = b"\x50\x42\x44\x45"
print(len(b)) # 4 bytes
print(list(b)) # [80, 66, 68, 69]
print(b.hex(" ")) # 50 42 44 45
The first bytes in a Source 2 Dota replay
stream.py expects this at the start:
- 8-byte magic: PBDEMS2\x00
- 8 metadata bytes (skipped by parser)
Exact byte layout:
offset 0..7 : magic (8 bytes)
offset 8..11 : metadata uint32 #1 (little-endian)
offset 12..15 : metadata uint32 #2 (little-endian)
offset 16.. : first outer message record
Header hex shape:
50 42 44 45 4d 53 32 00 00 00 00 00 00 00 00 00
|------- magic -------| |------ metadata -----|
What these mean:
- PBDEMS2\x00 is the file signature for Source 2 protobuf demo format.
- The next 8 bytes are engine metadata hints (two little-endian uint32 values).
- Parser logic does not need these hints for correctness because each outer message is self-framed (command, tick, size, payload).
Why parser skips metadata:
- They are not required to decode message boundaries.
- They can be stale in truncated files.
Example from local fixtures:
- ti14_finals_g3_xg_vs_falcons.dem metadata: 278882831, 278882714
- ti14_finals_g3_xg_vs_falcons_truncated.dem has the same metadata values but a much smaller actual file size.
If magic mismatches, DemoStream raises immediately before parsing anything else.
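The header check described above can be sketched in a few lines. This is a minimal illustration, not the code in stream.py: the names `MAGIC` and `read_header` are hypothetical, and the input is a synthetic in-memory header rather than a real replay.

```python
import io
import struct

MAGIC = b"PBDEMS2\x00"  # 8-byte Source 2 demo signature


def read_header(fp):
    """Validate the magic and return the two metadata uint32 values.

    Hypothetical helper for illustration; leaves fp positioned at offset 16.
    """
    head = fp.read(16)
    if len(head) < 16:
        raise ValueError("file too short for a 16-byte header")
    if head[:8] != MAGIC:
        raise ValueError(f"bad magic: {head[:8]!r}")
    # "<II" = two little-endian unsigned 32-bit integers (offsets 8..15)
    meta1, meta2 = struct.unpack_from("<II", head, 8)
    return meta1, meta2


# Synthetic header: the magic, then the two fixture values as little-endian uint32.
fake = io.BytesIO(MAGIC + struct.pack("<II", 278882831, 278882714) + b"\x07\x2a\x03")
print(read_header(fake))  # (278882831, 278882714)
print(fake.tell())        # 16, the offset of the first outer message record
```

A parser that skips the metadata can do the same magic check and simply discard the returned pair.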
Outer message framing (what stream.py reads)
After header+metadata, replay data is a repeated sequence:
command(varuint32) + tick(varuint32) + size(varuint32) + payload[size]
Where:
- command includes outer type plus compression flag bit.
- tick is game tick for this outer message.
- size is payload byte length.
- payload is usually protobuf bytes (CDemo* envelopes).
Varuint32 in one minute
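A varuint stores an unsigned integer in one or more bytes, least-significant 7-bit group first:
- the lower 7 bits of each byte carry value bits,
- the high bit (0x80) says “continue to next byte”.
Small values (0..127) fit in 1 byte; a full uint32 needs at most 5 bytes.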
Examples:
| Value | Encoded bytes (hex) |
|---|---|
| 7 | 07 |
| 64 | 40 |
| 127 | 7f |
| 128 | 80 01 |
| 300 | ac 02 |
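The encoding in the table can be reproduced with a short encoder and decoder. This is a sketch of the general varuint scheme, not a copy of stream.py; the function names `encode_varuint32` and `decode_varuint32` are hypothetical.

```python
def encode_varuint32(value):
    """Encode an unsigned 32-bit integer as 1 to 5 varuint bytes."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value out of uint32 range")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # next 7 value bits, least significant first
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def decode_varuint32(data, pos=0):
    """Decode one varuint32 from data starting at pos; return (value, next_pos).

    Raises IndexError if data ends in the middle of the integer.
    """
    result = 0
    for shift in range(0, 35, 7):    # at most 5 bytes for 32 bits
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result & 0xFFFFFFFF, pos
    raise ValueError("varuint32 longer than 5 bytes")


for v in (7, 64, 127, 128, 300):
    print(v, encode_varuint32(v).hex(" "))

print(decode_varuint32(b"\xac\x02"))  # (300, 2)
```

Returning the next position alongside the value makes it easy to decode several varuints back to back, which is what the outer framing requires.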
A tiny synthetic replay fragment
Suppose we write one outer message:
- command = 7 (DEM_Packet, uncompressed)
- tick = 42
- size = 3
- payload = aa bb cc
Bytes after header would be:
07 2a 03 aa bb cc
|  |  |  \-- payload (3 bytes)
|  |  \----- size=3
|  \-------- tick=42
\----------- command=7
Full file prefix would look like:
50 42 44 45 4d 53 32 00 00 00 00 00 00 00 00 00 07 2a 03 aa bb cc
Compression flag in command
In Source 2 demo commands, 0x40 means “payload is Snappy-compressed”.
Example:
- base command 7 (DEM_Packet) -> 0x07
- compressed command -> 0x47 (0x40 | 0x07)
stream.py does:
compressed = bool(command & 0x40)
msg_type = command & ~0x40
So downstream sees msg_type == 7 whether compressed or not.
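Putting the framing and the flag handling together, a reader loop might look like the sketch below. It is an illustration under stated assumptions, not the DemoStream implementation: `iter_outer_messages` and `_read_varuint32` are hypothetical names, the tuple carries an extra `compressed` field instead of decompressing, and no Snappy library is called, so compressed payloads are returned as raw bytes.

```python
import io

HEADER_SIZE = 16        # 8-byte magic + 8 metadata bytes
COMPRESSED_FLAG = 0x40  # flag bit carried in the command value


def _read_varuint32(fp):
    """Read one varuint32 from a binary stream; return None at a clean EOF."""
    result = 0
    for shift in range(0, 35, 7):
        raw = fp.read(1)
        if not raw:
            if shift == 0:
                return None          # EOF exactly on a record boundary
            raise EOFError("truncated varuint32")
        result |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:
            return result
    raise ValueError("varuint32 longer than 5 bytes")


def iter_outer_messages(fp):
    """Yield (tick, msg_type, compressed, payload) for each framed record."""
    fp.seek(HEADER_SIZE)             # skip magic and metadata without validating
    while True:
        command = _read_varuint32(fp)
        if command is None:
            return
        tick = _read_varuint32(fp)
        size = _read_varuint32(fp)
        if tick is None or size is None:
            raise EOFError("truncated record header")
        payload = fp.read(size)
        if len(payload) != size:
            raise EOFError("truncated payload")
        compressed = bool(command & COMPRESSED_FLAG)
        msg_type = command & ~COMPRESSED_FLAG
        yield tick, msg_type, compressed, payload


# Synthetic input: the 16-byte header, the fragment from above, and a second
# record whose command has the flag set (its payload is NOT real Snappy data).
prefix = bytes.fromhex("50 42 44 45 4d 53 32 00 00 00 00 00 00 00 00 00")
records = bytes.fromhex("07 2a 03 aa bb cc 47 2b 01 ff")
for message in iter_outer_messages(io.BytesIO(prefix + records)):
    print(message)
# (42, 7, False, b'\xaa\xbb\xcc')
# (43, 7, True, b'\xff')
```

Both records surface as `msg_type == 7`; only the `compressed` field differs, which is where a real reader would decompress before handing the payload on.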
How this connects to Dota replay parsing
At outer layer:
- stream.py yields (tick, msg_type, payload) from framed bytes.
- parser.py maps msg_type to outer protobuf envelope (CDemoPacket, CDemoFullPacket, etc.).
- If envelope is packet-like, parser.py unpacks inner messages from CDemoPacket.data.
Inner packet framing is another repeated structure:
type_id(ubit_var) + size(varuint32) + payload
That is why replay parsing is “layers of framing”, not a single protobuf decode call.
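The inner framing is bit-level rather than byte-level, so the fields after type_id are generally not byte-aligned. The sketch below shows one way to read it. Everything here is an assumption for illustration: the names `BitReader`, `read_ubit_var`, `read_varuint32` and `read_inner_message` are hypothetical, bits are assumed to be consumed least-significant first, and the ubit_var layout (6 bits, two of which select an optional 4-, 8- or 28-bit extension) is the one commonly described in open-source Source 2 parsers. parser.py may implement this differently.

```python
class BitReader:
    """Reads bits least-significant-first from a bytes object."""

    def __init__(self, data):
        self._value = int.from_bytes(data, "little")
        self._nbits = len(data) * 8
        self._pos = 0

    def read_bits(self, n):
        if self._pos + n > self._nbits:
            raise EOFError("not enough bits left")
        out = (self._value >> self._pos) & ((1 << n) - 1)
        self._pos += n
        return out

    def read_bytes(self, n):
        # Works at any bit offset, not only on byte boundaries.
        return bytes(self.read_bits(8) for _ in range(n))


def read_ubit_var(reader):
    """Assumed layout: 6 bits; bits 4-5 select a 4-, 8- or 28-bit extension."""
    ret = reader.read_bits(6)
    selector = ret & 0x30
    if selector == 0x10:
        ret = (ret & 0x0F) | (reader.read_bits(4) << 4)
    elif selector == 0x20:
        ret = (ret & 0x0F) | (reader.read_bits(8) << 4)
    elif selector == 0x30:
        ret = (ret & 0x0F) | (reader.read_bits(28) << 4)
    return ret


def read_varuint32(reader):
    """Same varuint scheme as the outer layer, read 8 bits at a time."""
    result = 0
    for shift in range(0, 35, 7):
        byte = reader.read_bits(8)
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
    raise ValueError("varuint32 longer than 5 bytes")


def read_inner_message(reader):
    """Read one type_id + size + payload record; return (type_id, payload)."""
    type_id = read_ubit_var(reader)
    size = read_varuint32(reader)
    return type_id, reader.read_bytes(size)


# Synthetic record: type_id=4 (6 bits), size=2 (8 bits), payload aa bb (16 bits).
stream = (4 | (2 << 6) | (0xAA << 14) | (0xBB << 22)).to_bytes(4, "little")
print(stream.hex(" "))                        # 84 80 ea 2e
print(read_inner_message(BitReader(stream)))  # (4, b'\xaa\xbb')
```

Note that the payload bytes aa and bb never appear in the hex dump of the stream, because the 6-bit type_id shifts everything after it off the byte grid. That is a good reason to decode with a bit reader instead of eyeballing hex at this layer.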
Common beginner pitfalls
- Treating the whole replay as one protobuf message.
- Forgetting command also carries compression flag.
- Mixing decimal and hex while debugging bytes.
- Assuming every payload field is protobuf (some are bit-packed blobs).
- Ignoring ordering: some inner messages must be processed before others.