Context Management
On every text turn, the runtime retrieves long-term memory, relies on the SDK session for short-term conversation continuity, and assembles the context that the OpenAI Agents SDK runner sees. Voice instead uses a compact memory bootstrap at Realtime session start plus transcript recording at turn end; see Voice Persistence.
Per-turn retrieval
The runtime turn memory context runs before ordinary therapeutic responses. Crisis, memory-control, and grounded lookup paths can skip or ignore the retrieval result. When it runs, it populates:
| Output | What it contains | Retrieval method |
|---|---|---|
| working_memory | WorkingMemoryEntry dicts (semantic + episodic) | Hybrid RRF (embedding + token-recall) |
| session_memory.summary | Per-turn human-readable summary of what was retrieved | Written by turn memory context |
| procedural_profile.procedural_rules | Style directives from ProceduralProfile | Full profile load (not query-based) |
| procedural_profile.proactive_recall_enabled | Recall toggle | Read from procedural profile |
| diagnostics | Hit counts, store sizes, retrieval path | Written via _merge_dicts reducer |
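
As a rough illustration of the "Hybrid RRF" retrieval method named in the table, the sketch below fuses an embedding ranking and a token-recall ranking with reciprocal rank fusion. The function name, the memory ids, and the constant `k=60` (a common default for RRF) are assumptions for illustration; the runtime's actual fusion code may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each id scores 1 / (k + rank) in every list it appears in,
    and the per-list scores are summed.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical rankings from the two retrievers.
embedding_ranked = ["m2", "m1", "m3"]
token_ranked = ["m1", "m4", "m2"]

fused = reciprocal_rank_fusion([embedding_ranked, token_ranked])
```

Entries that both retrievers rank well (here `m1` and `m2`) rise above entries found by only one of them, which is the usual reason to fuse by rank rather than by raw score.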
On the first turn of a new session (transcript length = 1), the most recent episodic arc is automatically injected as a catch-up entry, which gives the "last time we talked..." experience, regardless of query match.
The retrieval pipeline runs semantic and episodic searches in
parallel (asyncio.gather), then loads the procedural profile.
Episodic results are deduped by (summary, primary_themes) so the
same arc never appears twice on a single turn.
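
The ordering described above (parallel semantic and episodic searches, then the procedural profile load, then episodic dedupe) can be sketched as follows. The three `async` functions are hypothetical stand-ins that return canned data; only `asyncio.gather` and the `(summary, primary_themes)` dedupe key come from the text.

```python
import asyncio


async def search_semantic(query):
    # Hypothetical stand-in for the semantic store query.
    return [{"fact": "prefers short replies"}]


async def search_episodic(query):
    # Hypothetical stand-in; deliberately returns the same arc twice.
    arc = {"summary": "Talked about work stress", "primary_themes": ["work", "stress"]}
    return [arc, dict(arc), {"summary": "Sleep routine", "primary_themes": ["sleep"]}]


async def load_procedural_profile():
    # Hypothetical stand-in for the full (non query-based) profile load.
    return {"procedural_rules": ["Keep replies brief"]}


def dedupe_episodic(arcs):
    """Keep the first arc seen for each (summary, primary_themes) key."""
    seen = set()
    unique = []
    for arc in arcs:
        key = (arc["summary"], tuple(arc["primary_themes"]))
        if key not in seen:
            seen.add(key)
            unique.append(arc)
    return unique


async def retrieve(query):
    # The two searches run concurrently; the profile load follows them.
    semantic, episodic = await asyncio.gather(
        search_semantic(query), search_episodic(query)
    )
    profile = await load_procedural_profile()
    return semantic, dedupe_episodic(episodic), profile


semantic, episodic, profile = asyncio.run(retrieve("work"))
```

Converting `primary_themes` to a tuple makes the key hashable, so the dedupe is a single pass that preserves retrieval order.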
What the response node sees
| Field | Source | Per turn? |
|---|---|---|
| message | User input | Yes |
| history / transcript | Checkpointer + operator.add reducer | Accumulated across turns |
| working_memory | Turn memory context | Re-retrieved each turn |
| procedural_profile | Turn memory context | Re-loaded each turn (merged via _merge_dicts) |
| crisis / crisis_audit | Crisis gate | Fresh each turn |
| response_style, therapeutic_approach | Dispatcher | Fresh each turn |
| session_progress | Checkpointer + _merge_dicts reducer | turn_count increments while preserving sibling fields |
| exercise_state | guided_exercise_node + dispatcher, _merge_dicts | exercise_therapeutic_approach stores the approach pinned at exercise start; active exercise state survives side-turns |
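
To make the two reducer behaviours in the table concrete, here is a minimal sketch. `operator.add` is the standard-library function; `merge_dicts_sketch`, `REDUCERS`, and `apply_update` are hypothetical names, and the shallow merge is only an assumption about what `_merge_dicts` does, inferred from "preserving sibling fields".

```python
import operator


def merge_dicts_sketch(current, update):
    """Assumed behaviour: shallow merge where keys in `update` win."""
    merged = dict(current or {})
    merged.update(update or {})
    return merged


# Hypothetical reducer table: fields without a reducer are simply overwritten.
REDUCERS = {
    "transcript": operator.add,          # lists are concatenated
    "session_progress": merge_dicts_sketch,
}


def apply_update(state, update):
    """Apply one turn's partial update to the accumulated state."""
    new_state = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in state:
            new_state[key] = reducer(state[key], value)
        else:
            new_state[key] = value
    return new_state


state = {
    "message": "hi",
    "transcript": ["hi"],
    "session_progress": {"turn_count": 1, "goal": "sleep"},
}
state = apply_update(
    state,
    {
        "message": "hello again",
        "transcript": ["hello again"],
        "session_progress": {"turn_count": 2},
    },
)
```

After the update, `transcript` has grown, `turn_count` has advanced, and the sibling key `goal` is untouched, while `message` is replaced outright.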
Exercise approach continuity
A guided exercise that started under, say, the act approach should
keep that approach when guided exercise instructions resume. The
exercise_therapeutic_approach field on exercise_state stores the
pinned approach at exercise start. The guided-exercise runtime reuses it when
an active exercise routes back to guided_exercise. A side-turn preserves the
active exercise state while the therapeutic response skill may choose a fresh
top-level approach for the explanation. Only an explicit exit clears
exercise_therapeutic_approach alongside exercise_type and
exercise_step.
This is enforced by agent/skills/guided_exercises/ and the
runtime-owned exercise state checks in agent/runtime/openai_text_runtime.py, not by a
separate post-processing selector.
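
The pin, reuse, and clear rules described in this section can be sketched as three small functions. All function names, the `"breathing"` exercise type, the `"cbt"` approach, the `"therapeutic_response"` route name, and the starting step value are hypothetical; only the three `exercise_state` field names, the `act` approach, and the `guided_exercise` route come from the text.

```python
EXERCISE_KEYS = ("exercise_therapeutic_approach", "exercise_type", "exercise_step")


def start_exercise(exercise_type, approach):
    """Pin the approach in effect at the moment the exercise starts."""
    return {
        "exercise_type": exercise_type,
        "exercise_step": 0,  # assumed starting step
        "exercise_therapeutic_approach": approach,
    }


def resolve_approach(exercise_state, route, fresh_approach):
    """Reuse the pinned approach only when routing back to the exercise."""
    pinned = exercise_state.get("exercise_therapeutic_approach")
    if route == "guided_exercise" and pinned:
        return pinned
    # Side-turns and non-exercise turns use the freshly chosen approach.
    return fresh_approach


def exit_exercise(exercise_state):
    """An explicit exit clears all three exercise fields together."""
    cleared = dict(exercise_state)
    for key in EXERCISE_KEYS:
        cleared[key] = None
    return cleared


state = start_exercise("breathing", "act")
side_turn = resolve_approach(state, "therapeutic_response", "cbt")
resumed = resolve_approach(state, "guided_exercise", "cbt")
after_exit = resolve_approach(exit_exercise(state), "guided_exercise", "cbt")
```

Note that the side-turn never mutates `state`, which is what lets the exercise resume under `act` afterwards.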
Session stage
Dynamic stage inference (opening → deepening → working → closing)
is a planned enhancement. The knowledge file
(agent/prompts/sources/session_stages.md) exists, but stage inference
is not wired into runtime state or any active runtime stage.
Prompt injection by response style
Not every response style sees the same context.
Diagnostics
Turn memory context stamps the following diagnostics so each turn's retrieval is auditable from CLI / Opik:
| Key | What it reports |
|---|---|
| load_memory_ms | Total wall time including all parallel queries |
| semantic_hits / semantic_store_size | Returned vs. stored semantic facts |
| episodic_hits / episodic_store_size | Returned vs. stored episodic arcs |
| procedural_count | Number of procedural rules loaded |
| retrieval_path | hybrid_rrf / token_recall / token_recall_after_embed_error |
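
One plausible way the three `retrieval_path` values and `load_memory_ms` could be produced is sketched below. The function and its callable parameters are hypothetical, the conditions that select each path are assumptions, and the order-preserving union stands in for the real RRF fusion step.

```python
import time


def retrieve_with_diagnostics(query, embed_search, token_search):
    """Return (hits, diagnostics) and record which retrieval path was used."""
    started = time.perf_counter()
    token_hits = token_search(query)
    if embed_search is None:
        # Assumed: no embedding backend configured.
        hits, path = token_hits, "token_recall"
    else:
        try:
            embed_hits = embed_search(query)
            # Placeholder for fusion: order-preserving union of both lists.
            hits = list(dict.fromkeys(embed_hits + token_hits))
            path = "hybrid_rrf"
        except Exception:
            # Assumed: embedding failure degrades to token recall only.
            hits, path = token_hits, "token_recall_after_embed_error"
    diagnostics = {
        "load_memory_ms": (time.perf_counter() - started) * 1000.0,
        "semantic_hits": len(hits),
        "retrieval_path": path,
    }
    return hits, diagnostics


def failing_embed(query):
    raise RuntimeError("embedding service unavailable")


def token_only(query):
    return ["b", "c"]


_, hybrid = retrieve_with_diagnostics("q", lambda q: ["a", "b"], token_only)
_, fallback = retrieve_with_diagnostics("q", failing_embed, token_only)
_, plain = retrieve_with_diagnostics("q", None, token_only)
```

Recording the path alongside the hit counts makes a degraded turn distinguishable from a turn that simply had few matches.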