Agent Graph
Every message follows the same pipeline: safety check first, then memory retrieval, then routing, then response, then post-response memory evaluation. Some memory commits happen immediately; others wait for the shared session-end seam. Click any step below to see what it does.
The pipeline diagram shows the v0.9+ safety-first topology with parallel post-response memory evaluation.
Topology
The parent graph has 8 nodes. The therapeutic subgraph (compiled as
a single parent-level node) hides 7 internal nodes (dispatcher + 6
mode nodes). The crisis gate routes between branches via
Command(goto=...). The two extractors fan out in parallel after
finalize, but they now feed deterministic write policy instead of
blindly writing every candidate to long-term memory.
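The topology above can be sketched as a plain adjacency map. The node names below are taken from the Key files table, but the exact edge set is an assumption; the authoritative wiring is `build_agent_workflow()` in agent/graph.py.

```python
# Illustrative parent-graph topology (edge set assumed, not authoritative).
EDGES = {
    "crisis_gate": ["crisis_response", "load_memory"],  # Command(goto=...) picks one
    "crisis_response": ["crisis_log"],
    "crisis_log": ["finalize_turn"],
    "load_memory": ["therapeutic"],            # compiled subgraph as one parent node
    "therapeutic": ["finalize_turn"],
    "finalize_turn": ["extract_facts", "extract_procedural_rules"],  # parallel fan-out
}

def crisis_route(is_crisis: bool) -> str:
    """Mimics the gate's Command(goto=...): crisis branch or normal branch."""
    return "crisis_response" if is_crisis else "load_memory"
```

Counting keys and targets together gives the 8 parent-level nodes.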
Node responsibilities
Each node has its own datasheet on the Nodes page — filterable by category (safety, memory, routing, extraction, terminal) or capability (LLM, retry, reducer, parallel). Every card shows the node's inputs, outputs, skip conditions, and source file.
Therapeutic subgraph
The dispatcher picks exactly one mode per turn:
| Mode | When it fires |
|---|---|
| supportive | Default — user sharing feelings, seeking support, or greeting |
| reflective | User describing a recurring pattern they've already named |
| clarifying | Ambiguous message — agent needs context before responding |
| psychoeducation | User describes a reaction AND seeks understanding |
| guided_exercise | User requests a structured technique (12 exercises) |
| closing | User signals wind-down ("I should go", "thanks, this helped") |
The dispatcher uses hybrid classification: high-precision regex fast paths for obvious cases (confusion markers -> clarifying, active exercise re-entry -> guided_exercise), LLM classifier for the ambiguous middle, regex fallback when no LLM is available.
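A minimal sketch of that hybrid scheme, with made-up patterns and an injected LLM callable (the real fast paths and prompt live in agent/therapeutic/dispatcher.py):

```python
import re
from typing import Callable, Optional

# Hypothetical fast-path patterns; illustrative only.
_FAST_PATHS = [
    (re.compile(r"\b(i don't understand|what do you mean)\b", re.I), "clarifying"),
    (re.compile(r"\b(continue the exercise|next step)\b", re.I), "guided_exercise"),
]

def classify_mode(message: str,
                  llm_classify: Optional[Callable[[str], str]] = None) -> str:
    """Regex fast paths first, LLM only for the ambiguous middle,
    regex fallback (default mode) when no LLM is available."""
    for pattern, mode in _FAST_PATHS:
        if pattern.search(message):
            return mode
    if llm_classify is not None:
        return llm_classify(message)
    return "supportive"
```

Because the fast paths fire first, obvious turns never pay for an LLM call.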
Design principles
Safety is sequential. The crisis gate runs before any memory retrieval or response generation. There is no path through the graph that loads context without first passing the safety check.
Routing is deterministic-first. Regex patterns fire before the LLM. Most turns never need an LLM classification call.
Parallel post-response memory evaluation. The two extractor nodes
fan out in parallel from finalize_turn_node. Each lane does LLM
candidate extraction plus deterministic write policy. Some candidates
commit immediately; others are buffered for session end or dropped.
Both still write to the diagnostics dict via a _merge_dicts
reducer — no manual dict spreading needed, no race conditions.
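The commit-now / hold / drop decision can be sketched as a small deterministic policy. The thresholds and candidate fields here are invented for illustration; the real policy lives in the extractor nodes.

```python
from enum import Enum

class WriteDecision(Enum):
    COMMIT = "commit"   # write to long-term memory immediately
    BUFFER = "buffer"   # hold for session-end promotion
    DROP = "drop"       # discard the candidate

def write_policy(candidate: dict) -> WriteDecision:
    """Hypothetical deterministic write policy (thresholds illustrative)."""
    confidence = candidate.get("confidence", 0.0)
    if confidence >= 0.9 and candidate.get("stable", False):
        return WriteDecision.COMMIT
    if confidence >= 0.5:
        return WriteDecision.BUFFER
    return WriteDecision.DROP
```

The point is that the LLM only proposes candidates; the commit decision itself is plain, testable code.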
Reducer-backed accumulation. history and transcript use
operator.add reducers. Each turn emits only the new entries
(user turn at init, assistant turn at finalize); the checkpointer
accumulates prior turns automatically. No get_history() calls
on the hot path.
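Conceptually, a reducer-annotated channel looks like this (field set trimmed; the full AgentState is in agent/state.py):

```python
import operator
from typing import Annotated, TypedDict

class AgentState(TypedDict):
    # Reducer-annotated channels: each node returns only its delta,
    # and the checkpointer folds deltas into the accumulated value.
    history: Annotated[list, operator.add]
    transcript: Annotated[list, operator.add]

# What the framework does per turn, conceptually:
prior = ["user: hi", "assistant: hello"]
delta = ["user: I had a rough day"]       # single-element delta from init
accumulated = operator.add(prior, delta)  # list concatenation, prior + delta
```

Nodes stay cheap because they never read or rebuild the full history; they only append.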
Structured working memory. load_memory_node returns raw
WorkingMemoryEntry dicts (semantic and episodic types). Formatting
happens on demand at prompt-build time and CLI render time — not
in the graph.
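A sketch of the entry shape and an on-demand formatter. The field names here are assumptions; the real types and formatters live in agent/working_memory.py.

```python
from typing import Literal, TypedDict

class WorkingMemoryEntry(TypedDict):
    # Illustrative shape, not the actual type definition.
    type: Literal["semantic", "episodic"]
    content: str

def format_for_prompt(entries: list[WorkingMemoryEntry]) -> str:
    """Formatting happens at prompt-build time, not inside the graph,
    so graph state stays raw and render styles can differ per consumer."""
    facts = [e["content"] for e in entries if e["type"] == "semantic"]
    episodes = [e["content"] for e in entries if e["type"] == "episodic"]
    sections = []
    if facts:
        sections.append("Known facts:\n" + "\n".join(f"- {f}" for f in facts))
    if episodes:
        sections.append("Past sessions:\n" + "\n".join(f"- {e}" for e in episodes))
    return "\n\n".join(sections)
```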
Defense-in-depth retry. All I/O nodes have
RetryPolicy(max_attempts=2). Nodes catch their own expected
exceptions internally; the retry policy covers unexpected transient
failures outside the node's error handling.
Small-talk gate saves cost. Both extractors check whether the message is obviously small talk before making the LLM call. Greetings and acknowledgments skip extraction entirely.
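The gate only needs a cheap heuristic in front of the LLM call. The patterns below are invented for illustration; the real heuristic is agent/memory/small_talk_gate.py.

```python
import re

# Hypothetical greeting/acknowledgment patterns; illustrative only.
_SMALL_TALK = re.compile(
    r"^\s*(hi|hello|hey|good (morning|evening)|thanks?( you)?|ok(ay)?|bye)[!. ]*$",
    re.IGNORECASE,
)

def is_small_talk(message: str) -> bool:
    """True for messages that should skip LLM extraction entirely."""
    return bool(_SMALL_TALK.match(message))
```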
Per-turn diagnostics. Every node stamps timing and decision
metadata into state["diagnostics"] via the _merge_dicts reducer.
Keys include load_memory_ms, crisis_gate_ms, extract_facts_ms,
extract_procedural_ms, and turn_total_ms.
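The merge reducer and the per-node stamping pattern can be sketched like this (the timing-wrapper shape is illustrative; only the `_merge_dicts` reducer and the `*_ms` key convention come from the source):

```python
import time

def _merge_dicts(left: dict, right: dict) -> dict:
    """Reducer sketch: each node returns only its own diagnostics keys,
    and the framework merges them, so there is no manual dict spreading
    and no races between parallel lanes."""
    return {**left, **right}

def timed_node(name, fn):
    """Illustrative pattern: stamp a node's elapsed ms into diagnostics."""
    def wrapped(state):
        start = time.perf_counter()
        update = fn(state)
        elapsed_ms = (time.perf_counter() - start) * 1000
        update.setdefault("diagnostics", {})[f"{name}_ms"] = elapsed_ms
        return update
    return wrapped

# Parallel lanes merge cleanly:
merged = _merge_dicts({"extract_facts_ms": 12.0}, {"extract_procedural_ms": 9.5})
```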
Runtime context
Dependencies are injected via WorkflowContext, a frozen dataclass
with attribute access (runtime.context.llm_client, not dict access).
Fields: llm_client, memory_store, crisis_log_backend,
memory_mode, embedding_provider.
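The shape is roughly the following (field types and the sample value are assumptions; the real dataclass is agent/runtime_context.py):

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class WorkflowContext:
    # Field types are illustrative placeholders.
    llm_client: Any
    memory_store: Any
    crisis_log_backend: Any
    memory_mode: str
    embedding_provider: Any

# frozen=True makes instances immutable and hashable: nodes can read
# dependencies via attribute access but never mutate shared context.
ctx = WorkflowContext(
    llm_client=None, memory_store=None, crisis_log_backend=None,
    memory_mode="hybrid",  # sample value, not a documented mode name
    embedding_provider=None,
)
```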
Key files
| File | What it does |
|---|---|
| agent/graph.py | build_agent_workflow(), build_initial_state(), state_to_output() |
| agent/state.py | AgentState TypedDict with reducer annotations |
| agent/working_memory.py | WorkingMemoryEntry types, factory functions, formatters |
| agent/runtime_context.py | WorkflowContext frozen dataclass |
| agent/therapeutic/graph.py | build_therapeutic_subgraph() with TherapeuticSubgraphOutput |
| agent/therapeutic/dispatcher.py | Hybrid regex + LLM mode classification |
| agent/therapeutic/prompts.py | Knowledge composition + on-demand working memory formatting |
| agent/nodes/load_memory.py | Memory retrieval + query embedding |
| agent/nodes/crisis_gate.py | Crisis classification + routing |
| agent/nodes/crisis_response.py | Crisis response generation |
| agent/nodes/crisis_log.py | Audit log writer |
| agent/nodes/extract_facts.py | Semantic fact extraction |
| agent/nodes/extract_procedural_rules.py | Procedural rule extraction |
| agent/nodes/commit_session_memory.py | Session-end promotion for held semantic / procedural candidates |
| agent/nodes/summarize_session.py | Session-end episodic summarization |
| agent/nodes/finalize_turn.py | Transcript finalization (single-element delta) |
| agent/memory/small_talk_gate.py | Pre-extractor small-talk heuristic |
| agent/persistence.py | PersistentAgentRuntime — SQLite-backed thread state |
Adding a new therapeutic mode
- Create knowledge/response_modes/your_mode.md with the mode's knowledge content
- Create agent/therapeutic/your_mode.py with the response node function
- Add a knowledge tuple and instructions block in agent/therapeutic/prompts.py
- Register the node in agent/therapeutic/graph.py (subgraph.add_node(...))
- Add dispatch patterns in agent/therapeutic/dispatcher.py (regex fast paths + LLM prompt update)
- Add cases to eval/datasets/therapeutic_routing_v0.json
- Add a _STAGE_LABELS entry in opencouch_cli/app.py if the node name differs from the CLI's stage label vocabulary