Architecture
Single Python backend. OpenAI Agents SDK text runtime with app-owned state and deterministic routing stages. OpenAI Realtime powers browser voice through the same app-owned memory, policy, and tool boundaries. Three memory layers backed by Postgres-first durable persistence with a legacy SQLite fallback. Shared runtime for CLI, API, web text, and voice persistence.
Turn pipeline
The crisis gate is the first runtime stage. If a message triggers a crisis response, memory retrieval and all operational gates are skipped entirely. Otherwise, two operational gates (memory commands, factual lookups) get a chance to short-circuit the turn, and memory loads only on the therapeutic branch that remains.
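The stage ordering can be pictured with a minimal sketch. Everything here is illustrative: `run_turn_sketch`, `TurnResult`, and the gate callables are hypothetical names, and the relative order of the two operational gates is an assumption. Only the overall shape (crisis gate first, operational gates next, memory loaded last on the therapeutic branch) comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TurnResult:
    route: str = ""                                       # branch that answered the turn
    stages_run: list[str] = field(default_factory=list)   # stages executed, in order
    memory_loaded: bool = False


def run_turn_sketch(
    message: str,
    is_crisis: Callable[[str], bool],
    is_memory_command: Callable[[str], bool],
    is_factual_lookup: Callable[[str], bool],
) -> TurnResult:
    result = TurnResult()

    # 1. The crisis gate always runs first; a hit skips every later stage.
    result.stages_run.append("crisis_gate")
    if is_crisis(message):
        result.route = "crisis"
        return result

    # 2. Operational gates may short-circuit the turn before memory loads.
    #    (Their relative order is an assumption of this sketch.)
    gates = (("memory_command", is_memory_command), ("factual_lookup", is_factual_lookup))
    for name, gate in gates:
        result.stages_run.append(name)
        if gate(message):
            result.route = name
            return result

    # 3. Only the therapeutic branch reaches memory retrieval.
    result.stages_run.append("memory_load")
    result.memory_loaded = True
    result.route = "therapeutic"
    return result
```

Under these assumptions, a crisis message records only `crisis_gate` in `stages_run`, while an ordinary message passes all three gates before `memory_load` runs.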
Every I/O stage has RetryPolicy(max_attempts=2) as defense-in-depth.
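As a rough illustration of what `max_attempts=2` implies (one initial call plus at most one retry), here is a minimal sketch. The `RetryPolicy` class below is a stand-in written for this example, not the project's own type; the `backoff_seconds` field and the `call_with_retry` helper are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RetryPolicy:
    # Stand-in for illustration; only max_attempts comes from the text above.
    max_attempts: int = 2
    backoff_seconds: float = 0.0  # hypothetical knob, not from the source

    def __post_init__(self) -> None:
        if self.max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")


def call_with_retry(fn: Callable[[], T], policy: RetryPolicy) -> T:
    """Call fn, retrying on any exception until max_attempts is exhausted."""
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == policy.max_attempts:
                raise  # out of attempts: surface the last error to the caller
            time.sleep(policy.backoff_seconds)
    raise RuntimeError("unreachable: max_attempts is validated to be >= 1")
```

A real policy would likely retry only on specific transient error types; catching every `Exception` keeps the sketch short.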
Key decisions
| Decision | Choice | Why |
|---|---|---|
| Execution | OpenAI Agents SDK runner with app-owned runtime state | TextTurnGraph resolves one route plan with deterministic ordering; SDK sessions carry short-term model-visible history while OpenCouch owns product state |
| LLM | BaseLLMClient protocol | OpenAI client behind a thin protocol; WorkflowContext exposes both control LLM and optional response LLM |
| Embedding | EmbeddingProvider protocol | OpenAI text-embedding-3-large when an API key is present; null provider when no API key |
| Storage | Postgres-first durable persistence with legacy SQLite fallback | Dockerized Postgres is the recommended local/runtime path; SQLite remains for compatibility fallback and focused tests |
| Memory | MemoryStore protocol | In-memory for incognito/tests; Postgres for durable local/runtime paths; SQLite legacy fallback |
| Retrieval | Hybrid RRF (k=60) | Embedding cosine + token-recall fused via Reciprocal Rank Fusion; degrades gracefully to token-recall on embedding failure |
| Prompt sources | agent/prompts/sources/*.md files | Reviewed prompt fragments; composed at runtime via compose_sources() |
| Context | WorkflowContext frozen dataclass | Attribute access, type-safe, immutable per turn; carries session_memory_buffer for session-end candidate promotion |
| Reducers | operator.add + _merge_dicts | Transcript accumulation + parallel diagnostics + per-channel session-state merging (exercise, progress, procedural profile) |
| Audit separation | agent/audit/ package | Crisis log + session feedback live outside prompt memory; cannot be disabled by user recall toggles |
| Observability | Opik + local diagnostics | Opik for primary trace-level debugging and evaluation review; in-CLI diagnostics for per-turn visibility |
| Crisis log | Always-on | Privacy asymmetry: incognito scrubs user_id but the event is still recorded under an opaque (SHA-256) session id |
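
The Retrieval row can be made concrete with a small sketch of Reciprocal Rank Fusion. Each ranked list contributes `1 / (k + rank)` per item, with `k = 60` as stated in the table. The function names and the use of `None` to signal an embedding failure are assumptions of this example, not the project's API.

```python
from collections import defaultdict
from typing import Iterable, Optional, Sequence


def rrf_fuse(rankings: Iterable[Sequence[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id lists: each list adds 1 / (k + rank) to an item's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def hybrid_rank(
    embedding_ranking: Optional[Sequence[str]],
    token_ranking: Sequence[str],
    k: int = 60,
) -> list[str]:
    """Fuse both signals; fall back to token-recall alone if embeddings failed."""
    rankings = [token_ranking]
    if embedding_ranking is not None:  # None models an embedding failure here
        rankings.append(embedding_ranking)
    return [doc_id for doc_id, _ in rrf_fuse(rankings, k)]
```

With a single ranking, RRF preserves that ranking's order, which is how the token-recall fallback falls out of the same code path in this sketch.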
Persistence
Dockerized Postgres is the recommended local/runtime persistence path for
thread checkpoints, active-session state, memory, crisis log, session
feedback, and voice finalization status. Legacy SQLite files under .store/
remain available for compatibility fallback and focused migration tests.
| Store | Primary backend | Legacy fallback | What it persists |
|---|---|---|---|
| Thread checkpoints | Postgres checkpointer | threads.sqlite3 | Conversation state snapshots |
| Memory | PostgresMemoryStore | SqliteMemoryStore | Semantic facts, episodic arcs, procedural profiles |
| Crisis log | PostgresCrisisLogBackend | SqliteCrisisLogBackend | Crisis event audit trail |
| Session feedback | PostgresSessionFeedbackBackend | SqliteSessionFeedbackBackend | End-of-session thumbs ratings |
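
One way to picture the Postgres-first selection with a legacy fallback is the sketch below. It is a simplification under stated assumptions: the `*Sketch` classes are dict-backed stand-ins rather than the real `PostgresMemoryStore` or `SqliteMemoryStore`, the protocol methods are invented for illustration, and both the selection rule (a DSN being present) and the `.store/memory.sqlite3` filename are hypothetical.

```python
from typing import Optional, Protocol


class MemoryStore(Protocol):
    # Method names are illustrative; the real protocol is not shown in this page.
    def add_fact(self, user_id: str, fact: str) -> None: ...
    def facts_for(self, user_id: str) -> list[str]: ...


class _DictBackedStore:
    """Shared toy storage so each stand-in satisfies the protocol."""
    backend = "memory"

    def __init__(self) -> None:
        self._facts: dict[str, list[str]] = {}

    def add_fact(self, user_id: str, fact: str) -> None:
        self._facts.setdefault(user_id, []).append(fact)

    def facts_for(self, user_id: str) -> list[str]:
        return list(self._facts.get(user_id, []))


class InMemoryStoreSketch(_DictBackedStore):
    backend = "memory"


class PostgresStoreSketch(_DictBackedStore):
    backend = "postgres"

    def __init__(self, dsn: str) -> None:
        super().__init__()
        self.dsn = dsn  # a real backend would open a connection here


class SqliteStoreSketch(_DictBackedStore):
    backend = "sqlite"

    def __init__(self, path: str) -> None:
        super().__init__()
        self.path = path


def select_memory_store(
    *,
    incognito: bool,
    postgres_dsn: Optional[str],
    sqlite_path: str = ".store/memory.sqlite3",  # hypothetical filename
) -> MemoryStore:
    if incognito:
        return InMemoryStoreSketch()              # nothing durable is written
    if postgres_dsn:
        return PostgresStoreSketch(postgres_dsn)  # recommended durable path
    return SqliteStoreSketch(sqlite_path)         # legacy compatibility fallback
```

Because callers depend only on the protocol, the calling code does not change when the backend does.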
Runtime ownership
Text and voice share product services, but they do not share the same transport loop.
| Area | Text agent | Realtime voice |
|---|---|---|
| Live turn loop | OpenAITextRuntime runs one SDK turn per message through app-owned routing and specialist agents. | OpenAI Realtime owns the live speech loop; the browser returns tool outputs over the data channel and posts finalized transcripts to the backend. |
| Short-term conversation | SDK session plus reducer-backed runtime state. | Realtime session context plus final transcript recording after each exchange. |
| Safety and lookup policy | Crisis gate and turn triage run before the text specialist response. | Compact session instructions and Realtime tool schemas require crisis-resource or grounded-lookup tools when needed. |
| Tools | SDK function tools attached to the owning specialist agent or runtime branch. | Realtime function schemas that call the same backend service functions. |
| Persistence | run_turn / run_turn_stream save text state and emit streaming status events. | record_voice_turn appends finalized transcript entries; end_session finalizes persistent voice sessions. |
The practical rule: shared services live under agent/tools,
agent/memory, agent/skills, and agent/runtime; transport-specific
orchestration lives under agent/runtime/openai_text_runtime.py for
text and agent/voice/ plus api/routes/voice.py for voice.
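The reducer-backed runtime state on the text side (listed under Key decisions as operator.add + _merge_dicts) can be sketched as follows. The channel names, the shallow-merge behaviour of `_merge_dicts`, and the `apply_update` helper are assumptions made for illustration; only the two reducer names come from this page.

```python
import operator
from typing import Any, Callable


def _merge_dicts(left: dict[str, Any], right: dict[str, Any]) -> dict[str, Any]:
    """Shallow merge; keys in the newer update win (assumed semantics)."""
    merged = dict(left)
    merged.update(right)
    return merged


# Hypothetical channel names mapped to the reducer that combines old and new values.
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {
    "transcript": operator.add,      # list + list accumulates turns
    "diagnostics": operator.add,     # parallel stages append their entries
    "session_state": _merge_dicts,   # per-channel state merges key by key
}


def apply_update(state: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Return a new state; reduced channels combine, all others are overwritten."""
    new_state = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in state:
            new_state[key] = reducer(state[key], value)
        else:
            new_state[key] = value
    return new_state
```

The point of a reducer is that two stages can both write to `transcript` or `session_state` without one silently overwriting the other.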
Provider adapter layer
The runtime never calls a model provider directly. It depends on the
abstract BaseLLMClient (llm/base.py), which defines three methods:
generate_text, generate_text_stream, and generate_structured. A single
factory, create_llm_client(provider) (llm/factory.py), returns the
concrete client for the normalized provider name (OpenAILLMClient for
openai) and raises on an unsupported provider. Swapping or adding a provider
means implementing BaseLLMClient and extending the factory — no runtime or
flow code changes.
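A minimal sketch of this adapter shape follows. The three method names and `create_llm_client` come from the description above; the signatures, the synchronous style, the `ValueError` on an unsupported provider, and the `EchoLLMClient` stand-in are all assumptions (the real methods may be async and take richer arguments).

```python
from abc import ABC, abstractmethod
from typing import Any, Iterator


class BaseLLMClient(ABC):
    """Provider-agnostic interface; signatures here are illustrative guesses."""

    @abstractmethod
    def generate_text(self, prompt: str) -> str: ...

    @abstractmethod
    def generate_text_stream(self, prompt: str) -> Iterator[str]: ...

    @abstractmethod
    def generate_structured(self, prompt: str, schema: dict[str, Any]) -> dict[str, Any]: ...


class EchoLLMClient(BaseLLMClient):
    """Hypothetical stand-in provider so the sketch runs without network access."""

    def generate_text(self, prompt: str) -> str:
        return f"echo: {prompt}"

    def generate_text_stream(self, prompt: str) -> Iterator[str]:
        yield from prompt.split()

    def generate_structured(self, prompt: str, schema: dict[str, Any]) -> dict[str, Any]:
        return {key: None for key in schema}


# Adding a provider means one new subclass plus one new entry here.
_PROVIDERS: dict[str, type[BaseLLMClient]] = {"echo": EchoLLMClient}


def create_llm_client(provider: str) -> BaseLLMClient:
    normalized = provider.strip().lower()  # normalization rule is an assumption
    try:
        client_cls = _PROVIDERS[normalized]
    except KeyError:
        raise ValueError(f"Unsupported LLM provider: {provider!r}") from None
    return client_cls()
```

Runtime code would hold only a `BaseLLMClient` reference, which is what keeps provider changes out of flow code.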
This is the control-plane LLM: crisis classification, structured outputs, and the response-LLM fallback path all go through it. It is distinct from the OpenAI Agents SDK runner that drives ordinary response generation — the execution flows use the SDK for replies and fall back to this control LLM only when an SDK turn fails for a recoverable infrastructure reason.
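The fallback relationship can be sketched as below. `RecoverableInfraError`, `respond`, and the two callables are hypothetical names; how the real runtime classifies an SDK failure as recoverable is not described here, so the exception type simply models that decision.

```python
from typing import Callable


class RecoverableInfraError(Exception):
    """Hypothetical marker for infrastructure failures that are safe to fall back from."""


def respond(
    message: str,
    run_sdk_turn: Callable[[str], str],
    control_llm_generate: Callable[[str], str],
) -> tuple[str, str]:
    """Return (path, reply): SDK first, control LLM only on a recoverable failure."""
    try:
        return "sdk", run_sdk_turn(message)
    except RecoverableInfraError:
        # Any other exception type propagates unchanged rather than
        # being masked by the fallback path.
        return "control_llm_fallback", control_llm_generate(message)
```

Keeping the fallback narrow means logic errors in the SDK path still surface as errors instead of being answered by a different model.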
| File | Purpose |
|---|---|
| llm/base.py | BaseLLMClient ABC — the provider-agnostic interface |
| llm/factory.py | create_llm_client(provider) — provider selection |
| llm/openai_client.py | OpenAILLMClient — the OpenAI implementation |
Prompt layers
Six layers composed per turn, outermost first.
See Prompt Assembly for the full composition logic.
Package layout
| Package | Owns |
|---|---|
| agent/ | State schema, models, and runtime context |
| agent/runtime/ | OpenAI Agents SDK text runtime, SDK sessions, persistence, and lifecycle orchestration |
| agent/specialists/ | Triage, therapeutic, crisis, and guided-exercise specialist agent definitions |
| agent/tools/ | SDK tool surfaces and fallback deltas for memory control, crisis, guided exercise, and grounded lookup |
| agent/skills/ | Therapeutic response skills and guided-exercise skill catalog/state machine |
| agent/memory/ | Store protocol, Postgres/SQLite backends, embeddings, retrieval, dedup, LLM-primary write policy, reconciliation, procedural profile, and service-backed memory logic |
| agent/audit/ | Always-on safety records, currently crisis log backends |
| agent/feedback/ | Session feedback records and in-memory, Postgres, and SQLite backends |
| agent/memory/control/ | User-facing memory-control actions and operations |
| agent/prompts/ | Markdown prompt sources + composition helpers + crisis prompt builders |
| llm/ | BaseLLMClient protocol, factory, and OpenAI client |
| opencouch_cli/ | Rich-based interactive CLI |
| agent/voice/ | OpenAI Realtime voice policy, session config, tool schemas, tool execution, inferred turn metadata, and transcript finalization helpers |
| api/ | FastAPI routes (chat, threads, memory, Realtime voice session/tools/turns/finalization) |
| tests/ | Pytest suites for runtime stages, memory, retrieval, dispatcher, persistence |
Quick links
| Topic | Page |
|---|---|
| Agent graph | Graph |
| Tools | Tools |
| State schema | State |
| Memory | Memory |
| Crisis gate | Crisis Gate |
| Runtime | Runtime |
| Observability | Observability |
| Privacy | Privacy |