
Architecture

Single Python backend. OpenAI Agents SDK text runtime with app-owned state and deterministic routing stages. OpenAI Realtime powers browser voice through the same app-owned memory, policy, and tool boundaries. Three memory layers backed by Postgres-first durable persistence with a legacy SQLite fallback. Shared runtime for CLI, API, web text, and voice persistence.


Turn pipeline

Safety-first ordering

The crisis gate is the first runtime stage. Memory only loads on the therapeutic branch — and only after two operational gates (memory commands, factual lookups) have had a chance to short-circuit the turn. If a message triggers a crisis response, memory retrieval and all operational gates are skipped entirely.
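The sketch below makes the ordering rule concrete. It only computes the stage sequence for a turn and runs nothing. It assumes hypothetical callables `is_crisis` and `dispatch` that stand in for the LLM classifiers. `plan_turn`, `Route`, and `TurnPlan` are illustrative names, not the project's route-plan types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Route(Enum):
    """Hypothetical safe-turn routes chosen by turn_dispatch."""
    MEMORY_CONTROL = "memory_control"
    GROUNDED_LOOKUP = "grounded_lookup"
    THERAPEUTIC = "therapeutic"


@dataclass(frozen=True)
class TurnPlan:
    """Ordered stage names for one turn (illustrative only)."""
    stages: tuple[str, ...]


def plan_turn(
    message: str,
    is_crisis: Callable[[str], bool],
    dispatch: Callable[[str], Route],
) -> TurnPlan:
    # Stage 1: the crisis gate sees every message before anything else.
    if is_crisis(message):
        # Crisis path: dispatch and memory retrieval are skipped entirely.
        return TurnPlan((
            "crisis_gate",
            "crisis_resource_lookup",
            "CrisisAgent",
            "crisis_log",
            "turn_finalization",
        ))

    # Safe-turn path: the operational gates get the first chance to short-circuit.
    route = dispatch(message)
    if route is Route.MEMORY_CONTROL:
        middle: tuple[str, ...] = ("memory_control",)
    elif route is Route.GROUNDED_LOOKUP:
        middle = ("grounded_lookup",)
    else:
        # Only the therapeutic branch loads memory.
        middle = ("turn_memory_context", "TherapeuticAgent")

    return TurnPlan(("crisis_gate", "turn_dispatch", *middle, "turn_finalization"))
```

Because the crisis check returns before `dispatch` is ever called, no operational gate or memory load can run ahead of it.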

- 1 crisis_gate: Every message, no exceptions. LLM-only classifier with local truth-table normalization. Command(goto=...) routes the turn to exactly one branch.

Crisis path
- 2a crisis_resource_lookup: Region-aware hotline lookup via web search grounding
- 3a CrisisAgent: Crisis reply with optional resource overlay
- 4a crisis_log: Always-on audit trail; writes regardless of memory mode

Safe-turn path
- 2b turn_dispatch: LLM routes safe turns to memory control, grounded lookup, or therapeutic flow
- 3b memory_control: Operational memory replies for list/status/forget/recall/preference turns
- 3b grounded_lookup: Search-grounded answer for explicit factual lookup turns
- 4b turn_memory_context: Runtime-owned retrieval across 3 namespaces for ordinary support
- 5b TherapeuticAgent: Response style selection or guided-exercise handoff

The two 3b stages are alternatives: a memory-control or grounded-lookup turn short-circuits there, and only therapeutic turns continue to 4b and 5b.

Both paths converge
- 6 turn_finalization: Appends the response to the transcript via the operator.add reducer. No I/O, so no retry. The stream emits response_ready here.

Runtime side effects after the response
- 7 semantic extraction: Candidate extraction → LLM-primary write policy → commit-now / hold / require-repetition / drop
- 7 procedural extraction: Style rules → immediate commit or session-end hold. Safety-conflicting requests are dropped.

Session end (triggered by /end, the 20-minute inactivity sweeper, shutdown, the API end-session call, or voice disconnect)
- 8 summarize_session: Episodic arc for cross-session continuity
- 9 commit_session_memory: Promotes held semantic and procedural candidates from the active-session buffer

Every I/O stage has RetryPolicy(max_attempts=2) as defense-in-depth.
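RetryPolicy is the project's own stage configuration. The helper below is a hypothetical stand-in that shows what max_attempts=2 means in practice: one initial call plus one retry, with the second failure propagated to the caller. `run_with_retry`, `backoff_s`, and `retry_on` are illustrative names, not part of the codebase.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_retry(
    stage: Callable[[], T],
    max_attempts: int = 2,
    backoff_s: float = 0.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call `stage` up to `max_attempts` times; re-raise the final failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return stage()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            time.sleep(backoff_s)  # optional pause before the retry
    raise AssertionError("unreachable")
```

A stage like turn_finalization would not be wrapped this way, since it performs no I/O and a retry would only duplicate the transcript append.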


Key decisions

| Decision | Choice | Why |
|---|---|---|
| Execution | OpenAI Agents SDK runner with app-owned runtime state | TextTurnGraph resolves one route plan with deterministic ordering; SDK sessions carry short-term model-visible history while OpenCouch owns product state |
| LLM | BaseLLMClient protocol | OpenAI client behind a thin protocol; WorkflowContext exposes both the control LLM and an optional response LLM |
| Embedding | EmbeddingProvider protocol | OpenAI text-embedding-3-large when an API key is present; a null provider otherwise |
| Storage | Postgres-first durable persistence with legacy SQLite fallback | Dockerized Postgres is the recommended local/runtime path; SQLite remains for compatibility fallback and focused tests |
| Memory | MemoryStore protocol | In-memory for incognito/tests; Postgres for durable local/runtime paths; SQLite as legacy fallback |
| Retrieval | Hybrid RRF (k=60) | Embedding cosine + token-recall fused via Reciprocal Rank Fusion; degrades gracefully to token-recall on embedding failure |
| Prompt sources | agent/prompts/sources/*.md files | Reviewed prompt fragments, composed at runtime via compose_sources() |
| Context | WorkflowContext frozen dataclass | Attribute access, type-safe, immutable per turn; carries session_memory_buffer for session-end candidate promotion |
| Reducers | operator.add + _merge_dicts | Transcript accumulation + parallel diagnostics + per-channel session-state merging (exercise, progress, procedural profile) |
| Audit separation | agent/audit/ package | Crisis log + session feedback live outside prompt memory; cannot be disabled by user recall toggles |
| Observability | Opik + local diagnostics | Opik for primary trace-level debugging and evaluation review; in-CLI diagnostics for per-turn visibility |
| Crisis log | Always-on | Privacy asymmetry: incognito scrubs user_id but still records the event under an opaque (SHA-256) session id |
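The retrieval row is the most algorithmic choice in the table. Standard Reciprocal Rank Fusion scores each item as the sum of 1 / (k + rank) over every ranked list that contains it, with ranks starting at 1; k=60 matches the table. The sketch below assumes each retriever returns an ordered list of memory ids. `hybrid_retrieve` and its two ranker callables are hypothetical, and the project's retrieval module may compute the cosine and token-recall rankings differently.

```python
from typing import Callable, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank), rank starting at 1."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def hybrid_retrieve(
    query: str,
    embedding_rank: Callable[[str], list[str]],
    token_rank: Callable[[str], list[str]],
    k: int = 60,
) -> list[str]:
    """Hypothetical hybrid retrieval: embedding + token-recall, fused with RRF."""
    token_ids = token_rank(query)
    try:
        embedding_ids = embedding_rank(query)
    except Exception:
        # Graceful degradation: fall back to token-recall ordering alone.
        return list(token_ids)
    return reciprocal_rank_fusion([embedding_ids, token_ids], k=k)
```

For example, fusing ["a", "b", "c"] with ["b", "c", "d"] yields ["b", "c", "a", "d"]: items found by both retrievers outrank items found by only one, which is why RRF needs no score calibration between cosine similarity and token recall.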

Persistence

Durable persistence backends

Dockerized Postgres is the recommended local/runtime persistence path for thread checkpoints, active-session state, memory, crisis log, session feedback, and voice finalization status. Legacy SQLite files under .store/ remain available for compatibility fallback and focused migration tests.

| Store | Primary backend | Legacy fallback | What it persists |
|---|---|---|---|
| Thread checkpoints | Postgres checkpointer | threads.sqlite3 | Conversation state snapshots |
| Memory | PostgresMemoryStore | SqliteMemoryStore | Semantic facts, episodic arcs, procedural profiles |
| Crisis log | PostgresCrisisLogBackend | SqliteCrisisLogBackend | Crisis event audit trail |
| Session feedback | PostgresSessionFeedbackBackend | SqliteSessionFeedbackBackend | End-of-session thumbs ratings |
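Here is one way the Postgres-first, SQLite-fallback rule could be expressed, assuming the choice is driven by whether a Postgres connection string is configured. The environment variable name `OPENCOUCH_DATABASE_URL`, `BackendChoice`, and `choose_backend` are all hypothetical. This page does not describe how the real selection is made.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical environment variable name, used only for this sketch.
DSN_ENV_VAR = "OPENCOUCH_DATABASE_URL"


@dataclass(frozen=True)
class BackendChoice:
    kind: str    # "postgres" or "sqlite"
    target: str  # Postgres DSN, or path to a legacy SQLite file


def choose_backend(
    sqlite_filename: str,
    env: Optional[Mapping[str, str]] = None,
    store_dir: Path = Path(".store"),
) -> BackendChoice:
    """Prefer Postgres when a DSN is configured; otherwise use the legacy SQLite file."""
    env = os.environ if env is None else env
    dsn = env.get(DSN_ENV_VAR, "").strip()
    if dsn.startswith(("postgresql://", "postgres://")):
        return BackendChoice("postgres", dsn)
    return BackendChoice("sqlite", str(store_dir / sqlite_filename))
```

Passing `env` explicitly keeps the function testable without touching the real process environment.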

Runtime ownership

Text and voice share product services, but they do not share the same transport loop.

| Area | Text agent | Realtime voice |
|---|---|---|
| Live turn loop | OpenAITextRuntime runs one SDK turn per message through app-owned routing and specialist agents. | OpenAI Realtime owns the live speech loop; the browser returns tool outputs over the data channel and posts finalized transcripts to the backend. |
| Short-term conversation | SDK session plus reducer-backed runtime state. | Realtime session context plus final transcript recording after each exchange. |
| Safety and lookup policy | Crisis gate and turn triage run before the text specialist response. | Compact session instructions and Realtime tool schemas require crisis-resource or grounded-lookup tools when needed. |
| Tools | SDK function tools attached to the owning specialist agent or runtime branch. | Realtime function schemas that call the same backend service functions. |
| Persistence | run_turn / run_turn_stream save text state and emit streaming status events. | record_voice_turn appends finalized transcript entries; end_session finalizes persistent voice sessions. |

The practical rule: shared services live under agent/tools, agent/memory, agent/skills, and agent/runtime. Transport-specific orchestration lives in agent/runtime/openai_text_runtime.py for text, and in agent/voice/ plus api/routes/voice.py for voice.
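To illustrate "Realtime function schemas that call the same backend service functions", here is a minimal sketch of the voice side. Every name in it is hypothetical. The schema dict follows the shape of a Realtime API function tool (`type`, `name`, `description`, and a JSON Schema `parameters` object), and function-call arguments arrive as a JSON string. The text runtime would attach the same service function as an SDK function tool; that wrapper is omitted here to keep the sketch dependency-free.

```python
import json
from typing import Any, Callable, Dict


def lookup_crisis_resources(region: str) -> Dict[str, Any]:
    """Hypothetical shared service; the real one performs grounded web search."""
    return {"region": region, "resources": []}


# Function-tool schema in the shape the Realtime API accepts in session config.
CRISIS_RESOURCE_TOOL: Dict[str, Any] = {
    "type": "function",
    "name": "lookup_crisis_resources",
    "description": "Find region-appropriate crisis hotlines.",
    "parameters": {
        "type": "object",
        "properties": {"region": {"type": "string"}},
        "required": ["region"],
    },
}

# Both transports resolve tool names to the same service functions.
SERVICES: Dict[str, Callable[..., Dict[str, Any]]] = {
    "lookup_crisis_resources": lookup_crisis_resources,
}


def execute_voice_tool_call(name: str, arguments_json: str) -> str:
    """Run a tool call relayed by the browser; return JSON it can send back over the data channel."""
    handler = SERVICES.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    arguments = json.loads(arguments_json or "{}")
    return json.dumps(handler(**arguments))
```

Keeping the name-to-function map in one place means a new tool only needs its service function, one registry entry, and one schema per transport.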


Provider adapter layer

The runtime never calls a model provider directly. It depends on the abstract BaseLLMClient (llm/base.py), which defines three methods: generate_text, generate_text_stream, and generate_structured. A single factory, create_llm_client(provider) (llm/factory.py), returns the concrete client for the normalized provider name (OpenAILLMClient for openai) and raises on an unsupported provider. Swapping or adding a provider means implementing BaseLLMClient and extending the factory — no runtime or flow code changes.

This is the control-plane LLM: crisis classification, structured outputs, and the response-LLM fallback path all go through it. It is distinct from the OpenAI Agents SDK runner that drives ordinary response generation — the execution flows use the SDK for replies and fall back to this control LLM only when an SDK turn fails for a recoverable infrastructure reason.
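The fallback rule above can be sketched as a narrow `except` clause. In this sketch, `RecoverableInfraError`, `generate_reply`, and the two callables are hypothetical. The point is that only recoverable infrastructure failures switch to the control LLM, while any other error still surfaces.

```python
from typing import Callable, Tuple


class RecoverableInfraError(Exception):
    """Hypothetical marker for transient infrastructure failures (timeouts, 5xx, resets)."""


def generate_reply(
    prompt: str,
    run_sdk_turn: Callable[[str], str],
    control_llm_generate: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (reply, source). The SDK is primary; the control LLM is the fallback."""
    try:
        return run_sdk_turn(prompt), "sdk"
    except RecoverableInfraError:
        # Only recoverable infrastructure failures fall back; other errors propagate.
        return control_llm_generate(prompt), "control_llm"
```

Returning the source alongside the reply makes it easy to record in diagnostics which path produced a given turn.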

| File | Purpose |
|---|---|
| llm/base.py | BaseLLMClient ABC: the provider-agnostic interface |
| llm/factory.py | create_llm_client(provider): provider selection |
| llm/openai_client.py | OpenAILLMClient: the OpenAI implementation |
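The sketch below shows the adapter pattern in the same shape, not the contents of llm/base.py or llm/factory.py. The three method names come from this page. The parameters and return types are assumptions, and the real methods may be async. `EchoLLMClient` is a hypothetical offline stand-in registered in place of OpenAILLMClient so the example runs without network access.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, Type


class BaseLLMClient(ABC):
    """Provider-agnostic interface; method names follow the doc, signatures are assumed."""

    @abstractmethod
    def generate_text(self, prompt: str) -> str: ...

    @abstractmethod
    def generate_text_stream(self, prompt: str) -> Iterator[str]: ...

    @abstractmethod
    def generate_structured(self, prompt: str, schema: Dict[str, Any]) -> Dict[str, Any]: ...


class EchoLLMClient(BaseLLMClient):
    """Hypothetical offline stand-in for a real client such as OpenAILLMClient."""

    def generate_text(self, prompt: str) -> str:
        return prompt

    def generate_text_stream(self, prompt: str) -> Iterator[str]:
        yield from prompt.split()

    def generate_structured(self, prompt: str, schema: Dict[str, Any]) -> Dict[str, Any]:
        # Fill each declared property with the prompt, just to honor the shape.
        return {key: prompt for key in schema.get("properties", {})}


# Adding a provider means one new subclass plus one registry entry.
_REGISTRY: Dict[str, Type[BaseLLMClient]] = {"echo": EchoLLMClient}


def create_llm_client(provider: str) -> BaseLLMClient:
    """Normalize the provider name and return a concrete client, or raise."""
    key = provider.strip().lower()
    try:
        return _REGISTRY[key]()
    except KeyError:
        raise ValueError(f"unsupported LLM provider: {provider!r}") from None
```

Because BaseLLMClient is an ABC, a subclass that forgets one of the three methods fails at instantiation rather than midway through a turn.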

Prompt layers

Six layers are composed per turn, outermost first.

See Prompt Assembly for the full composition logic.
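The layers are not named on this page, so the sketch below only illustrates order-preserving composition of markdown fragments, the kind of work compose_sources() is described as doing. `compose_prompt_layers`, `file_loader`, and the layer names in the usage are hypothetical. The real composition rules are on the Prompt Assembly page.

```python
from pathlib import Path
from typing import Callable, Sequence


def compose_prompt_layers(layer_names: Sequence[str], load: Callable[[str], str]) -> str:
    """Join prompt fragments in the given order (outermost layer first).

    `load` maps a layer name to its markdown text. Empty fragments are skipped.
    """
    fragments = []
    for name in layer_names:
        text = load(name).strip()
        if text:
            fragments.append(text)
    return "\n\n".join(fragments)


def file_loader(sources_dir: Path) -> Callable[[str], str]:
    """Hypothetical loader that reads <sources_dir>/<name>.md as UTF-8."""
    def load(name: str) -> str:
        return (sources_dir / f"{name}.md").read_text(encoding="utf-8")
    return load
```

Injecting `load` keeps composition independent of storage, so tests can pass an in-memory dict while the runtime reads from agent/prompts/sources/.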


Package layout

| Package | Owns |
|---|---|
| agent/ | State schema, models, and runtime context |
| agent/runtime/ | OpenAI Agents SDK text runtime, SDK sessions, persistence, and lifecycle orchestration |
| agent/specialists/ | Triage, therapeutic, crisis, and guided-exercise specialist agent definitions |
| agent/tools/ | SDK tool surfaces and fallback deltas for memory control, crisis, guided exercise, and grounded lookup |
| agent/skills/ | Therapeutic response skills and guided-exercise skill catalog/state machine |
| agent/memory/ | Store protocol, Postgres/SQLite backends, embeddings, retrieval, dedup, LLM-primary write policy, reconciliation, procedural profile, and service-backed memory logic |
| agent/audit/ | Always-on safety records, currently crisis log backends |
| agent/feedback/ | Session feedback records and in-memory, Postgres, and SQLite backends |
| agent/memory/control/ | User-facing memory-control actions and operations |
| agent/prompts/ | Markdown prompt sources + composition helpers + crisis prompt builders |
| llm/ | BaseLLMClient protocol, factory, and OpenAI client |
| opencouch_cli/ | Rich-based interactive CLI |
| agent/voice/ | OpenAI Realtime voice policy, session config, tool schemas, tool execution, inferred turn metadata, and transcript finalization helpers |
| api/ | FastAPI routes (chat, threads, memory, Realtime voice session/tools/turns/finalization) |
| tests/ | Pytest suites for runtime stages, memory, retrieval, dispatcher, persistence |

| Topic | Page |
|---|---|
| Agent graph | Graph |
| Tools | Tools |
| State schema | State |
| Memory | Memory |
| Crisis gate | Crisis Gate |
| Runtime | Runtime |
| Observability | Observability |
| Privacy | Privacy |