Runtime & Persistence

Two text-runtime layers, one pipeline. The stateless layer is for single-turn requests. The persistent layer adds thread-aware checkpointing, candidate buffering, an inactivity sweeper, and the shared session-end path (/end, timeout, shutdown, API end-session).

Two layers

Same text pipelineSame stages · same order · same safety

AgentOutputresponse + crisis + diagnostics

The difference is between turns, not within a turn

Both layers run the same runtime stages in the same order. The persistent layer saves app-owned state through the configured backend after each turn and restores reducer-backed fields on the next.

Voice uses Realtime transport with app-owned persistence

The browser owns OpenAI Realtime WebRTC audio, while the backend owns session configuration, memory bootstrap, tool execution, turn recording, and session finalization. Voice does not drive full text turns through run_turn / run_turn_stream; finalized Realtime transcripts are recorded through record_voice_turn(...), and persistent voice sessions end through the same end_session(...) summarization path as text. The 20-min inactivity sweeper remains text-session oriented. See Voice for the full picture.

How state accumulates

No get_history() on the hot path

build_initial_state() emits only the current user turn. The checkpointer restores prior turns automatically via reducers. This eliminated the O(n) transcript deserialization that ran on every persistent turn.

Field	Reducer	Behavior
`history`	`operator.add`	Each turn appends new entries; checkpointer accumulates
`transcript`	`operator.add`	Turn finalization appends a 1-element delta
`session_progress`	`_merge_dicts`	`turn_count` increments while preserving sibling fields
`session_memory`	`_merge_dicts`	Summary / active concerns / open loops merge across turns
`procedural_profile`	`_merge_dicts`	Procedural rules + recall toggle merge
`exercise_state`	`_merge_dicts`	Active exercise continuity; `exercise_therapeutic_approach` pins the approach used when guidance resumes
`memory_control`	`_merge_dicts`	`pending_action` carries destructive deletes across turns
`diagnostics`	`_merge_dicts`	Runtime stages and side effects write independently; gate timings, retrieval path, and write counts merge

Non-reducer fields (crisis, crisis_audit, route, response_text, response_style, therapeutic_approach, the turn-scoped lookup fields) are overwritten fresh each turn — they describe a single turn's decisions and replies.

Thread lifecycle

session lifecycle

# Session 1 — 3 turns + end
$ opencouch --thread-id alice-s1 --user-id alice
> Hi there                            # turn 1: checkpoint created
> I've been feeling anxious           # turn 2: transcript accumulates
> Can we do a grounding exercise?     # turn 3: exercise state persists via progress reducer
> /end                                # feedback prompt → summarize → episodic arc written

# Session 2 — same user, new thread
$ opencouch --thread-id alice-s2 --user-id alice
> Hey                                 # first-turn catch-up fires: "Last session (anxiety)..."
                                    # alice's semantic facts + procedural rules visible

Event	What happens
First turn	Checkpoint created. `build_initial_state()` provides defaults; `opencouch_active_sessions` row registers the active session.
Subsequent turns	Checkpointer restores accumulated state. Only the new user turn is emitted. The session row's `last_active_at` updates.
`/end`	Optional feedback prompt → `record_session_feedback()` → `end_session()` → `summarize_session` triggers service-backed episodic summarization/persistence → `commit_session_memory` triggers service-backed promotion of held candidates → active-session row deleted.
20-min inactivity	Background sweeper finds expired session rows and runs the same `end_session()` flow with the runtime's default LLM client. Held candidates and episodic arcs are still written even if the user never typed `/end`.
Process shutdown	`__aexit__` best-effort finalizes anything still open (when `finalize_active_sessions_on_close=True`, the default).
Resume after `/end`	Same thread_id works. Transcript persists. Next turn starts a fresh session and a fresh candidate buffer.
Incognito	All four backends in-memory. Nothing touches disk. Crisis log + feedback still record (ephemeral). The active-session table is skipped.

The `--user-id` flag

memory scoping

# Without --user-id: memory scoped to thread
$ opencouch --thread-id thread-a      # facts written to "thread-a" namespace
$ opencouch --thread-id thread-b      # can't see thread-a's facts

# With --user-id: memory scoped to user across threads
$ opencouch --thread-id s1 --user-id alice   # facts written to "alice"
$ opencouch --thread-id s2 --user-id alice   # sees alice's facts from s1

Identity and thread fallbacks

PersistentAgentRuntime does not generate thread ids. Its text turn methods require callers to pass thread_id, and the runtime carries that value as session_id inside runtime state. Defaults live at the caller boundary:

Surface	Missing `thread_id`	Missing `user_id`	Memory owner
Runtime API	No runtime fallback; caller must provide one	Accepted as `None`	`user_id` if set, otherwise `session_id` (`thread_id`)
HTTP / WebSocket text API	Request validation fails	Accepted as `None`	`user_id` if set, otherwise `thread_id`
CLI text	Generates `local-<12 uuid hex>`	Persistent mode falls back to the active thread id; guest mode ignores `--user-id`	`user_id` if set, otherwise active `thread_id`
Web UI	Blank setup field generates `web-<8 random base36>`	Blank persistent setup uses `web-user`; incognito clears `user_id`	`user_id` if set, otherwise `thread_id`
Web voice	Reuses the active web `thread_id` from setup	Persistent mode uses the active web user id; incognito clears `user_id`	`user_id` if set, otherwise active `thread_id`

The generated thread id is never derived from the user id. The fallback goes the other direction: when no stable user_id is supplied, memory ownership falls back to the thread/session id so each thread stays isolated by default.

WorkflowContext

Runtime dependencies injected as a frozen dataclass. Runtime stages access via runtime.context.llm_client — not dict access.

@dataclass(slots=True, frozen=True)
class WorkflowContext:
    llm_client: BaseLLMClient | None          # control-plane LLM (safety, routing, session finalization)
    memory_store: MemoryStore                  # unified read/write across semantic / episodic / procedural
    crisis_log_backend: CrisisLogBackend       # always-on audit trail
    memory_mode: MemoryMode                    # INCOGNITO / LOCAL / SYNCED
    response_llm: BaseLLMClient | None = None  # optional response-writing LLM; falls back to llm_client
    embedding_provider: EmbeddingProvider | None = None  # for hybrid retrieval and write-time indexing
    session_memory_buffer: SessionMemoryBuffer | None = None  # held candidates until session end

A convenience property control_llm returns llm_client for stages that just want "the safety / routing / memory model" without caring whether a separate response model is configured.

Why frozen?

Immutability guarantees that no stage can accidentally modify a shared dependency during a turn. The slots=True flag reduces memory overhead. Both are free correctness wins.

Active session recovery

PersistentAgentRuntime keeps an opencouch_active_sessions table in the configured active-session backend. Each row carries the thread's session_buffer (held semantic / procedural candidates), max_crisis_level, and transcript_start_index for the current session. A 20-minute inactivity sweeper auto-finalizes expired sessions, and __aexit__ best-effort finalizes anything still open on shutdown — so held candidates and session-end summarization trigger reliably even if the user never types /end.

In INCOGNITO mode the same flow runs entirely in memory: the checkpoint and active-session tracking are process-local and no durable session row is written.

Key files

File	Purpose
`agent/runtime/runtime.py`	`PersistentAgentRuntime` — `run_turn`, `run_turn_stream`, `end_session`, `record_session_feedback`, sweeper, active-session recovery
`agent/runtime/turn.py`	`build_initial_state`, `state_to_output`, `run_agent`
`agent/state.py`	All state fragments + `AgentGraphInputState` / `AgentState` / `AgentGraphOutputState`
`agent/runtime/workflow_context.py`	`WorkflowContext` frozen dataclass
`agent/audit/crisis_log.py`	`CrisisLogBackend` protocol + in-memory implementation
`agent/audit/postgres_crisis_log.py`	Primary Postgres crisis log with retention purge
`agent/audit/sqlite_crisis_log.py`	SQLite crisis-log fallback backend
`agent/feedback/session_feedback.py`	`SessionFeedbackBackend` protocol + in-memory implementation
`agent/feedback/postgres_session_feedback.py`	Primary Postgres feedback store
`agent/feedback/sqlite_session_feedback.py`	SQLite feedback fallback backend

Two layers​

How state accumulates​

Thread lifecycle​

The --user-id flag​

Identity and thread fallbacks​

WorkflowContext​

Active session recovery​

Key files​