
Runtime & Persistence

Two text-runtime layers, one pipeline. The stateless layer is for single-turn requests. The persistent layer adds thread-aware checkpointing, candidate buffering, an inactivity sweeper, and the shared session-end path (/end, timeout, shutdown, API end-session).


Two layers

[Diagram: both layers feed the same text pipeline (same stages, same order, same safety) and produce an AgentOutput (response + crisis + diagnostics). The difference is between turns, not within a turn.]

Both layers run the same runtime stages in the same order. The persistent layer saves app-owned state through the configured backend after each turn and restores reducer-backed fields on the next.

Voice uses Realtime transport with app-owned persistence

The browser owns OpenAI Realtime WebRTC audio, while the backend owns session configuration, memory bootstrap, tool execution, turn recording, and session finalization. Voice does not drive full text turns through run_turn / run_turn_stream; finalized Realtime transcripts are recorded through record_voice_turn(...), and persistent voice sessions end through the same end_session(...) summarization path as text. The 20-min inactivity sweeper remains text-session oriented. See Voice for the full picture.


How state accumulates

No get_history() on the hot path

build_initial_state() emits only the current user turn. The checkpointer restores prior turns automatically via reducers. This eliminated the O(n) transcript deserialization that ran on every persistent turn.

| Field | Reducer | Behavior |
| --- | --- | --- |
| history | operator.add | Each turn appends new entries; checkpointer accumulates |
| transcript | operator.add | Turn finalization appends a 1-element delta |
| session_progress | _merge_dicts | turn_count increments while preserving sibling fields |
| session_memory | _merge_dicts | Summary / active concerns / open loops merge across turns |
| procedural_profile | _merge_dicts | Procedural rules + recall toggle merge |
| exercise_state | _merge_dicts | Active exercise continuity; exercise_therapeutic_approach pins the approach used when guidance resumes |
| memory_control | _merge_dicts | pending_action carries destructive deletes across turns |
| diagnostics | _merge_dicts | Runtime stages and side effects write independently; gate timings, retrieval path, and write counts merge |

Non-reducer fields (crisis, crisis_audit, route, response_text, response_style, therapeutic_approach, the turn-scoped lookup fields) are overwritten fresh each turn — they describe a single turn's decisions and replies.
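
The reducer behavior described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: merge_dicts stands in for _merge_dicts and is assumed to be a shallow merge, apply_turn is a hypothetical name for what the checkpointer does when it folds a turn's delta into saved state, and the phase key is made up to show a sibling field surviving.

```python
import operator
from typing import Any, Callable

def merge_dicts(old: dict, new: dict) -> dict:
    """Assumed shallow merge: keys in `new` win, untouched sibling keys survive."""
    return {**old, **new}

# Field -> reducer. Fields missing from this map are overwritten each turn.
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {
    "history": operator.add,
    "transcript": operator.add,
    "session_progress": merge_dicts,
}

def apply_turn(saved: dict, delta: dict) -> dict:
    """Fold one turn's delta into the checkpointed state (hypothetical helper)."""
    merged = dict(saved)
    for field, value in delta.items():
        reducer = REDUCERS.get(field)
        if reducer is not None and field in saved:
            merged[field] = reducer(saved[field], value)  # accumulate
        else:
            merged[field] = value  # non-reducer field: overwrite
    return merged

saved = {
    "transcript": ["user: Hi there"],
    "session_progress": {"turn_count": 1, "phase": "intake"},
    "route": "greeting",
}
# The new turn emits only its own delta, never the full transcript.
delta = {
    "transcript": ["user: I've been feeling anxious"],
    "session_progress": {"turn_count": 2},
    "route": "support",
}
state = apply_turn(saved, delta)
```

After the call, transcript holds both entries, session_progress keeps phase alongside the updated turn_count, and route reflects only the latest turn.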


Thread lifecycle

session lifecycle
# Session 1 — 3 turns + end
$ opencouch --thread-id alice-s1 --user-id alice
> Hi there                            # turn 1: checkpoint created
> I've been feeling anxious           # turn 2: transcript accumulates
> Can we do a grounding exercise?     # turn 3: exercise state persists via the exercise_state reducer
> /end                                # feedback prompt → summarize → episodic arc written

# Session 2 — same user, new thread
$ opencouch --thread-id alice-s2 --user-id alice
> Hey                                 # first-turn catch-up fires: "Last session (anxiety)..."
                                    # alice's semantic facts + procedural rules visible

| Event | What happens |
| --- | --- |
| First turn | Checkpoint created. build_initial_state() provides defaults; opencouch_active_sessions row registers the active session. |
| Subsequent turns | Checkpointer restores accumulated state. Only the new user turn is emitted. The session row's last_active_at updates. |
| /end | Optional feedback prompt → record_session_feedback() → end_session() → summarize_session triggers service-backed episodic summarization/persistence → commit_session_memory triggers service-backed promotion of held candidates → active-session row deleted. |
| 20-min inactivity | Background sweeper finds expired session rows and runs the same end_session() flow with the runtime's default LLM client. Held candidates and episodic arcs are still written even if the user never typed /end. |
| Process shutdown | __aexit__ best-effort finalizes anything still open (when finalize_active_sessions_on_close=True, the default). |
| Resume after /end | Same thread_id works. Transcript persists. Next turn starts a fresh session and a fresh candidate buffer. |
| Incognito | All four backends in-memory. Nothing touches disk. Crisis log + feedback still record (ephemeral). The active-session table is skipped. |

The --user-id flag

memory scoping
# Without --user-id: memory scoped to thread
$ opencouch --thread-id thread-a      # facts written to "thread-a" namespace
$ opencouch --thread-id thread-b      # can't see thread-a's facts

# With --user-id: memory scoped to user across threads
$ opencouch --thread-id s1 --user-id alice   # facts written to "alice"
$ opencouch --thread-id s2 --user-id alice   # sees alice's facts from s1

Identity and thread fallbacks

PersistentAgentRuntime does not generate thread ids. Its text turn methods require callers to pass thread_id, and the runtime carries that value as session_id inside runtime state. Defaults live at the caller boundary:

| Surface | Missing thread_id | Missing user_id | Memory owner |
| --- | --- | --- | --- |
| Runtime API | No runtime fallback; caller must provide one | Accepted as None | user_id if set, otherwise session_id (thread_id) |
| HTTP / WebSocket text API | Request validation fails | Accepted as None | user_id if set, otherwise thread_id |
| CLI text | Generates local-<12 uuid hex> | Persistent mode falls back to the active thread id; guest mode ignores --user-id | user_id if set, otherwise active thread_id |
| Web UI | Blank setup field generates web-<8 random base36> | Blank persistent setup uses web-user; incognito clears user_id | user_id if set, otherwise thread_id |
| Web voice | Reuses the active web thread_id from setup | Persistent mode uses the active web user id; incognito clears user_id | user_id if set, otherwise active thread_id |

The generated thread id is never derived from the user id. The fallback goes the other direction: when no stable user_id is supplied, memory ownership falls back to the thread/session id so each thread stays isolated by default.
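
The fallback direction can be illustrated with a short sketch. Both function names are hypothetical, and taking the first 12 hex characters of a uuid4 is an assumption about how the CLI builds local-<12 uuid hex>; only the "user_id if set, otherwise thread_id" rule comes from the text above.

```python
import uuid
from typing import Optional

def cli_thread_id(thread_id: Optional[str]) -> str:
    """CLI-style default: keep the caller's id, else generate local-<12 uuid hex>.

    Assumption: the 12 hex characters are a prefix of a random uuid4.
    """
    return thread_id or f"local-{uuid.uuid4().hex[:12]}"

def memory_owner(thread_id: str, user_id: Optional[str]) -> str:
    """Memory is owned by the user when one is known, otherwise by the thread."""
    return user_id if user_id else thread_id

# With a user id, two threads share one memory namespace.
owner_s1 = memory_owner("s1", "alice")
owner_s2 = memory_owner("s2", "alice")

# Without a user id, each thread owns its own namespace.
owner_a = memory_owner("thread-a", None)
owner_b = memory_owner("thread-b", None)

# The generated thread id never depends on the user id.
generated = cli_thread_id(None)
```

Here owner_s1 and owner_s2 are both "alice", while owner_a and owner_b stay distinct, which matches the isolation-by-default behavior described above.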


WorkflowContext

Runtime dependencies are injected as a frozen dataclass. Stages read them by attribute (for example runtime.context.llm_client), not by dict lookup.

@dataclass(slots=True, frozen=True)
class WorkflowContext:
    llm_client: BaseLLMClient | None  # control-plane LLM (safety, routing, session finalization)
    memory_store: MemoryStore  # unified read/write across semantic / episodic / procedural
    crisis_log_backend: CrisisLogBackend  # always-on audit trail
    memory_mode: MemoryMode  # INCOGNITO / LOCAL / SYNCED
    response_llm: BaseLLMClient | None = None  # optional response-writing LLM; falls back to llm_client
    embedding_provider: EmbeddingProvider | None = None  # for hybrid retrieval and write-time indexing
    session_memory_buffer: SessionMemoryBuffer | None = None  # held candidates until session end

A convenience property control_llm returns llm_client for stages that just want "the safety / routing / memory model" without caring whether a separate response model is configured.

Why frozen?

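Freezing the dataclass means no stage can rebind a shared dependency during a turn: assigning to a field raises an error instead of silently swapping the object for every later stage. It does not deep-freeze the dependencies themselves, which can still change through their own methods. The slots=True flag drops the per-instance __dict__, which reduces memory overhead. Both are cheap to adopt.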


Active session recovery

PersistentAgentRuntime keeps an opencouch_active_sessions table in the configured active-session backend. Each row carries the thread's session_buffer (held semantic / procedural candidates), max_crisis_level, and transcript_start_index for the current session. A 20-minute inactivity sweeper auto-finalizes expired sessions, and __aexit__ best-effort finalizes anything still open on shutdown, so held candidates are promoted and session-end summarization runs even if the user never types /end.

In INCOGNITO mode the same flow runs entirely in memory: the checkpoint and active-session tracking are process-local and no durable session row is written.
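
The sweeper's expiry check can be sketched as a pure function over session rows. This is an illustration under assumptions: ActiveSessionRow here carries only the two fields the check needs, expired_thread_ids is a hypothetical name, and treating "exactly 20 minutes idle" as expired is a guess about the boundary. The real sweeper would then run end_session() for each returned thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

INACTIVITY_TIMEOUT = timedelta(minutes=20)

@dataclass
class ActiveSessionRow:
    """Reduced stand-in for a row in opencouch_active_sessions."""
    thread_id: str
    last_active_at: datetime

def expired_thread_ids(rows, now, timeout=INACTIVITY_TIMEOUT):
    """Return thread ids whose last activity is at least `timeout` old."""
    return [row.thread_id for row in rows if now - row.last_active_at >= timeout]

# Fixed clock so the example is deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    ActiveSessionRow("alice-s1", now - timedelta(minutes=25)),  # idle too long
    ActiveSessionRow("alice-s2", now - timedelta(minutes=5)),   # still active
]
expired = expired_thread_ids(rows, now)
```

Passing now in as an argument, rather than reading the clock inside the function, keeps the check easy to test.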


Key files

| File | Purpose |
| --- | --- |
| agent/runtime/runtime.py | PersistentAgentRuntime: run_turn, run_turn_stream, end_session, record_session_feedback, sweeper, active-session recovery |
| agent/runtime/turn.py | build_initial_state, state_to_output, run_agent |
| agent/state.py | All state fragments + AgentGraphInputState / AgentState / AgentGraphOutputState |
| agent/runtime/workflow_context.py | WorkflowContext frozen dataclass |
| agent/audit/crisis_log.py | CrisisLogBackend protocol + in-memory implementation |
| agent/audit/postgres_crisis_log.py | Primary Postgres crisis log with retention purge |
| agent/audit/sqlite_crisis_log.py | SQLite crisis-log fallback backend |
| agent/feedback/session_feedback.py | SessionFeedbackBackend protocol + in-memory implementation |
| agent/feedback/postgres_session_feedback.py | Primary Postgres feedback store |
| agent/feedback/sqlite_session_feedback.py | SQLite feedback fallback backend |