Skip to main content

Voice Persistence

Voice persistence is transcript-first. OpenAI Realtime generates the live speech response; OpenCouch records the finalized transcript and session metadata after the exchange is complete.

Persistent mode

In persistent voice mode:

  1. Session startup loads compact saved-memory context for the Realtime instructions.
  2. Final transcript pairs are written with runtime.voice.record_voice_turn(...) (the VoiceRuntimeFacade).
  3. Tool calls observed during the turn are recorded in diagnostics and can update app-owned state when the tool has a side effect.
  4. Disconnect calls runtime.end_session(...).
  5. The session finalizer may write an episodic arc and promote held memory candidates through the same services used by text.

Voice does not currently collect an explicit thumbs feedback value before ending. The session still finalizes memory on disconnect.

Incognito mode

In incognito voice mode:

  • Startup does not inject durable saved-memory context.
  • Persistent-only memory tools are unavailable.
  • Finalized turns return recorded=false and are not written to durable thread history.
  • Disconnect returns finalized=false and skips durable session summarization.
  • Crisis and session feedback backends remain process-local and ephemeral when the runtime is configured for incognito.

Identity

Voice reuses the active web setup:

ValueSource
thread_idActive web thread id.
user_idActive web user id in persistent mode; omitted in incognito.
Memory owneruser_id when present, otherwise thread_id.
Assistant voiceSetup/store-selected Realtime voice name.

The generated thread id is never derived from the user id. When no stable user id is supplied, the thread id isolates memory by session.

State written by record_voice_turn(...)

State fieldVoice behavior
channelSet to voice.
transcriptAppends finalized user/assistant entries.
response_textFinal assistant transcript text.
response_styleTurn-policy style, grounded_lookup, or voice.
routeTurn-policy route, defaulting to therapeutic.
grounded_lookupMerged from completed grounded lookup tool output when present.
diagnostics.voice_runtimeopenai_realtime.
diagnostics.voice_tool_callsNames of observed voice tools for the turn.
session_progress.turn_countIncrements once per recorded voice exchange.

Key files

FilePurpose
agent/voice/runtime_facade.pyVoiceRuntimeFacade (runtime.voice.*) — voice_session_memory_context, build_voice_tool_context, and record_voice_turn.
agent/runtime/runtime.pyShared end_session(...) session-end path (voice and text).
agent/voice/transcript.pyConverts finalized voice text into runtime transcript entries.
api/routes/voice.pyPublic endpoint layer and incognito/persistent branch behavior.
apps/web/src/lib/realtime-voice-turn-record.tsBuilds the browser request body for turn recording.
apps/web/src/lib/session.tsStores voice transcripts, activities, session info, and finalization status in the web client.