Tools & Policy
Voice exposes a narrow OpenAI Realtime function-tool surface. The
schemas live in agent/voice/tools.py, but the implementation reuses
the same app-owned services as the text runtime: memory control,
grounded lookup, crisis resources, therapeutic response skills, and
guided exercises.
Tool surface
| Tool | Mode | Purpose |
|---|---|---|
| wait_for_user | Incognito + persistent | Stay quiet for silence, background audio, side conversations, or speech not addressed to OpenCouch. |
| show_memory_status | Incognito + persistent | Explain whether memory is enabled and report saved-memory counts. |
| load_therapeutic_response_skill | Incognito + persistent | Load prompt-local therapeutic response-style guidance for ordinary replies. |
| answer_grounded_lookup | Incognito + persistent | Answer explicit current, official, source-backed, or externally verifiable requests from grounded results. |
| lookup_crisis_resources | Incognito + persistent | Find verified crisis resources for the current crisis turn. |
| get_crisis_support_template | Incognito + persistent | Load a deterministic crisis-response safety scaffold to shape the crisis reply. |
| list_guided_exercise_skills | Incognito + persistent | List metadata-only guided exercises available for the current channel and approach. |
| load_guided_exercise_skill | Incognito + persistent | Load the runtime-selected guided-exercise skill block for a step/action. |
| record_guided_exercise_progress | Incognito + persistent | Update active guided-exercise state from the user's latest response. |
| show_saved_memory | Persistent only | Show concise saved facts, session summaries, and procedural rules. |
| recall_saved_memory | Persistent only | Query the user's saved memory for facts and session arcs relevant to the current turn. |
| save_response_preference | Persistent only | Save an explicit response-style or memory-use preference. |
| set_proactive_memory_recall | Persistent only | Enable or disable proactive memory recall. |
| prepare_memory_deletion_by_index | Persistent only | Stage deletion of a visible saved-memory item by kind and one-based index. |
| prepare_memory_deletion_by_query | Persistent only | Stage deletion of a saved-memory item selected by a query. |
| confirm_memory_deletion | Persistent only | Confirm and perform a pending deletion. |
| cancel_memory_deletion | Persistent only | Cancel a pending deletion. |
The tool surface is filtered by memory mode: persistent sessions expose all
17 tools, while incognito sessions expose only the 9 mode-independent
tools above. Persistent-only tools are rejected in incognito voice mode.
show_memory_status stays available in both modes and returns an explicit
incognito status, so the assistant can still explain that durable memory is off.
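The mode filter can be sketched as follows. This is a minimal illustration rather than the code in agent/voice/tools.py: the MemoryMode enum, the two grouping constants, and the tools_for_mode and is_allowed helpers are hypothetical names, and only the tool names are taken from the table above.

```python
from enum import Enum


class MemoryMode(Enum):
    INCOGNITO = "incognito"
    PERSISTENT = "persistent"


# Tool names copied from the table above; the grouping constants are hypothetical.
MODE_INDEPENDENT = (
    "wait_for_user",
    "show_memory_status",
    "load_therapeutic_response_skill",
    "answer_grounded_lookup",
    "lookup_crisis_resources",
    "get_crisis_support_template",
    "list_guided_exercise_skills",
    "load_guided_exercise_skill",
    "record_guided_exercise_progress",
)
PERSISTENT_ONLY = (
    "show_saved_memory",
    "recall_saved_memory",
    "save_response_preference",
    "set_proactive_memory_recall",
    "prepare_memory_deletion_by_index",
    "prepare_memory_deletion_by_query",
    "confirm_memory_deletion",
    "cancel_memory_deletion",
)


def tools_for_mode(mode: MemoryMode) -> tuple[str, ...]:
    """Return the tool names exposed to a session in the given memory mode."""
    if mode is MemoryMode.PERSISTENT:
        return MODE_INDEPENDENT + PERSISTENT_ONLY
    return MODE_INDEPENDENT


def is_allowed(tool_name: str, mode: MemoryMode) -> bool:
    """Reject persistent-only tools when the session is incognito."""
    return tool_name in tools_for_mode(mode)
```

Checking is_allowed again at dispatch time, and not only when the session is configured, would keep a persistent-only call from slipping through if the model names a tool it was never offered.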
Intent gate on memory mutations
Beyond mode filtering, four destructive or preference-changing memory
tools carry an additional intent gate: set_proactive_memory_recall,
save_response_preference, prepare_memory_deletion_by_index, and
prepare_memory_deletion_by_query. Each call must include a user_quote
argument, and the backend lets the mutation proceed only if that quote meets
a minimum length and matches something the user actually said within the last
few turns. If the model cannot cite a matching user utterance, the call is
blocked.
This is an anti-misfire guard for a hands-free surface: in a free-flowing speech loop there is no explicit confirm button, so the gate prevents the model from saving a preference or staging a deletion the user never requested.
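A gate of this kind might look like the sketch below. The window size, the minimum length, the normalization step, and the substring match are all assumptions made for illustration; the source specifies only "the last few turns" and "a minimum length", and quote_is_grounded is a hypothetical helper name.

```python
import re

# Assumed thresholds; the real values are not stated in this document.
RECENT_TURN_WINDOW = 3
MIN_QUOTE_CHARS = 8


def _normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace for lenient matching."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def quote_is_grounded(user_quote: str, user_turns: list[str]) -> bool:
    """Return True if user_quote appears in one of the most recent user turns."""
    quote = _normalize(user_quote)
    if len(quote) < MIN_QUOTE_CHARS:
        # Very short quotes ("yes", "ok") would match almost any turn.
        return False
    recent = user_turns[-RECENT_TURN_WINDOW:]
    return any(quote in _normalize(turn) for turn in recent)
```

The length floor matters because a one-word quote would otherwise satisfy the gate against nearly any transcript, which defeats the purpose of requiring a citation.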
Realtime-owned turns
Realtime server VAD owns turn detection and response creation for live voice sessions. The session configuration enables automatic response creation after finalized user speech, so the browser does not call a backend policy endpoint before each response.
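As a rough illustration, the turn-detection part of such a session configuration could be built like this. The build_turn_detection helper is hypothetical, and where the fragment nests inside the session object depends on the Realtime API version in use, so treat this as a sketch of the two relevant fields only.

```python
def build_turn_detection(auto_respond: bool = True) -> dict:
    """Build a turn_detection fragment for a Realtime session configuration."""
    return {
        # Let the Realtime server decide when the user has finished speaking.
        "type": "server_vad",
        # Start a model response automatically once user speech is finalized.
        "create_response": auto_respond,
    }
```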
The backend still owns the app-specific boundaries:
- Realtime function tools call POST /api/voice/realtime/tools.
- Tools that require app data, safety resources, memory state, grounded lookup, or guided-exercise state execute through shared backend services.
- wait_for_user is a no-op signal for silence, background audio, side conversations, or speech not addressed to OpenCouch. Its function output is returned to Realtime without asking the model to continue.
- When the browser records a completed exchange through POST /api/voice/realtime/turn, the backend infers route and response-style metadata from the actual tool calls that happened in the Realtime session.
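The last two boundaries can be sketched as follows. The route labels, their precedence order, and both helper names are hypothetical; the actual inference lives in agent/voice/turn_metadata.py and may differ.

```python
# Hypothetical route labels keyed by tool name (tool names are from this page).
ROUTE_BY_TOOL = {
    "lookup_crisis_resources": "crisis",
    "get_crisis_support_template": "crisis",
    "answer_grounded_lookup": "grounded_lookup",
    "load_guided_exercise_skill": "guided_exercise",
    "record_guided_exercise_progress": "guided_exercise",
}
# Assumed precedence when one exchange used tools from several routes.
ROUTE_PRECEDENCE = ("crisis", "grounded_lookup", "guided_exercise")
DEFAULT_ROUTE = "conversation"


def infer_route(tool_calls: list[str]) -> str:
    """Derive a route label from the tool calls recorded for one exchange."""
    routes = {ROUTE_BY_TOOL[name] for name in tool_calls if name in ROUTE_BY_TOOL}
    for route in ROUTE_PRECEDENCE:
        if route in routes:
            return route
    return DEFAULT_ROUTE


def should_continue_after(tool_name: str) -> bool:
    """wait_for_user returns its output without asking the model to speak."""
    return tool_name != "wait_for_user"
```

Inferring the route after the fact from recorded calls means the metadata reflects what the session actually did, not what a pre-response classifier predicted.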
Crisis and factual lookups
Two cases should be treated as hard boundaries:
- Crisis turns that need specific resources must use lookup_crisis_resources; the assistant must not invent hotline names, phone numbers, URLs, or local availability.
- Factual/current/source-backed requests must use answer_grounded_lookup; the assistant should answer only from the tool result.
Provider failures surface as tool failures rather than silent fallback answers. Verified-but-empty lookup cases return explicit status values so the user gets "I could not verify that" instead of invented content.
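One way to keep those two outcomes distinct is sketched below. The LookupResult type, the status strings, the LookupProviderError exception, and the run_lookup helper are hypothetical names chosen for illustration and are not taken from the shared tool implementations.

```python
from dataclasses import dataclass, field


class LookupProviderError(RuntimeError):
    """Raised when the lookup provider itself fails (hypothetical name)."""


@dataclass
class LookupResult:
    status: str  # assumed values: "ok" or "no_verified_results"
    sources: list[str] = field(default_factory=list)


def run_lookup(query: str, provider) -> LookupResult:
    """Call the provider and map an empty result to an explicit status."""
    try:
        sources = provider(query)
    except Exception as exc:
        # Surface the failure to the caller instead of answering without sources.
        raise LookupProviderError(f"lookup failed for {query!r}") from exc
    if not sources:
        # The provider worked but verified nothing; say so explicitly.
        return LookupResult(status="no_verified_results")
    return LookupResult(status="ok", sources=list(sources))
```

Separating "the provider broke" from "the provider found nothing" lets the assistant say "I could not verify that" only in the second case and report an error in the first.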
Guided exercises
Voice uses the shared guided-exercise catalog but filters to exercises that are suitable for spoken delivery. There are 13 total exercises and 10 voice-supported exercises today:
- grounding_5_4_3_2_1
- grounding_box_breathing
- grounding_stop_technique
- grounding_muscle_relaxation
- thought_work_continuum
- behavioral_activation_tiny_action
- defusion_values_compass
- self_compassion_break
- emotion_regulation_improve
- emotion_regulation_gratitude
The voice prompt explicitly tells the model not to default to 5-4-3-2-1 grounding unless that exact runtime-selected skill has been provided.
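Channel filtering over a shared catalog might be sketched like this. The Exercise dataclass, its channels field, and exercises_for_channel are assumptions about how the registry could be shaped; the text-only entry is invented purely to show the filter excluding something and is not a real exercise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exercise:
    exercise_id: str
    channels: frozenset[str]  # channels on which the exercise can be delivered


CATALOG = (
    Exercise("grounding_box_breathing", frozenset({"text", "voice"})),
    Exercise("self_compassion_break", frozenset({"text", "voice"})),
    # Hypothetical text-only entry, invented for illustration.
    Exercise("example_text_only_worksheet", frozenset({"text"})),
)


def exercises_for_channel(channel: str, catalog=CATALOG) -> list[str]:
    """Return the IDs of exercises that support the given channel, in catalog order."""
    return [e.exercise_id for e in catalog if channel in e.channels]
```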
Key files
| File | Purpose |
|---|---|
| agent/voice/tools.py | Realtime function-tool schemas and dispatcher. |
| agent/voice/turn_metadata.py | Route and response-style metadata inference from recorded Realtime tool calls. |
| agent/voice/policy.py | Compact Realtime instruction policy for one voice session. |
| agent/tools/ | Shared tool implementations used by both text and voice. |
| agent/skills/guided_exercises/registry.py | Shared exercise catalog and voice filtering. |