Crisis Gate
First check in every turn. Every message passes through it before memory loads, before routing, before response generation. Wired as an SDK input guardrail on the Runner, so the runtime cannot reach response generation without it.
The classifier is LLM-only: a structured-output call decides the level for every message, then local normalization enforces the level-to-route truth table. Provider failures surface as errors with retry handling instead of silently degrading to regex.
Walkthrough
Pick a sample input. Watch the LLM verdict normalize into the state delta on the right, and where the turn ends up routing.
Structured output: level=3, confidence=high, reason="plan, means, and timing are present".
Truth table: level=3 → needs_crisis_response=true, needs_clarification=false.
crisis.level…0 safe · 1 ambiguous · 2 clear self-harm · 3 imminentroute…needs_crisis_response decides thisresponse_style…safety_check stamped on entry to the crisis branchcrisis_audit.crisis_classifier_path…written for every completed crisis-gate turncrisis_audit.crisis_override_kind…always none in the LLM-only gatecrisis_audit.crisis_llm_failure_occurred…failed LLM calls retry or surface instead of writing fallback stateDecision steps
| Step | When it runs | What it does |
|---|---|---|
| 1. LLM classifier | Every message | Structured output: level, confidence, reason. Sees recent history, handles negation, sarcasm, quoted speech, idioms, and safety-denial context. |
| 2. Truth-table normalization | Always after a classifier result | Forces needs_crisis_response = level ≥ 2 and needs_clarification = level == 1, regardless of what the classifier returned. Prevents miscalibrated flags from wrong-routing. |
Routing
| Final level | Route | Pipeline |
|---|---|---|
| 0 / 1 | therapeutic | triage → load_memory → selected specialist (usually TherapeuticAgent) |
| 2 / 3 | crisis | run_crisis_turn: resource lookup → CrisisAgent reply → audit log write |
Level 1 stays therapeutic. needs_clarification=true is set in
state but doesn't trigger the crisis branch. The triage agent
returns the route, and the TherapeuticAgent layers the
safety_check.md overlay on top of the active style (usually
clarifying) to force one safety probe before ordinary support.
The crisis branch leads with crisis-resource lookup so any verified
hotlines for the user's region land in the same crisis reply.
Crisis logging is always-on regardless of memory mode — even in
incognito the event is recorded with user_id set to NULL and
session_id stored as a one-way hash.
Privacy asymmetry
| Field in incognito | Behavior |
|---|---|
user_id_or_null | None — no identity persisted |
session_id_opaque | SHA-256 hash, no reverse mapping |
| Event recorded? | Yes — safety audit trail preserved |
Retention: 90 days. /memory purge-crisis [days] enforces the
window (exclusive boundary — the cutoff date itself is preserved).
Diagnostics
| Key | Value |
|---|---|
crisis_gate_ms | Wall-clock time for the full assessment |
crisis_classifier_path | llm_primary |
crisis_level | Normalized level (0–3) |
crisis_audit.crisis_override_kind | Always none |
crisis_audit.crisis_llm_failure_occurred | Always false for completed turns; failed classifier calls surface through retry/error handling instead of producing a degraded verdict |
Design rules
| Rule | Why |
|---|---|
| Response pipeline waits for the gate | Safety sequencing > latency |
| LLM is the classifier, not regex | Handles negation, context, sarcasm, quoted speech, idioms, and subtle escalation |
| No silent no-LLM fallback | A failed safety classifier should be visible and retried, not converted into brittle local pattern matching |
| Normalization enforces the truth table | Prevents miscalibrated LLMs from wrong-flagging |
| Audit log is always-on | Privacy asymmetry — incognito scrubs identity but still records |
| Boundary-case tests | Cover imminent risk, clear self-harm, idiomatic-safe, ambiguous, and safety-denial cases |
Key files
| File | Purpose |
|---|---|
agent/guardrails/assessment.py | Crisis gate orchestration and truth-table normalization |
agent/guardrails/service.py | LLM-only crisis classifier service and structured output schema |
agent/guardrails/prompts.py | Crisis classifier prompt and system instruction |
agent/guardrails/crisis.py | OpenAI Agents SDK input guardrail wrapper |
agent/tools/crisis.py | Crisis resource lookup tool and crisis response delta |
agent/audit/crisis_log.py | Audit-log write helper, backend protocol, and in-memory/null implementations |
agent/audit/postgres_crisis_log.py | Primary durable backend with retention purge |
agent/audit/sqlite_crisis_log.py | SQLite fallback backend with retention purge |
agent/audit/models.py | CrisisLogRecord, CrisisOverrideOutcome, CrisisClassifierPath |
agent/tools/grounded_search.py | find_crisis_resources for crisis-resource surfacing |