Crisis Gate

The crisis gate is the first check in every turn. Every message passes through it before memory loads, before routing, and before response generation. It is wired as an OpenAI Agents SDK input guardrail on the Runner, so the runtime cannot reach response generation without it.

The classifier is LLM-only: a structured-output call decides the level for every message, then local normalization enforces the level-to-route truth table. Provider failures surface as errors with retry handling instead of silently degrading to regex.
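
The retry-then-surface behavior described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `classify_with_retry`, `CrisisClassifierError`, and the backoff parameters are hypothetical names chosen for the example, and the classifier is passed in as a plain callable.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CrisisClassifierError(RuntimeError):
    """Hypothetical error raised when every classifier attempt has failed."""


def classify_with_retry(
    classify: Callable[[str], T],
    message: str,
    max_attempts: int = 3,
    base_delay_s: float = 0.0,
) -> T:
    """Call the LLM classifier, retrying on failure.

    There is deliberately no regex or keyword fallback: if all attempts
    fail, the error is raised so the caller can see and handle it.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return classify(message)
        except Exception as exc:  # a real gate would catch provider errors only
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff between attempts (assumed policy).
                time.sleep(base_delay_s * 2 ** (attempt - 1))
    raise CrisisClassifierError(
        f"classifier failed after {max_attempts} attempts"
    ) from last_error
```

The key design point is the final `raise`: a failed safety classifier stops the turn visibly instead of producing a degraded verdict.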


Walkthrough

The sample below traces one input through the gate: the LLM verdict, its normalization into the state delta, and the route the turn ends up taking.

Sample input: plan + means + timing. The LLM classifier returns level 3.

User: "I have the pills set aside. I'm planning to take them tonight."

Step 01: LLM classifier, generate_structured(CrisisAssessmentSchema)
Result: level=3
Structured output: level=3, confidence=high, reason="plan, means, and timing are present".

LLM verdict → normalize

Step 02: Truth-table normalization, enforce_crisis_truth_table(assessment)
Result: level=3 (final)
Truth table: level=3 → needs_crisis_response=true, needs_clarification=false.

State delta:
crisis.level: 0 safe · 1 ambiguous · 2 clear self-harm · 3 imminent
route: needs_crisis_response decides this
response_style: safety_check stamped on entry to the crisis branch
crisis_audit.crisis_classifier_path: written for every completed crisis-gate turn
crisis_audit.crisis_override_kind: always none in the LLM-only gate
crisis_audit.crisis_llm_failure_occurred: failed LLM calls retry or surface instead of writing fallback state

Decision steps

| Step | When it runs | What it does |
|---|---|---|
| 1. LLM classifier | Every message | Structured output: level, confidence, reason. Sees recent history; handles negation, sarcasm, quoted speech, idioms, and safety-denial context. |
| 2. Truth-table normalization | Always after a classifier result | Forces needs_crisis_response = level ≥ 2 and needs_clarification = level == 1, regardless of what the classifier returned. Prevents miscalibrated flags from misrouting the turn. |
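
The normalization step can be illustrated with a short sketch. The function name `enforce_crisis_truth_table` comes from the walkthrough; the `CrisisAssessment` dataclass and its field layout are assumptions made for this example, not the project's actual schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CrisisAssessment:
    # Hypothetical stand-in for the structured-output schema.
    level: int  # 0 safe, 1 ambiguous, 2 clear self-harm, 3 imminent
    confidence: str
    reason: str
    needs_crisis_response: bool = False
    needs_clarification: bool = False


def enforce_crisis_truth_table(assessment: CrisisAssessment) -> CrisisAssessment:
    """Recompute both flags from the level, ignoring what the classifier set."""
    return replace(
        assessment,
        needs_crisis_response=assessment.level >= 2,
        needs_clarification=assessment.level == 1,
    )


# A miscalibrated verdict: level 3, but the flags disagree with the level.
raw = CrisisAssessment(
    level=3,
    confidence="high",
    reason="plan, means, and timing are present",
    needs_crisis_response=False,
    needs_clarification=True,
)
final = enforce_crisis_truth_table(raw)
```

Because the flags are derived only from `level`, a classifier that returns inconsistent flags cannot change where the turn is routed.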

Routing

| Final level | Route | Pipeline |
|---|---|---|
| 0 / 1 | therapeutic | triage → load_memory → selected specialist (usually TherapeuticAgent) |
| 2 / 3 | crisis | run_crisis_turn: resource lookup → CrisisAgent reply → audit log write |

Level 1 stays therapeutic. needs_clarification=true is set in state but doesn't trigger the crisis branch. The triage agent returns the route, and the TherapeuticAgent layers the safety_check.md overlay on top of the active style (usually clarifying) to force one safety probe before ordinary support.
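
As a rough sketch of the routing rule, the mapping from final level to route, plus the level-1 safety overlay, might look like this. `RouteDecision` and `decide_route` are hypothetical names; only the route names and `safety_check.md` come from the text above.

```python
from typing import NamedTuple, Optional


class RouteDecision(NamedTuple):
    route: str
    # Overlay the TherapeuticAgent layers on the active style.
    # Not applicable on the crisis route, which has its own handling.
    therapeutic_overlay: Optional[str]


def decide_route(level: int) -> RouteDecision:
    """Map a normalized crisis level (0-3) to a route."""
    if level >= 2:
        return RouteDecision("crisis", None)
    if level == 1:
        # Ambiguous: stay therapeutic, but force one safety probe.
        return RouteDecision("therapeutic", "safety_check.md")
    return RouteDecision("therapeutic", None)
```

Level 1 is the only case where the route and the overlay diverge: the turn stays on the therapeutic route while still carrying a safety check.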

The crisis branch leads with crisis-resource lookup so any verified hotlines for the user's region land in the same crisis reply. Crisis logging is always-on regardless of memory mode — even in incognito the event is recorded with user_id set to NULL and session_id stored as a one-way hash.


Privacy asymmetry

| Field in incognito | Behavior |
|---|---|
| user_id_or_null | None: no identity persisted |
| session_id_opaque | SHA-256 hash, no reverse mapping |
| Event recorded? | Yes: safety audit trail preserved |
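
A minimal sketch of how an incognito audit entry could be built from the fields in the table above. `CrisisLogEntry` and `build_incognito_log_entry` are hypothetical, simplified stand-ins (the real record type is `CrisisLogRecord`, whose fields are not shown here); the only library call is the standard `hashlib.sha256`.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CrisisLogEntry:
    # Simplified, assumed shape; field names follow the table above.
    user_id_or_null: Optional[str]
    session_id_opaque: str
    crisis_level: int


def build_incognito_log_entry(session_id: str, crisis_level: int) -> CrisisLogEntry:
    """Record the event without identity: no user id, hashed session id."""
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
    return CrisisLogEntry(
        user_id_or_null=None,
        session_id_opaque=digest,
        crisis_level=crisis_level,
    )


entry = build_incognito_log_entry("session-abc", crisis_level=3)
```

The same session always hashes to the same opaque value, so events from one session can still be grouped in the audit trail without storing the raw identifier.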

Retention: 90 days. /memory purge-crisis [days] enforces the window. The boundary is exclusive: records dated exactly on the cutoff are preserved, and only older records are purged.
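
The exclusive boundary can be made concrete with a small sketch. It assumes records carry a timezone-aware `created_at` timestamp and models the store as an in-memory list; `purge_crisis_records` is a hypothetical helper, not the backend's actual purge routine.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple


def purge_crisis_records(
    records: List[Dict], now: datetime, days: int = 90
) -> Tuple[List[Dict], int]:
    """Drop records strictly older than the cutoff; keep the cutoff itself."""
    cutoff = now - timedelta(days=days)
    kept = [r for r in records if r["created_at"] >= cutoff]
    return kept, len(records) - len(kept)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
cutoff = now - timedelta(days=90)
records = [
    {"id": 1, "created_at": cutoff},                         # exactly on the boundary
    {"id": 2, "created_at": cutoff - timedelta(seconds=1)},  # just past the window
    {"id": 3, "created_at": now},                            # fresh
]
kept, purged = purge_crisis_records(records, now)
```

Here record 1 survives because the comparison is `>=`, while record 2, one second older, is purged.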


Diagnostics

| Key | Value |
|---|---|
| crisis_gate_ms | Wall-clock time for the full assessment |
| crisis_classifier_path | llm_primary |
| crisis_level | Normalized level (0–3) |
| crisis_audit.crisis_override_kind | Always none |
| crisis_audit.crisis_llm_failure_occurred | Always false for completed turns; failed classifier calls surface through retry/error handling instead of producing a degraded verdict |
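
For illustration, the diagnostics for a completed turn could be assembled roughly like this. The keys mirror the table above; the flat dotted-key dictionary, the `run_gate_with_diagnostics` helper, and the `assess` callable are assumptions made for the sketch.

```python
import time
from typing import Any, Callable, Dict


def run_gate_with_diagnostics(
    assess: Callable[[str], int], message: str
) -> Dict[str, Any]:
    """Time the full assessment and report diagnostics for a completed turn."""
    start = time.perf_counter()
    level = assess(message)  # classifier call plus normalization
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {
        "crisis_gate_ms": elapsed_ms,
        "crisis_classifier_path": "llm_primary",
        "crisis_level": level,
        "crisis_audit.crisis_override_kind": "none",
        # A failed call raises before reaching this point, so a completed
        # turn always reports False here.
        "crisis_audit.crisis_llm_failure_occurred": False,
    }
```

If `assess` raises, no diagnostics dictionary is produced at all, which matches the rule that failures surface as errors rather than as a degraded verdict.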

Design rules

| Rule | Why |
|---|---|
| Response pipeline waits for the gate | Safety sequencing takes priority over latency |
| LLM is the classifier, not regex | Handles negation, context, sarcasm, quoted speech, idioms, and subtle escalation |
| No silent no-LLM fallback | A failed safety classifier should be visible and retried, not converted into brittle local pattern matching |
| Normalization enforces the truth table | Prevents a miscalibrated LLM from setting the wrong flags |
| Audit log is always-on | Privacy asymmetry: incognito scrubs identity but still records the event |
| Boundary-case tests | Cover imminent risk, clear self-harm, idiomatic-safe, ambiguous, and safety-denial cases |

Key files

| File | Purpose |
|---|---|
| agent/guardrails/assessment.py | Crisis gate orchestration and truth-table normalization |
| agent/guardrails/service.py | LLM-only crisis classifier service and structured output schema |
| agent/guardrails/prompts.py | Crisis classifier prompt and system instruction |
| agent/guardrails/crisis.py | OpenAI Agents SDK input guardrail wrapper |
| agent/tools/crisis.py | Crisis resource lookup tool and crisis response delta |
| agent/audit/crisis_log.py | Audit-log write helper, backend protocol, and in-memory/null implementations |
| agent/audit/postgres_crisis_log.py | Primary durable backend with retention purge |
| agent/audit/sqlite_crisis_log.py | SQLite fallback backend with retention purge |
| agent/audit/models.py | CrisisLogRecord, CrisisOverrideOutcome, CrisisClassifierPath |
| agent/tools/grounded_search.py | find_crisis_resources for crisis-resource surfacing |