Roadmap

What's shipped, what's in progress, and what's planned.


Shipped

| Feature | What landed |
| --- | --- |
| Web Frontend | Next.js chat UI with streaming, persisted setup state, thread management, memory inspection, visible error fallbacks, and OpenAI Realtime voice entrypoint. Lives in apps/web/. |
| API Layer | FastAPI with REST (POST /api/chat) and WebSocket (/api/chat/stream) endpoints. Thread management, memory status, session end. Lives in apps/backend/api/. |
| Voice Chat (OpenAI Realtime) | Browser speech-to-speech over OpenAI Realtime WebRTC with app-owned tools, Realtime session policy, incognito/persistent modes, turn recording, and shared end-session finalization. Lives across apps/web/src/components/realtime-voice-session-provider.tsx and apps/backend/agent/voice/. |
| Session Feedback | End-of-session thumbs rating captured at /end, /exit, and POST /api/threads/{id}/end. Postgres-first durable backend with incognito-safe in-memory mode and legacy SQLite fallback. |
| Crisis Gate (LLM-only) | Crisis classification is a structured LLM call with strict truth-table enforcement. Provider failures surface through retries/errors instead of silently degrading to regex rules. |
| Routing (LLM-primary) | Crisis, therapeutic dispatch, grounded lookup, memory-control, guided-exercise selection, and memory write policy use LLM-owned classifiers with local validation and hard confirmation gates where needed. |
| Knowledge Overhaul | core_identity.md defines assistant role, product stance, voice, therapeutic grounding, cultural sensitivity, repair patterns, and boundary-setting voice. boundaries.md expands redirection patterns and dependency framing. |
| OpenAI Embeddings | text-embedding-3-large as the configured provider, with token-only retrieval when no API key is available. Hybrid RRF retrieval achieves 14/17 recall@5 vs 6/17 token-only. |
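
For readers unfamiliar with the hybrid RRF retrieval mentioned in the OpenAI Embeddings row, here is a minimal sketch of Reciprocal Rank Fusion. It is not the project's retrieval code: the function name, the document IDs, and the constant k=60 (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=5):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused[:top_n]


# Hypothetical results from a token-based and an embedding-based retriever.
token_hits = ["doc_a", "doc_b", "doc_c"]
embedding_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([token_hits, embedding_hits])
print(fused)
```

Because RRF uses only ranks, it needs no score normalization between the token and embedding retrievers, which is one reason it is a common choice for hybrid search.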

In progress

| Feature | Status | What's left |
| --- | --- | --- |
| Response quality review | Manual dogfood path | Needs broader review of ordinary support turns and longer dogfood transcripts. |
| Memory integration regression coverage | Pytest-first | Runtime and recall tests cover semantic, episodic, procedural, correction, deletion, and cross-feature behavior. Remaining work is wider live dogfood coverage and voice parity. |
| Session feedback (closing mode) | Closing signal wired, feedback UX pending | Closing detection is LLM-primary and emits session_action=suggest_end_session; feedback prompt still needs to fire from natural closings, not just CLI/API end commands. |
| Session feedback (voice) | Designed, not wired | Voice disconnect now routes through end_session() in persistent mode; the remaining work is collecting an explicit feedback value from the voice end-state UI. |
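
The closing-mode gap can be illustrated with a small sketch of the decision the feedback UX would need to make. The function name and the idea of prompting directly on suggest_end_session are assumptions; the real flow may wait for the user to confirm the end of the session first.

```python
# Explicit end commands named in the Shipped table.
END_COMMANDS = {"/end", "/exit"}


def should_prompt_for_feedback(user_text, session_action=None):
    """Return True when the end-of-session feedback prompt should appear.

    Hypothetical helper: it covers both explicit end commands and the
    LLM-emitted closing signal, so natural closings also reach the prompt.
    """
    if user_text.strip().lower() in END_COMMANDS:
        return True
    return session_action == "suggest_end_session"


print(should_prompt_for_feedback("/end"))
print(should_prompt_for_feedback("thanks, that helped", "suggest_end_session"))
print(should_prompt_for_feedback("tell me more"))
```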

Planned

Messaging Channels

WhatsApp, Discord, and Telegram adapters. The agent graph is channel-agnostic; each adapter would map platform message formats to AgentInput / AgentOutput. Crisis responses would need channel-specific formatting (inline buttons, embeds).
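
A rough sketch of what one such adapter could look like, using Telegram as the example. The fields on AgentInput and AgentOutput, the ChannelAdapter protocol, and the thread ID format are assumptions made for illustration, since the real types are not shown here. The payload keys (message.chat.id, message.text, chat_id, reply_markup.inline_keyboard) follow the Telegram Bot API.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class AgentInput:  # hypothetical shape; the real type may differ
    thread_id: str
    text: str
    channel: str


@dataclass
class AgentOutput:  # hypothetical shape; the real type may differ
    text: str
    is_crisis: bool = False
    resource_links: dict[str, str] = field(default_factory=dict)  # label -> URL


class ChannelAdapter(Protocol):
    def to_agent_input(self, payload: dict) -> AgentInput: ...
    def from_agent_output(self, output: AgentOutput, thread_id: str) -> dict: ...


class TelegramAdapter:
    channel = "telegram"

    def to_agent_input(self, payload: dict) -> AgentInput:
        # A Telegram update carries the chat ID and text under "message".
        message = payload["message"]
        return AgentInput(
            thread_id=f"telegram:{message['chat']['id']}",
            text=message.get("text", ""),
            channel=self.channel,
        )

    def from_agent_output(self, output: AgentOutput, thread_id: str) -> dict:
        # Build a sendMessage-style body for the chat encoded in thread_id.
        chat_id = int(thread_id.split(":", 1)[1])
        reply = {"chat_id": chat_id, "text": output.text}
        if output.is_crisis and output.resource_links:
            # Channel-specific crisis formatting: one URL button per row.
            reply["reply_markup"] = {
                "inline_keyboard": [
                    [{"text": label, "url": url}]
                    for label, url in output.resource_links.items()
                ]
            }
        return reply


adapter = TelegramAdapter()
agent_input = adapter.to_agent_input(
    {"message": {"chat": {"id": 42}, "text": "hello"}}
)
print(agent_input)
```

Keeping the crisis formatting inside the adapter means the agent graph stays channel-agnostic, and a Discord adapter could render the same resource_links as an embed instead.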

Acoustic Crisis Detection

Voice mode currently uses transcript-only crisis detection, so it misses acoustic signals such as voice cracking, sobbing, pressured speech, and prosodic flatness. A user saying "I'm fine" through tears scores level 0.

Closing this gap requires either a curated distressed-voice dataset (ethically fraught) or a validated off-the-shelf acoustic classifier (not a solved problem). This work is calendar-gated on dataset and model maturity.

Graph Memory

Graphiti + Neo4j for entity/relationship extraction from semantic facts. Enables relational reasoning: "you mentioned your sister and your work stress — they tend to co-occur." The graphiti-core dependency is in pyproject.toml, but the integration is intentionally disabled pending design.

Background Consolidation

Automatic fact merging, dormant marking, and a consolidation_runs log. The schema is defined (ConsolidationProposal and ConsolidationRunRecord in agent/memory/models.py), but the implementation is not yet wired into the graph. This work would also add /memory restore as an undo for destructive operations.

Session Intent, Stage, and Response Guidance

Three state fields (progress.intent, progress.stage, response.guidance) are defined in the schema but not yet populated by the runtime. When implemented, they enable session-level steering: the agent knows whether to deepen, stabilize, or close based on conversation arc rather than just the current message.
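
To make the three fields concrete, here is a sketch of how they might sit in session state and feed a steering decision. The dataclass layout, the stage labels, and the lookup table are hypothetical; a real implementation would likely derive these values from an LLM classifier rather than a static mapping.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Progress:
    intent: Optional[str] = None  # what the user wants from this session
    stage: Optional[str] = None   # where the conversation arc currently is


@dataclass
class Response:
    guidance: Optional[str] = None  # "deepen", "stabilize", or "close"


@dataclass
class SessionState:
    progress: Progress = field(default_factory=Progress)
    response: Response = field(default_factory=Response)


# Hypothetical stage labels mapped to the three steering options.
STAGE_TO_GUIDANCE = {
    "exploring": "deepen",
    "overwhelmed": "stabilize",
    "winding_down": "close",
}


def apply_guidance(state: SessionState) -> SessionState:
    """Set response.guidance from progress.stage; unknown stages give None."""
    state.response.guidance = STAGE_TO_GUIDANCE.get(state.progress.stage)
    return state


state = SessionState()
state.progress.intent = "vent about work"
state.progress.stage = "winding_down"
print(apply_guidance(state).response.guidance)
```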

Crisis Gate Production Telemetry

Model ID, prompt version, raw/normalized levels, confidence values, timeout/parse failure counters, and degraded-mode alerts. The production telemetry layer is not yet in place.
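
As a sketch of the shape this layer could take, the record below carries the listed fields, and a small in-memory aggregator counts failures and flags degraded mode. All class names, field names, and threshold values are assumptions for illustration. A production version would export to a metrics backend instead of holding counters in process.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CrisisGateEvent:
    model_id: str
    prompt_version: str
    raw_level: Optional[int] = None
    normalized_level: Optional[int] = None
    confidence: Optional[float] = None
    failure: Optional[str] = None  # "timeout", "parse_error", or None


class CrisisGateTelemetry:
    """Hypothetical in-memory aggregator for crisis gate events."""

    def __init__(self, alert_threshold: float = 0.2, min_events: int = 5):
        self.alert_threshold = alert_threshold
        self.min_events = min_events
        self.counters = Counter()

    def record(self, event: CrisisGateEvent) -> None:
        self.counters["total"] += 1
        if event.failure is not None:
            self.counters[event.failure] += 1

    def failure_rate(self) -> float:
        total = self.counters["total"]
        if total == 0:
            return 0.0
        failed = self.counters["timeout"] + self.counters["parse_error"]
        return failed / total

    def degraded(self) -> bool:
        # Alert only after enough events, so one early failure is not noise.
        enough = self.counters["total"] >= self.min_events
        return enough and self.failure_rate() > self.alert_threshold


telemetry = CrisisGateTelemetry()
for _ in range(4):
    telemetry.record(
        CrisisGateEvent("model-x", "v1", raw_level=0, normalized_level=0, confidence=0.9)
    )
telemetry.record(CrisisGateEvent("model-x", "v1", failure="timeout"))
telemetry.record(CrisisGateEvent("model-x", "v1", failure="parse_error"))
print(telemetry.failure_rate(), telemetry.degraded())
```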

Clinical Review

A trained clinician reviews the agent/prompts/sources/response_styles/*.md files, the agent-owned prompt builders in agent/specialists/, and agent responses across dogfood sessions. This is the gate before "a trusted friend could try it" becomes a defensible claim. Calendar dependency, not engineering.