
Memory Layers

Three CoALA-inspired memory layers give the agent persistent context across sessions.

Semantic layer

Write timing: after response + shared session-end path
Read timing: before response (per turn)

Example entries:
[relationship] User WORRIES_ABOUT work — "my boss is terrible"
[coping_strategy] User USES fluoxetine — "I take fluoxetine daily"

Write: Runtime LLM candidate extraction + LLM-primary write policy, run after the graph response. The policy decides commit-now, session-end hold, repetition-gate, or drop; hard local guards enforce storage/safety invariants.
Read: Hybrid RRF (embedding cosine + token recall, fused per turn), surfaced as SemanticWorkingMemoryEntry in working_memory.
Output shape: { type: "semantic", evidence_quote: "...", category: "...", subject: "...", predicate: "...", object: "..." }
Storage: One row per active fact, namespaced (owner_id, "semantic"). Exact duplicates bump locally; LLM-primary reconciliation handles supersede/coexist decisions. Held candidates live in the persisted active-session buffer until session end. Unicode-aware tokenizer handles CJK, Cyrillic, and accented Latin.
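
The hybrid read can be illustrated with a minimal reciprocal rank fusion (RRF) sketch. It assumes each retriever (dense and lexical) returns a best-first list of record ids; the function name and the sample ids are hypothetical, and k=60 is the constant listed for ranking.py under Key files. The project's real fusion code may differ.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse several best-first id lists with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, record_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every id it contains.
            scores[record_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda rid: (-scores[rid], rid))

# Hypothetical per-turn rankings from the two retrievers.
dense_ranking = ["fact_a", "fact_b", "fact_c"]
lexical_ranking = ["fact_b", "fact_d", "fact_a"]
fused = rrf_fuse([dense_ranking, lexical_ranking])
```

Records that appear in both lists rise to the top because they collect a contribution from each retriever, which is the point of fusing ranks rather than raw scores.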
Response Generation
Post-response extraction is runtime-owned

After the response is finalized, the runtime schedules semantic and procedural extraction work. The extractors don't write everything they extract directly into long-term memory. They:

  1. extract candidates with structured LLM output
  2. run LLM-primary write policy with hard local safety/storage guards
  3. either commit immediately, hold for session end, require repetition, or drop

Diagnostics still merge via _merge_dicts, so the two extractor lanes can report into the same turn without racing.
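
The three extraction steps above can be sketched as a small routing function. This is an illustration under stated assumptions, not the project's code: the enum, the dataclasses, and the guard (a candidate must carry a non-empty evidence quote) are all hypothetical, and the policy decision is passed in as a plain value where the real system would obtain it from a structured LLM call.

```python
from dataclasses import dataclass, field
from enum import Enum

class WriteDecision(Enum):
    COMMIT_NOW = "commit_now"
    HOLD_FOR_SESSION_END = "hold_for_session_end"
    REQUIRE_REPETITION = "require_repetition"
    DROP = "drop"

@dataclass
class Candidate:
    text: str
    evidence_quote: str

@dataclass
class Lanes:
    committed: list = field(default_factory=list)
    held: list = field(default_factory=list)
    awaiting_repetition: list = field(default_factory=list)

def passes_hard_guards(candidate: Candidate) -> bool:
    # Hypothetical guard: refuse to store anything without supporting evidence.
    return bool(candidate.evidence_quote.strip())

def route(candidate: Candidate, decision: WriteDecision, lanes: Lanes) -> WriteDecision:
    """Apply local guards first, then file the candidate by policy decision."""
    if not passes_hard_guards(candidate):
        return WriteDecision.DROP
    if decision is WriteDecision.COMMIT_NOW:
        lanes.committed.append(candidate)
    elif decision is WriteDecision.HOLD_FOR_SESSION_END:
        lanes.held.append(candidate)
    elif decision is WriteDecision.REQUIRE_REPETITION:
        lanes.awaiting_repetition.append(candidate)
    return decision

lanes = Lanes()
ok = Candidate("User USES fluoxetine", "I take fluoxetine daily")
no_evidence = Candidate("User LIKES tea", "")
first = route(ok, WriteDecision.COMMIT_NOW, lanes)
second = route(no_evidence, WriteDecision.COMMIT_NOW, lanes)
```

Running the guard before the policy outcome is applied keeps the non-negotiable checks in local code, even when the LLM policy asks for an immediate commit.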

Working memory carries structured context

The load-memory runtime stage returns WorkingMemoryEntry dicts with full semantic triples (category, subject, predicate, object) alongside the evidence quote. The response LLM sees entries like [relationship] User WORRIES_ABOUT work — 'my boss is terrible' instead of raw quotes. Formatting happens on demand at prompt-build time via format_working_memory_entries().
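
As a rough illustration of prompt-time formatting, the sketch below renders one entry in the bracketed form shown above. The TypedDict and the function name are hypothetical stand-ins; the real format_working_memory_entries() may take different arguments and handle more entry types.

```python
from typing import TypedDict

class SemanticEntrySketch(TypedDict):
    category: str
    subject: str
    predicate: str
    object: str
    evidence_quote: str

def format_entry_sketch(entry: SemanticEntrySketch) -> str:
    """Render '[category] subject PREDICATE object — 'quote'' for the prompt."""
    triple = f"{entry['subject']} {entry['predicate']} {entry['object']}"
    return f"[{entry['category']}] {triple} — '{entry['evidence_quote']}'"

entry: SemanticEntrySketch = {
    "category": "relationship",
    "subject": "User",
    "predicate": "WORRIES_ABOUT",
    "object": "work",
    "evidence_quote": "my boss is terrible",
}
line = format_entry_sketch(entry)
```

Keeping the entry as structured data until prompt-build time means the same record can be rendered differently for other consumers without re-parsing a string.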

Audit, feedback, and memory are separate packages

Crisis logs live in agent/audit/, and session feedback lives in agent/feedback/ — not under agent/memory/. The split is enforced because those records are operationally always-on, never feed the prompt, and never participate in the user-facing memory-recall toggle.

Storage names vs. product names

The API and stores use the CoALA-derived layer names — semantic, episodic, procedural (this is what /api/memory/status returns under counts). The web Memory page relabels them to the friendlier facts, sessions, and rules for users. They are the same three layers; only the labels differ by surface.
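
The relabeling amounts to a fixed lookup. A minimal sketch, assuming the counts arrive as a plain dict keyed by storage name; the function name and the sample numbers are illustrative only.

```python
# Storage-layer name -> user-facing label, as described above.
STORAGE_TO_PRODUCT = {
    "semantic": "facts",
    "episodic": "sessions",
    "procedural": "rules",
}

def relabel_counts(counts: dict) -> dict:
    """Rename storage-layer keys to product labels; unknown keys pass through."""
    return {STORAGE_TO_PRODUCT.get(name, name): value for name, value in counts.items()}

# Illustrative counts, not real API output.
api_counts = {"semantic": 12, "episodic": 3, "procedural": 2}
page_counts = relabel_counts(api_counts)
```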


Current write model

| Layer | Turn-time behavior | Session-end behavior |
| --- | --- | --- |
| Semantic | Extracts candidates after the reply. The LLM-primary write policy decides commit-now vs session-end hold vs repetition vs drop; hard local guards only enforce storage/safety invariants. | Held candidates can promote after transcript support, episodic-summary support, or repetition. |
| Episodic | No per-turn writes. | One session summary arc is written at session end if the session is substantive enough. |
| Procedural | Explicit durable instructions can commit immediately when the LLM-primary policy classifies them as durable. Implicit agent-facing preferences are usually held; safety-conflict requests ("skip the safety check") are dropped. | Held implicit preferences can promote if they repeat strongly enough during the session. |

Session end is a shared path, not just /end. Text-mode explicit end, inactivity timeout (20 min), graceful shutdown, web end-session, and voice transcript finalization converge on the same memory finalization services through their respective runtime entrypoints.

Held semantic and procedural candidates live in a SessionMemoryBuffer that is persisted through the active-session backend, so delayed promotion survives restart rather than disappearing with the process.
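
To show why persisting the buffer matters, here is a minimal sketch of a buffer that round-trips through JSON. The class names, fields, and the JSON encoding are assumptions made for illustration; the real SessionMemoryBuffer and the active-session backend are not shown in this document.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class HeldCandidateSketch:
    layer: str            # "semantic" or "procedural"
    text: str
    seen_count: int = 1   # how often the candidate recurred this session

@dataclass
class SessionBufferSketch:
    session_id: str
    held: list = field(default_factory=list)

    def hold(self, layer: str, text: str) -> None:
        """Add a candidate, or bump its repetition count if already held."""
        for candidate in self.held:
            if candidate.layer == layer and candidate.text == text:
                candidate.seen_count += 1
                return
        self.held.append(HeldCandidateSketch(layer, text))

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "SessionBufferSketch":
        data = json.loads(raw)
        held = [HeldCandidateSketch(**item) for item in data["held"]]
        return cls(session_id=data["session_id"], held=held)

buffer = SessionBufferSketch(session_id="s-1")
buffer.hold("procedural", "prefers short replies")
buffer.hold("procedural", "prefers short replies")
snapshot = buffer.to_json()                           # written before a restart
restored = SessionBufferSketch.from_json(snapshot)    # read back after a restart
```

Because the repetition count is part of the persisted state, a held preference that repeated before a restart can still qualify for promotion at session end.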


Memory modes

Three persistence tiers are defined in agent/memory/modes.py:

| Mode | Writes to disk | Embeddings | Crisis log | Feedback |
| --- | --- | --- | --- | --- |
| Incognito | No (in-memory only) | NullEmbeddingProvider | In-memory (ephemeral) | In-memory (ephemeral) |
| Local | Yes (configured backend; Postgres recommended) | Configured provider | Configured backend | Configured backend |
| Synced | Yes (treated like Local today) | Configured provider | Configured backend | Configured backend |

SYNCED is reserved for a future remote persistence tier; runtime code currently treats it like LOCAL while the backend sync layer remains unimplemented. The CLI's --memory-mode guest is a friendly alias for INCOGNITO.
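
A minimal sketch of the mode enum and the guest alias. The member names come from this page; the string values, the alias table, and both helper functions are assumptions, so agent/memory/modes.py may look different.

```python
from enum import Enum

class MemoryMode(Enum):
    INCOGNITO = "incognito"
    LOCAL = "local"
    SYNCED = "synced"

# CLI-friendly aliases (assumed shape); "guest" maps to INCOGNITO.
_ALIASES = {"guest": MemoryMode.INCOGNITO}

def parse_memory_mode(raw: str) -> MemoryMode:
    """Resolve a CLI value to a mode; raises ValueError for unknown values."""
    key = raw.strip().lower()
    if key in _ALIASES:
        return _ALIASES[key]
    return MemoryMode(key)

def writes_to_disk(mode: MemoryMode) -> bool:
    # SYNCED currently behaves like LOCAL, so only INCOGNITO stays in memory.
    return mode is not MemoryMode.INCOGNITO

guest_mode = parse_memory_mode("guest")
```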

The --user-id flag decouples memory identity from thread identity — switching threads preserves memory across sessions. Without --user-id, the thread id is used as the memory owner.
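
The owner fallback can be sketched as below. It assumes the state is a dict and that the thread id is carried as session_id; the function is a hypothetical stand-in for resolve_owner_id, which fails loudly when neither id is present.

```python
def resolve_owner_id_sketch(state: dict) -> str:
    """Prefer an explicit user_id; otherwise fall back to the session id."""
    owner = state.get("user_id") or state.get("session_id")
    if not owner:
        # Fail loudly instead of writing into a shared default namespace.
        raise ValueError("state must contain user_id or session_id")
    return owner

# Same user on two threads: the memory owner stays constant.
owner_a = resolve_owner_id_sketch({"user_id": "alice", "session_id": "thread-1"})
owner_b = resolve_owner_id_sketch({"user_id": "alice", "session_id": "thread-2"})
# No user id: the thread/session id becomes the owner.
owner_c = resolve_owner_id_sketch({"session_id": "thread-3"})
```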


Robustness guardrails

| Guardrail | What it prevents |
| --- | --- |
| Unicode-aware tokenizer | Non-English text (CJK, Cyrillic, accented Latin) produces meaningful token sets for dedup and retrieval instead of empty sets. CJK characters are split into per-character tokens for search. |
| Procedural rule cap with archival | Active rules are capped; older or superseded rules are archived (not deleted) to prevent unbounded system prompt inflation. |
| Atomic batch writes | The aput_batch store method wraps multi-record writes in one backend transaction. A crash mid-batch cannot leave ghost active records. |
| Episodic date filter | Query-based episodic retrieval excludes arcs older than 30 days. The first-turn catch-up (alatest) is not date-filtered: the most recent session summary always appears regardless of age. |
| Owner identity validation | resolve_owner_id requires either user_id or session_id in state. Missing both raises ValueError immediately instead of silently writing to a shared "local-default" namespace. |
| LLM-primary policy with hard guards | Product judgment lives in structured LLM policy calls; local code only enforces non-negotiable storage and safety constraints. |
| Reconciliation as safety valve | New facts that collide with active records can bump, supersede, or coexist through LLM-primary reconciliation. Exact duplicates remain local storage mechanics. |
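
The tokenizer guardrail can be illustrated with a short sketch. It assumes NFKC normalization, lowercasing, and a particular set of CJK ranges (Han, kana, Hangul); those choices and the function name are assumptions, and agent/memory/text_tokens.py may behave differently.

```python
import re
import unicodedata

# Assumed CJK ranges: Han ideographs, Hiragana/Katakana, Hangul syllables.
_CJK_CHAR = re.compile(r"[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]")

def tokenize_sketch(text: str) -> set:
    """Return a token set; CJK characters become one token each."""
    normalized = unicodedata.normalize("NFKC", text).lower()
    tokens = set()
    # \w is Unicode-aware for str patterns, so Cyrillic and accents survive.
    for word in re.findall(r"\w+", normalized):
        run = []
        for char in word:
            if _CJK_CHAR.match(char):
                if run:
                    tokens.add("".join(run))
                    run = []
                tokens.add(char)
            else:
                run.append(char)
        if run:
            tokens.add("".join(run))
    return tokens

latin_tokens = tokenize_sketch("Caf\u00e9 au lait")
cyrillic_tokens = tokenize_sketch("привет мир")
cjk_tokens = tokenize_sketch("我喜欢茶")
```

An ASCII-only pattern such as `[a-z0-9]+` would return an empty set for the second and third inputs, which is the failure this guardrail exists to prevent.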

Key files

Memory package (agent/memory/)

| File | Purpose |
| --- | --- |
| agent/memory/store/__init__.py | MemoryStore protocol (aput, aput_batch, asearch_similar, alatest), plus the in-memory OpenCouchMemoryStore |
| agent/memory/store/postgres.py | Primary durable Postgres backend with hybrid retrieval support |
| agent/memory/store/sqlite.py | SQLite fallback backend with embedding BLOB storage and transactional batch writes |
| agent/memory/modes.py | MemoryMode enum (INCOGNITO, LOCAL, SYNCED) |
| agent/memory/retrieval/ | ranking.py (RRF fusion k=60, lexical/dense rank, cosine similarity) and service.py (load_memory_for_turn orchestration) |
| agent/memory/providers/embeddings.py | EmbeddingProvider protocol + OpenAI / Null providers |
| agent/memory/text_tokens.py | Unicode-aware tokenizer with CJK character splitting |
| agent/memory/policy/candidates.py | Candidate models + SessionMemoryBuffer |
| agent/memory/operations/dedup.py | Token-set Jaccard dedup (0.85 threshold) |
| agent/memory/policy/write.py | LLM-primary immediate-write / hold / drop / require-repetition policy with hard local guards |
| agent/memory/policy/semantic.py | Stable vs session-only semantic category policy constants |
| agent/memory/operations/reconciliation.py | LLM-primary bump / supersede / coexist handling for semantic + procedural writes |
| agent/memory/operations/procedural_profile.py | ProceduralProfile load/save with per-user async lock and rule cap |
| agent/memory/prompts/summarization.py | Session summarization prompts |
| agent/memory/hashing.py | hash_session_id(), iso_now() |
| agent/memory/types/ | Pydantic models grouped by concern (semantic, episodic, procedural, therapeutic, primitives) |
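
The token-set Jaccard dedup listed for dedup.py can be sketched as follows. The 0.85 threshold comes from the table above; the whitespace tokenization, the inclusive comparison, and the treatment of two empty sets as non-duplicates are assumptions made for this example.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    if not union:
        # Assumed choice: two empty token sets are not treated as duplicates.
        return 0.0
    return len(a & b) / len(union)

def is_near_duplicate(a: set, b: set, threshold: float = 0.85) -> bool:
    return jaccard(a, b) >= threshold

# Whitespace tokenization keeps the example short.
stored = set("i take fluoxetine daily in the morning".split())
reworded = set("i take fluoxetine daily in the early morning".split())
shorter = set("i take fluoxetine daily".split())

dup_reworded = is_near_duplicate(stored, reworded)  # 7 shared / 8 total = 0.875
dup_shorter = is_near_duplicate(stored, shorter)    # 4 shared / 7 total, about 0.571
```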

Audit package (agent/audit/)

| File | Purpose |
| --- | --- |
| agent/audit/models.py | CrisisLogRecord and classifier-path enums |
| agent/audit/crisis_log.py | CrisisLogBackend protocol + in-memory implementation |
| agent/audit/postgres_crisis_log.py | Primary durable Postgres crisis log with retention purge |
| agent/audit/sqlite_crisis_log.py | SQLite crisis log fallback backend |

Feedback package (agent/feedback/)

| File | Purpose |
| --- | --- |
| agent/feedback/models.py | FeedbackLabel, FeedbackSource, and SessionFeedbackRecord |
| agent/feedback/session_feedback.py | SessionFeedbackBackend protocol + in-memory implementation |
| agent/feedback/postgres_session_feedback.py | Primary durable Postgres session feedback store |
| agent/feedback/sqlite_session_feedback.py | SQLite session feedback fallback backend |

Runtime integration

| File | Purpose |
| --- | --- |
| agent/memory/entries.py | WorkingMemoryEntry types with full SPO triples + formatters |
| agent/state.py | resolve_owner_id: fail-loud identity validation |
| agent/runtime/memory_context.py | Runtime-owned per-turn retrieval orchestration; core retrieval logic lives in agent/memory/retrieval/service.py (load_memory_for_turn) |
| agent/runtime/session/commit.py | Session-end promotion orchestration for held semantic / procedural candidates; promotion logic lives in agent/memory/commit/service.py |
| agent/runtime/session/summarize.py | Session-end episodic summarization orchestration; summarization/persistence logic lives in agent/memory/operations/episodic.py |
| agent/runtime/runtime.py | Shared session-end path, timeout handling, active-session buffering |