Memory Layers

Three CoALA-inspired memory layers give the agent persistent context across sessions.

Write: after response + shared session-end seam
Read: before response (per turn)

[relationship] User WORRIES_ABOUT work — "my boss is terrible"
[coping_strategy] User USES fluoxetine — "I take fluoxetine daily"

Write: LLM candidate extraction via extract_semantic_facts_node + deterministic write_policy. Runs after every response. Only lower-risk stable facts commit immediately; sensitive or interpretive candidates are buffered for session-end review or require repetition before promotion.
Read: Hybrid RRF (embedding cosine + token-recall, fused per turn), yielding SemanticWorkingMemoryEntry in working_memory.
Output shape: { type: "semantic", evidence_quote: "...", category: "...", subject: "...", predicate: "...", object: "..." }
Storage: One row per active fact, namespaced (owner_id, "semantic"). Dedup bumps matches; superseded facts go dormant. Held candidates live in the persisted active-session buffer until session end. Embedding stored as BLOB. Unicode-aware tokenizer handles CJK, Cyrillic, and accented Latin.

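
The hybrid read path can be illustrated with a small reciprocal rank fusion (RRF) sketch. This is not the code in agent/memory/retrieval.py; the function names, the sample facts, and the constant k=60 (the value commonly used in the RRF literature) are assumptions made for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def token_recall(query_tokens, doc_tokens):
    """Fraction of query tokens that also appear in the stored fact."""
    if not query_tokens:
        return 0.0
    return len(query_tokens & doc_tokens) / len(query_tokens)


def rrf_fuse(rankings, k=60):
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def retrieve(query_emb, query_tokens, facts, top_n=3):
    """Rank facts by each signal separately, then fuse the two rankings."""
    by_embedding = sorted(facts, key=lambda f: -cosine(query_emb, facts[f]["emb"]))
    by_tokens = sorted(facts, key=lambda f: -token_recall(query_tokens, facts[f]["tokens"]))
    return rrf_fuse([by_embedding, by_tokens])[:top_n]


# Hypothetical stored facts with toy 2-dimensional embeddings.
FACTS = {
    "f1": {"emb": [1.0, 0.0], "tokens": {"boss", "work", "terrible"}},
    "f2": {"emb": [0.0, 1.0], "tokens": {"fluoxetine", "daily"}},
    "f3": {"emb": [0.7, 0.7], "tokens": {"work", "deadline"}},
}

result = retrieve([0.9, 0.1], {"work", "boss"}, FACTS)
```

RRF uses only rank positions, so the cosine and token-recall scores never need to be normalized onto a common scale.
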
Response Generation
Post-response extraction is now policy-based

Both extractors (semantic + procedural) still fan out simultaneously from finalize_turn_node, but they no longer write everything they extract directly into long-term memory. They now:

  1. extract candidates with structured LLM output
  2. run deterministic write policy
  3. either commit immediately, hold for session end, require repetition, or drop
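
The three steps above can be sketched as a pure function over an extracted candidate. The category sets, the Decision enum, and the rule order below are hypothetical; the source only states that the policy is deterministic and that its outcomes are commit, hold, require repetition, or drop.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    COMMIT = "commit"
    HOLD = "hold_for_session_end"
    REQUIRE_REPETITION = "require_repetition"
    DROP = "drop"


@dataclass(frozen=True)
class Candidate:
    category: str
    subject: str
    predicate: str
    object: str
    evidence_quote: str


# Assumed category lists, for illustration only.
SENSITIVE_CATEGORIES = {"coping_strategy"}
INTERPRETIVE_CATEGORIES = {"relationship"}


def write_policy(candidate: Candidate) -> Decision:
    """Deterministic routing: the same candidate always gets the same decision."""
    if not candidate.evidence_quote.strip():
        return Decision.DROP  # assumption: no supporting quote means no write
    if candidate.category in SENSITIVE_CATEGORIES:
        return Decision.HOLD
    if candidate.category in INTERPRETIVE_CATEGORIES:
        return Decision.REQUIRE_REPETITION
    return Decision.COMMIT


stable = Candidate("preference", "User", "PREFERS", "tea", "I always drink tea")
sensitive = Candidate("coping_strategy", "User", "USES", "fluoxetine", "I take fluoxetine daily")
interpretive = Candidate("relationship", "User", "WORRIES_ABOUT", "work", "my boss is terrible")
unsupported = Candidate("preference", "User", "PREFERS", "coffee", "  ")

decisions = [write_policy(c) for c in (stable, sensitive, interpretive, unsupported)]
```

Keeping the policy free of LLM calls means the routing can be unit-tested without mocking a model.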

Diagnostics still merge via _merge_dicts, so the two extractor lanes never race.
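
A minimal sketch of why a dict-merging reducer keeps the two lanes from racing, assuming each extractor reports diagnostics under its own key. The merge_dicts function here is a hypothetical stand-in for _merge_dicts, and the diagnostic keys are invented for the example.

```python
from functools import reduce


def merge_dicts(left, right):
    """Return a new dict containing the keys of both partial updates."""
    merged = dict(left or {})
    merged.update(right or {})
    return merged


# Each lane returns only its own slice of the diagnostics.
semantic_update = {"semantic_extractor": {"candidates": 2, "committed": 1}}
procedural_update = {"procedural_extractor": {"candidates": 1, "committed": 0}}

# Whichever lane finishes first, the merged result is the same,
# because the two updates never touch the same key.
semantic_first = reduce(merge_dicts, [semantic_update, procedural_update], {})
procedural_first = reduce(merge_dicts, [procedural_update, semantic_update], {})
```

The order independence holds only while the lanes write disjoint keys; if both wrote the same key, the later update would win.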

Working memory carries structured context

load_memory_node returns WorkingMemoryEntry dicts with full semantic triples (category, subject, predicate, object) alongside the evidence quote. The response LLM sees entries like [relationship] User WORRIES_ABOUT work — 'my boss is terrible' instead of raw quotes. Formatting happens on demand at prompt-build time via format_working_memory_entries().
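
A sketch of on-demand formatting, assuming entries are plain dicts with the fields listed above. The SemanticEntry type and the format_entry and format_entries helpers are illustrative and are not the actual format_working_memory_entries() implementation.

```python
from typing import List, TypedDict


class SemanticEntry(TypedDict):
    type: str
    category: str
    subject: str
    predicate: str
    object: str
    evidence_quote: str


def format_entry(entry: SemanticEntry) -> str:
    """Render one entry as: [category] subject PREDICATE object — 'quote'."""
    return (
        f"[{entry['category']}] {entry['subject']} {entry['predicate']} "
        f"{entry['object']} — '{entry['evidence_quote']}'"
    )


def format_entries(entries: List[SemanticEntry]) -> str:
    """Join formatted entries, one per line, for inclusion in a prompt."""
    return "\n".join(format_entry(e) for e in entries)


entry: SemanticEntry = {
    "type": "semantic",
    "category": "relationship",
    "subject": "User",
    "predicate": "WORRIES_ABOUT",
    "object": "work",
    "evidence_quote": "my boss is terrible",
}

line = format_entry(entry)
```

Storing the structured dict and formatting only at prompt-build time lets the same entry be rendered differently later without re-extracting it.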


Current write model

Layer | Turn-time behavior | Session-end behavior
Semantic | Extracts candidates after the reply. Low-risk stable facts may commit immediately. Sensitive or interpretive content is held or repetition-gated. | Held candidates can promote after transcript support, episodic-summary support, or repetition.
Episodic | No per-turn writes. | One session summary arc is written at session end if the session is substantive enough.
Procedural | Explicit durable instructions can commit immediately. Implicit agent-facing preferences are usually held. | Held implicit preferences can promote if they repeat strongly enough during the session.

Session end is now a shared seam, not just /end. The same commit path can run on explicit end, inactivity timeout, graceful shutdown, web end session, and voice disconnect.

Held semantic and procedural candidates live in a persisted active-session buffer until that seam runs, so delayed promotion survives restart rather than disappearing with the process.
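
One way to make held candidates survive a restart is to keep them in a small SQLite table and drain it when the session-end seam runs. The table name, columns, and helper functions below are assumptions for illustration, not the schema used in agent/persistence.py.

```python
import json
import os
import sqlite3
import tempfile

SCHEMA = """
CREATE TABLE IF NOT EXISTS held_candidates (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    owner_id TEXT NOT NULL,
    layer TEXT NOT NULL,
    payload TEXT NOT NULL
)
"""


def open_buffer(path):
    """Open (or create) the on-disk buffer."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def hold(conn, owner_id, layer, candidate):
    """Persist one held candidate immediately."""
    with conn:
        conn.execute(
            "INSERT INTO held_candidates (owner_id, layer, payload) VALUES (?, ?, ?)",
            (owner_id, layer, json.dumps(candidate)),
        )


def drain(conn, owner_id):
    """Return and clear all held candidates for an owner (session-end seam)."""
    with conn:
        rows = conn.execute(
            "SELECT layer, payload FROM held_candidates WHERE owner_id = ? ORDER BY id",
            (owner_id,),
        ).fetchall()
        conn.execute("DELETE FROM held_candidates WHERE owner_id = ?", (owner_id,))
    return [(layer, json.loads(payload)) for layer, payload in rows]


path = os.path.join(tempfile.mkdtemp(), "buffer.db")
candidate = {"category": "coping_strategy", "object": "fluoxetine"}

conn = open_buffer(path)
hold(conn, "user-1", "semantic", candidate)
conn.close()  # simulate the process exiting before session end

conn = open_buffer(path)  # simulate a restart
pending = drain(conn, "user-1")
remaining = drain(conn, "user-1")
conn.close()
```

Because every entry point into the seam would call the same drain step, an inactivity timeout and an explicit end would see identical held candidates.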


Memory modes

Mode | Writes to disk | Embeddings | Crisis log | Feedback
Incognito | No | No | In-memory (ephemeral) | In-memory (ephemeral)
Persistent | Yes (SQLite) | Yes | SQLite (90-day) | SQLite (180-day)

The --user-id flag decouples memory identity from thread identity, so memory persists across sessions even when the thread changes.
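
A minimal sketch of how a --user-id flag could select the memory namespace independently of the thread. The --thread-id flag, the preference for user_id over session_id, and the state layout are assumptions; only resolve_owner_id, the ValueError on a missing identity, and the (owner_id, "semantic") namespace come from this page.

```python
import argparse


def resolve_owner_id(state):
    """Pick the memory owner, failing loudly when no identity is present."""
    owner = state.get("user_id") or state.get("session_id")
    if not owner:
        raise ValueError("memory nodes require user_id or session_id in state")
    return owner


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--user-id", default=None)
    parser.add_argument("--thread-id", default=None)  # hypothetical flag
    return parser


def semantic_namespace(argv):
    """Build the semantic-layer namespace for one CLI invocation."""
    args = build_parser().parse_args(argv)
    state = {"user_id": args.user_id, "thread_id": args.thread_id}
    return (resolve_owner_id(state), "semantic")


# Two different threads, same user: both map to the same memory namespace.
first = semantic_namespace(["--user-id", "alice", "--thread-id", "t1"])
second = semantic_namespace(["--user-id", "alice", "--thread-id", "t2"])

try:
    resolve_owner_id({})
    missing_raises = False
except ValueError:
    missing_raises = True
```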


Robustness guardrails

Guardrail | What it prevents
Unicode-aware tokenizer | Non-English text (CJK, Cyrillic, accented Latin) now produces meaningful token sets for dedup and retrieval instead of empty sets. CJK characters are split into per-character tokens for search.
Procedural rule cap | Active rules are capped at 20. When the cap is exceeded, the oldest rule is archived, not deleted, which prevents unbounded system prompt inflation.
Atomic batch writes | The aput_batch store method wraps multi-record writes in a single SQLite transaction. A crash mid-batch cannot leave ghost active records.
Episodic date filter | Query-based episodic retrieval excludes arcs older than 30 days. The first-turn catch-up (alatest) is not date-filtered: the most recent session summary always appears regardless of age.
Owner identity validation | All memory nodes require either user_id or session_id in state. Missing both raises ValueError immediately instead of silently writing to a shared "local-default" namespace.
Safety marker consolidation | Crisis-bypass detection markers (e.g., "skip the safety check") are defined once in constants.py and imported by both the candidate promoter and write policy, so the two checks cannot drift apart.
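
The atomic batch write guardrail can be illustrated with the standard sqlite3 module, whose connection context manager commits on success and rolls back when an exception escapes. This is a synchronous sketch with a hypothetical put_batch helper and table layout, not the aput_batch method itself; a duplicate key stands in for a crash mid-batch.

```python
import sqlite3


def put_batch(conn, records):
    """Insert every record or none of them."""
    with conn:  # one transaction: commit on success, rollback on error
        conn.executemany(
            "INSERT INTO memory (key, status, value) VALUES (?, ?, ?)",
            records,
        )


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM memory").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memory (key TEXT PRIMARY KEY, status TEXT NOT NULL, value TEXT NOT NULL)"
)
conn.commit()

bad_batch = [
    ("fact-1", "active", "User PREFERS tea"),
    ("fact-2", "active", "User LIVES_IN Lisbon"),
    ("fact-1", "active", "duplicate key fails here"),
]

try:
    put_batch(conn, bad_batch)
except sqlite3.IntegrityError:
    pass  # the first two inserts were rolled back with the failing one

rows_after_failure = count_rows(conn)

put_batch(conn, bad_batch[:2])
rows_after_success = count_rows(conn)
```

Without the surrounding transaction, the first two rows would remain as active records even though the batch as a whole failed.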

Key files

File | Purpose
agent/memory/store.py | MemoryStore protocol: asearch_similar, aput, aput_batch, alatest
agent/memory/sqlite_store.py | SQLite backend with embedding BLOB storage and transactional batch writes
agent/memory/retrieval.py | RRF fusion helper
agent/memory/constants.py | Shared safety markers and helpers (single source of truth)
agent/memory/candidates.py | Candidate models + session buffer
agent/memory/text_tokens.py | Unicode-aware tokenizer with CJK character splitting
agent/memory/dedup.py | Token-set Jaccard dedup (0.85 threshold)
agent/memory/write_policy.py | Deterministic immediate-write / hold / drop policy
agent/memory/reconciliation.py | Supersession and overlap handling
agent/memory/procedural.py | ProceduralProfile load/save helpers with rule cap eviction
agent/working_memory.py | WorkingMemoryEntry types with full SPO triples + formatters
agent/state.py | resolve_owner_id: fail-loud identity validation
agent/nodes/load_memory.py | Per-turn retrieval with 30-day episodic date filter
agent/nodes/extract_facts.py | Semantic candidate extraction + immediate-write path
agent/nodes/extract_procedural_rules.py | Procedural candidate extraction + immediate-write path
agent/nodes/commit_session_memory.py | Session-end promotion for held semantic / procedural candidates
agent/nodes/summarize_session.py | Session-end episodic summarization
agent/persistence.py | Shared session-end seam, timeout handling, active-session buffering
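
To show how text_tokens.py and dedup.py could fit together, here is a sketch of a Unicode-aware tokenizer with per-character CJK splitting feeding a token-set Jaccard check at the 0.85 threshold. The character ranges, the NFKC normalization step, the >= comparison, and all function names are assumptions rather than the project's actual code.

```python
import re
import unicodedata

# Assumed ranges: Hiragana/Katakana, CJK Unified Ideographs, Hangul syllables.
_CJK_CHAR = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]")


def tokenize(text):
    """Lowercased word tokens; CJK characters become one token each."""
    text = unicodedata.normalize("NFKC", text).lower()
    tokens = set()
    # In Python 3, \w on str patterns matches Unicode letters and digits.
    for word in re.findall(r"\w+", text):
        run = ""
        for ch in word:
            if _CJK_CHAR.match(ch):
                if run:
                    tokens.add(run)
                    run = ""
                tokens.add(ch)
            else:
                run += ch
        if run:
            tokens.add(run)
    return tokens


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_duplicate(text_a, text_b, threshold=0.85):
    """Treat two facts as duplicates when their token sets overlap enough."""
    return jaccard(tokenize(text_a), tokenize(text_b)) >= threshold


cjk_tokens = tokenize("我喜欢咖啡")
cyrillic_tokens = tokenize("Привет мир")
same_fact = is_duplicate("I take fluoxetine daily", "i take fluoxetine daily!")
different_fact = is_duplicate("I take fluoxetine daily", "I walk daily")
```

An ASCII-only pattern such as [a-z0-9]+ would return an empty set for the first two inputs, which is the failure mode the tokenizer guardrail describes.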