Hybrid Retrieval

Reciprocal Rank Fusion (RRF) combines two scorers so the agent handles both exact-name queries and paraphrased queries well.

Pipeline:

User message → Token-recall and Embedding cosine scorers → RRF fusion → Top-k WorkingMemoryEntry dicts

Token-recall: Tokenize the query (stopword-filtered). Score each record by |query ∩ record| / |query|. Keep scores ≥ 0.33. Strengths: proper nouns, medication names, short queries.

Embedding cosine: Compute the query embedding. Take the cosine similarity against stored embeddings. Keep scores ≥ 0.5. Strengths: stemming, synonyms, paraphrase.

RRF fusion: score = Σ 1/(k + rank), k = 60.

Output: Top-k → WorkingMemoryEntry dicts.
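The fusion step can be illustrated with a minimal sketch. It assumes each scorer has already produced a best-first list of record ids and that ranks are 1-based, as in the original RRF formulation. The function name `rrf_fuse`, the `top_k` default, and the record ids are hypothetical and not taken from the codebase.

```python
from collections import defaultdict

RRF_K = 60  # the constant k from the formula above


def rrf_fuse(rankings, k=RRF_K, top_k=5):
    """Fuse best-first lists of record ids: score = sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Assumption: ranks are 1-based, so the top hit contributes 1 / (k + 1).
        for rank, record_id in enumerate(ranking, start=1):
            scores[record_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by record id for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_k]


token_ranking = ["r1", "r2", "r3"]      # hypothetical ids, best first
embedding_ranking = ["r2", "r4", "r1"]  # hypothetical ids, best first
fused = rrf_fuse([token_ranking, embedding_ranking])
fused_ids = [record_id for record_id, _ in fused]  # r2, r1, r4, r3
```

Here `r2` comes first because it sits near the top of both lists, even though neither scorer's raw scores are compared directly. Only rank positions enter the sum.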

Why hybrid?

Neither scorer alone is robust enough for therapy content:

Scorer	Wins on	Loses on
Token-recall	Proper nouns ("Sarah"), medication names ("fluoxetine"), short queries, multilingual text (CJK, Cyrillic, accented Latin)	Stemming ("anxiety" vs "anxious"), synonyms ("sibling" vs "sister"), paraphrase
Embedding	Stemming, synonyms, semantic paraphrase ("I feel stuck" vs "things feel hopeless")	Proper nouns (name signal diluted), short queries (too little context)
Hybrid RRF	Both — fuses by rank position, not raw score	Nothing significant
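To make the token-recall row concrete, here is a small sketch of the |query ∩ record| / |query| score. The stopword list, the simplified `\w+` tokenizer, and the sample record are illustrative assumptions. The project's actual tokenizer and stopword set may differ.

```python
import re

# Illustrative stopword list only; not the project's actual list.
STOPWORDS = {"i", "a", "the", "my", "is", "about", "and", "to", "of"}
KEEP_THRESHOLD = 0.33  # the token-recall cutoff stated in this section


def tokens(text):
    """Lowercased, stopword-filtered token set."""
    return {t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS}


def token_recall(query, record):
    """Fraction of query tokens that also appear in the record."""
    q = tokens(query)
    if not q:
        return 0.0
    return len(q & tokens(record)) / len(q)


record = "Sarah said the fluoxetine helps with her anxiety"
exact = token_recall("Sarah fluoxetine", record)   # both query tokens appear
stemmed = token_recall("feeling anxious", record)  # "anxious" != "anxiety"
```

The exact-name query scores 1.0 and clears the threshold, while the query with a different word form scores 0.0. That second case is the one the embedding scorer is meant to cover.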

Unicode-aware tokenization

The tokenizer uses \b\w+\b (Python 3 Unicode-aware) with a CJK character-splitting post-processor. Chinese, Japanese, Korean, Cyrillic, and accented Latin text all produce meaningful token sets for both dedup and retrieval. CJK characters are emitted individually (standard for CJK IR without a word segmenter), covering BMP and astral-plane Extensions B through H.
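A rough sketch of this tokenization follows. The code-point ranges cover the main CJK Unified Ideographs block, Extension A, and Extensions B through H. Whether the real post-processor also splits kana or Hangul is not stated here, so this sketch splits ideographs only. The names `CJK_RANGES`, `is_cjk`, and `tokenize` are hypothetical.

```python
import re

WORD_RE = re.compile(r"\b\w+\b")  # Unicode-aware for str patterns in Python 3

# Assumed ideograph ranges (inclusive code points).
CJK_RANGES = (
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs (BMP)
    (0x3400, 0x4DBF),    # Extension A (BMP)
    (0x20000, 0x2A6DF),  # Extension B
    (0x2A700, 0x2EBEF),  # Extensions C through F
    (0x30000, 0x323AF),  # Extensions G and H
)


def is_cjk(ch):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def tokenize(text):
    """Split on word boundaries, then emit each CJK ideograph as its own token."""
    out = []
    for word in WORD_RE.findall(text.lower()):
        run = ""
        for ch in word:
            if is_cjk(ch):
                if run:  # flush any non-CJK run collected so far
                    out.append(run)
                    run = ""
                out.append(ch)
            else:
                run += ch
        if run:
            out.append(run)
    return out


mixed = tokenize("Sarah 服用氟西汀")
```

Latin and Cyrillic words stay whole, while each ideograph becomes a separate token, so overlap-based scoring still finds shared characters without a word segmenter.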

RRF's constant k=60 requires no per-dataset tuning (Cormack et al. 2009).

Fallback paths

Scenario	What happens
No embedding provider	Pure token-recall (the pre-embedding behavior)
Embedding API failure	Logged, falls back to token-recall for this turn
Record has no embedding	Participates in token-recall only
Model mismatch	Record skipped in embedding scan

The retrieval_path diagnostic reports which path ran: "hybrid_rrf", "token_recall", or "token_recall_after_embed_error".
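The first two rows of the table and the diagnostic values can be sketched as a small selection function. It assumes the provider is represented by an optional callable `embed_fn`, with `None` standing in for "no embedding provider". Both the function name `choose_retrieval_path` and that representation are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def choose_retrieval_path(query, embed_fn=None):
    """Return (query_embedding_or_None, retrieval_path) for one turn."""
    if embed_fn is None:
        # No provider configured: pure token-recall.
        return None, "token_recall"
    try:
        return embed_fn(query), "hybrid_rrf"
    except Exception:
        # The failure is logged and only this turn falls back.
        logger.exception("embedding call failed; using token-recall for this turn")
        return None, "token_recall_after_embed_error"
```

Keeping the error case distinct from the plain token-recall case lets a trace show whether embeddings were unavailable by configuration or failed at call time.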

Episodic date filter

Query-based episodic retrieval applies a max_age_days=30 filter — session arcs older than 30 days are excluded from the search results. This keeps the agent focused on recent context.

The first-turn catch-up path (alatest) is not date-filtered — the most recent session summary always appears regardless of age. This ensures every new session opens with continuity even if the user hasn't visited in months.
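The difference between the two paths might look like the sketch below. The `ended_at` field, the helper names, and the inclusive cutoff are assumptions made for illustration. Only the 30-day window and the unfiltered "most recent" behavior come from the text.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE_DAYS = 30  # the max_age_days value described above


def filter_recent(arcs, now, max_age_days=MAX_AGE_DAYS):
    """Query path: keep only session arcs inside the age window."""
    cutoff = now - timedelta(days=max_age_days)
    return [arc for arc in arcs if arc["ended_at"] >= cutoff]


def latest(arcs):
    """Catch-up path: most recent arc regardless of age, or None if empty."""
    return max(arcs, key=lambda arc: arc["ended_at"], default=None)


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
arcs = [
    {"id": "recent", "ended_at": now - timedelta(days=5)},
    {"id": "old", "ended_at": now - timedelta(days=45)},
]
searchable = filter_recent(arcs, now)  # only the 5-day-old arc
catch_up = latest([arcs[1]])           # the 45-day-old arc is still returned
```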

Embedding storage

Column	Type	Purpose
`embedding`	BLOB	float32 array via `struct.pack`
`embedding_dim`	INTEGER	Dimensionality validation
`embedding_model`	TEXT	Model migration detection
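As an illustration of the BLOB layout, the sketch below packs a vector as float32 values with `struct.pack` and checks the byte length against the stored dimension on the way back. The little-endian byte order and the helper names are assumptions. The table only states that the array is float32 and packed with `struct.pack`.

```python
import struct

FLOAT32_BYTES = 4


def pack_embedding(vector):
    """Serialize a list of floats as little-endian float32 (byte order assumed)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob, expected_dim):
    """Deserialize, validating the blob length against embedding_dim."""
    if len(blob) != expected_dim * FLOAT32_BYTES:
        raise ValueError(
            f"blob holds {len(blob)} bytes, expected {expected_dim * FLOAT32_BYTES}"
        )
    return list(struct.unpack(f"<{expected_dim}f", blob))


blob = pack_embedding([0.5, -1.25, 2.0])  # values exactly representable in float32
restored = unpack_embedding(blob, expected_dim=3)
```

Storing the dimension alongside the blob makes a truncated or wrong-model vector detectable before any similarity is computed.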

The provider factory (create_configured_embedding_provider in agent/memory/providers/embeddings.py) picks:

OpenAI text-embedding-3-large (3072 dims) when OPENAI_API_KEY is set.
NullEmbeddingProvider otherwise — retrieval degrades to token-recall only.

Records whose stored dimension does not match are skipped during dense ranking, so swapping the embedding model never produces silently wrong results; records re-embed lazily on the next write.
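A minimal sketch of such a dense scan is shown below, assuming records are plain dicts carrying the three columns from the table plus a hypothetical `id` key. The function names are illustrative, and the 0.5 cutoff is the embedding threshold stated earlier in this section.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dense_rank(query_vec, query_model, records, min_score=0.5):
    """Rank records by cosine similarity, skipping unusable embeddings."""
    scored = []
    for rec in records:
        vec = rec.get("embedding")
        if vec is None:
            continue  # no embedding: the record takes part in token-recall only
        if rec.get("embedding_model") != query_model:
            continue  # model mismatch: skip rather than compare across models
        if rec.get("embedding_dim") != len(query_vec):
            continue  # dimension mismatch: skip
        score = cosine(query_vec, vec)
        if score >= min_score:
            scored.append((rec["id"], score))
    scored.sort(key=lambda pair: -pair[1])
    return scored


records = [
    {"id": "a", "embedding": [1.0, 0.0], "embedding_dim": 2, "embedding_model": "m1"},
    {"id": "b", "embedding": [0.0, 1.0], "embedding_dim": 2, "embedding_model": "m1"},
    {"id": "c", "embedding": [1.0, 0.0], "embedding_dim": 2, "embedding_model": "m0"},
    {"id": "d", "embedding": None, "embedding_dim": None, "embedding_model": None},
    {"id": "e", "embedding": [1.0, 1.0], "embedding_dim": 2, "embedding_model": "m1"},
]
ranked = dense_rank([1.0, 0.0], "m1", records)
ranked_ids = [record_id for record_id, _ in ranked]  # a, then e
```

Record `c` would score 1.0 if compared, but it is skipped because its vector came from a different model. This is the situation the skip rule is meant to guard against.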

Turn Memory Structure

agent.runtime.memory_context owns turn-level retrieval orchestration: semantic and episodic search, procedural profile loading, first-turn episodic catch-up, and diagnostics. The retrieval algorithm itself stays in agent.memory.recall, so the memory package owns memory semantics while the runtime owns how memory becomes model-visible context for a runner turn.

Regression coverage

Retrieval changes should be covered with backend tests and targeted manual trace review.
