
Hybrid Retrieval

Reciprocal Rank Fusion (RRF) combines two scorers, a lexical token-recall scorer and an embedding-cosine scorer, so the agent handles both exact-name queries and paraphrased queries well.

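Pipeline:

  1. User message: the incoming query.
  2. Token-recall: tokenize the query (stopword-filtered) and score each record by |query ∩ record| / |query|; keep records scoring at least 0.33. Strong on proper nouns, medication names, and short queries.
  3. Embedding cosine: compute the query embedding and its cosine similarity against stored embeddings; keep records scoring at least 0.5. Strong on stemming, synonyms, and paraphrase.
  4. RRF fusion of the two rankings: score = Σ 1/(k + rank), k = 60.
  5. Output: the top-k results as WorkingMemoryEntry dicts.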
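The token-recall step can be sketched as follows. This is a minimal illustration, not the agent's code: the stopword list, lowercasing, the names `query_tokens`, `token_recall`, `token_recall_ranking`, and `TOKEN_RECALL_THRESHOLD`, and the "at or above" comparison are all assumptions. CJK splitting is omitted here.

```python
import re

# Hypothetical stopword list; the production list is not documented here.
STOPWORDS = {"a", "an", "and", "about", "the", "i", "is", "my", "of", "to"}
# Assumed inclusive cut-off matching the 0.33 token-recall threshold.
TOKEN_RECALL_THRESHOLD = 0.33


def query_tokens(text: str) -> set[str]:
    """Lowercased Unicode word tokens with stopwords removed."""
    return {t for t in re.findall(r"\b\w+\b", text.lower()) if t not in STOPWORDS}


def token_recall(query: str, record: str) -> float:
    """|query ∩ record| / |query|: the fraction of query tokens found in the record."""
    q = query_tokens(query)
    if not q:
        return 0.0
    return len(q & query_tokens(record)) / len(q)


def token_recall_ranking(query: str, records: dict[str, str]) -> list[str]:
    """Record ids scoring at or above the threshold, best first."""
    scored = [(rid, token_recall(query, text)) for rid, text in records.items()]
    kept = [(rid, s) for rid, s in scored if s >= TOKEN_RECALL_THRESHOLD]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [rid for rid, _ in kept]
```

Because the denominator is the query size, a two-word query like "Sarah fluoxetine" fully matches a record containing both words, which is why this scorer does well on short, name-heavy queries.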

Why hybrid?

Neither scorer alone is robust enough for therapy content:

| Scorer | Wins on | Loses on |
|---|---|---|
| Token-recall | Proper nouns ("Sarah"), medication names ("fluoxetine"), short queries, multilingual text (CJK, Cyrillic, accented Latin) | Stemming ("anxiety" vs "anxious"), synonyms ("sibling" vs "sister"), paraphrase |
| Embedding | Stemming, synonyms, semantic paraphrase ("I feel stuck" vs "things feel hopeless") | Proper nouns (name signal diluted), short queries (too little context) |
| Hybrid RRF | Both (fuses by rank position, not raw score) | Nothing significant |

Unicode-aware tokenization

The tokenizer uses \b\w+\b (Python 3 Unicode-aware) with a CJK character-splitting post-processor. Chinese, Japanese, Korean, Cyrillic, and accented Latin text all produce meaningful token sets for both dedup and retrieval. CJK characters are emitted individually (standard for CJK IR without a word segmenter), covering BMP and astral-plane Extensions B through H.

RRF's constant k=60 requires no per-dataset tuning (Cormack et al. 2009).
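The fusion formula score = Σ 1/(k + rank) can be sketched like this, assuming 1-based ranks as in Cormack et al.; the function name `rrf_fuse` and the `top_k=5` default are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60, top_k: int = 5) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion over ranked lists of record ids (rank starts at 1)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, record_id in enumerate(ranking, start=1):
            # A record absent from a list simply gets no contribution from it.
            scores[record_id] += 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused[:top_k]
```

With a token-recall ranking `["r1", "r3"]` and a dense ranking `["r3", "r2"]`, record `r3` wins because it appears in both lists (1/61 + 1/62), ahead of `r1` (1/61) and `r2` (1/62). Only rank positions matter, so the incomparable raw scales of the two scorers never have to be normalized.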


Fallback paths

| Scenario | What happens |
|---|---|
| No embedding provider | Pure token-recall (the pre-embedding behavior) |
| Embedding API failure | Error logged; falls back to token-recall for this turn |
| Record has no embedding | Participates in token-recall only |
| Model mismatch | Record skipped in the embedding scan |

The retrieval_path diagnostic reports which path ran: "hybrid_rrf", "token_recall", or "token_recall_after_embed_error".
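The path selection can be sketched as a single function that returns both the ranking and the `retrieval_path` value. This is a hypothetical shape: the callables `token_rank`, `dense_rank`, and `embed_query`, the logger name, and modelling "no embedding provider" as `embed_query=None` are assumptions; only the three diagnostic strings come from this page.

```python
import logging
from typing import Callable, Optional, Sequence

log = logging.getLogger("memory.retrieval")  # hypothetical logger name


def _rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Compact RRF: ids sorted by summed 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, rid in enumerate(ranking, start=1):
            scores[rid] = scores.get(rid, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


def retrieve(
    query: str,
    token_rank: Callable[[str], list[str]],
    dense_rank: Callable[[Sequence[float]], list[str]],
    embed_query: Optional[Callable[[str], Sequence[float]]] = None,
) -> tuple[list[str], str]:
    """Return (ranked record ids, retrieval_path diagnostic)."""
    lexical = token_rank(query)
    if embed_query is None:
        # No embedding provider: the pre-embedding behavior.
        return lexical, "token_recall"
    try:
        vector = embed_query(query)
    except Exception:
        # Embedding API failure: log it and degrade for this turn only.
        log.exception("query embedding failed; falling back to token-recall")
        return lexical, "token_recall_after_embed_error"
    return _rrf([lexical, dense_rank(vector)]), "hybrid_rrf"
```

Computing the lexical ranking first means every path has a usable result before the network call is attempted, so an embedding outage costs recall quality but never the turn.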


Episodic date filter

Query-based episodic retrieval applies a max_age_days=30 filter — session arcs older than 30 days are excluded from the search results. This keeps the agent focused on recent context.

The first-turn catch-up path (alatest) is not date-filtered — the most recent session summary always appears regardless of age. This ensures every new session opens with continuity even if the user hasn't visited in months.
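The two episodic paths differ only in whether the age filter applies. A minimal sketch, assuming a hypothetical `SessionArc` record with an `ended_at` timestamp and an inclusive 30-day boundary (neither is specified on this page):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class SessionArc:
    """Hypothetical stored episodic summary."""
    summary: str
    ended_at: datetime


def filter_recent(arcs: list[SessionArc], now: datetime, max_age_days: int = 30) -> list[SessionArc]:
    """Query-based path: drop arcs older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [arc for arc in arcs if arc.ended_at >= cutoff]


def latest_arc(arcs: list[SessionArc]) -> Optional[SessionArc]:
    """First-turn catch-up path: newest arc, with no age limit."""
    return max(arcs, key=lambda arc: arc.ended_at, default=None)
```

A user returning after three months gets no arcs from `filter_recent`, but `latest_arc` still returns their last session, which is the continuity guarantee described above.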


Embedding storage

| Column | Type | Purpose |
|---|---|---|
| embedding | BLOB | float32 array packed via struct.pack |
| embedding_dim | INTEGER | Dimensionality validation |
| embedding_model | TEXT | Model migration detection |
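Packing and unpacking the BLOB column can be sketched with `struct` and SQLite. The little-endian byte order, the table name `memory_records`, and the helper names are assumptions; the three embedding columns follow the schema listed for embedding storage.

```python
import sqlite3
import struct


def pack_embedding(vector: list[float]) -> bytes:
    """Encode as float32; '<' (little-endian) is an assumed byte order."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob: bytes, dim: int) -> list[float]:
    """Decode a BLOB, validating its length against the stored embedding_dim."""
    if len(blob) != 4 * dim:
        raise ValueError(f"blob has {len(blob)} bytes, expected {4 * dim}")
    return list(struct.unpack(f"<{dim}f", blob))


# Hypothetical table; only the embedding columns mirror the documented schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memory_records (id TEXT PRIMARY KEY, "
    "embedding BLOB, embedding_dim INTEGER, embedding_model TEXT)"
)
vec = [0.5, -1.0, 2.0]
conn.execute(
    "INSERT INTO memory_records VALUES (?, ?, ?, ?)",
    ("r1", pack_embedding(vec), len(vec), "text-embedding-3-large"),
)
blob, dim, model = conn.execute(
    "SELECT embedding, embedding_dim, embedding_model FROM memory_records WHERE id = ?",
    ("r1",),
).fetchone()
```

At 4 bytes per float32, a 3072-dimension text-embedding-3-large vector occupies 12,288 bytes. Values that are not exactly representable in float32 (such as 0.1) come back slightly rounded, which is harmless for cosine ranking.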

The provider factory (create_configured_embedding_provider in agent/memory/providers/embeddings.py) picks:

  1. OpenAI text-embedding-3-large (3072 dims) when OPENAI_API_KEY is set.
  2. NullEmbeddingProvider otherwise — retrieval degrades to token-recall only.

Records whose embedding_dim or embedding_model does not match the active provider are skipped during dense ranking, so swapping the embedding model never produces silently wrong results; records re-embed lazily on the next write.
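The dense scan with its skip rules can be sketched as below, applying the 0.5 cosine threshold from the pipeline. `StoredRecord`, `dense_rank`, and the inclusive threshold comparison are hypothetical; records are shown already decoded from their BLOBs.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StoredRecord:
    """Hypothetical decoded row: an id plus the three embedding columns."""
    record_id: str
    embedding: Optional[list[float]]
    embedding_dim: Optional[int]
    embedding_model: Optional[str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dense_rank(
    query_vec: Sequence[float],
    records: list[StoredRecord],
    active_model: str,
    threshold: float = 0.5,
) -> list[str]:
    """Ids of compatible records with cosine >= threshold, best first."""
    scored: list[tuple[str, float]] = []
    for rec in records:
        if rec.embedding is None:
            continue  # no embedding yet: the record is reachable via token-recall only
        if rec.embedding_model != active_model or rec.embedding_dim != len(query_vec):
            continue  # stale model or dimension: skip until re-embedded
        score = cosine(query_vec, rec.embedding)
        if score >= threshold:
            scored.append((rec.record_id, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return [rid for rid, _ in scored]
```

The dimension check matters because `zip` silently truncates vectors of different lengths, so a mismatched record would otherwise receive a plausible-looking but meaningless score. The model check catches the case where two models happen to share a dimensionality.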


Turn memory structure

agent.runtime.memory_context owns turn-level retrieval orchestration: semantic and episodic search, procedural profile loading, first-turn episodic catch-up, and diagnostics. The retrieval algorithm itself stays in agent.memory.recall, so the memory package owns memory semantics while the runtime owns how memory becomes model-visible context for a runner turn.


Regression coverage

Retrieval changes should be covered with backend tests and targeted manual trace review.