
Embedding Model

ExoVault uses vector embeddings for semantic search across memories, notes, and media. Embeddings enable finding relevant content based on meaning, not just keyword matches.

Default Model#

| Property | Value |
|---|---|
| Model | gemini-embedding-2-preview |
| Provider | Google (Gemini) |
| Dimensions | 3,072 |
| Modalities | Text, images, audio, video, PDF |

The embedding provider is configured through the following environment variable:

| Variable | Default | Notes |
|---|---|---|
| GEMINI_API_KEY | None (required) | Works with the Gemini free tier; no billing needed |
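
A minimal sketch of validating this variable at startup, so a missing key fails immediately rather than at the first embedding call. The `loadEmbeddingConfig` function and the `EmbeddingConfig` shape are hypothetical, not ExoVault's actual code; the model name and dimensions come from the table above.

```typescript
// Hypothetical config shape; ExoVault's real configuration code may differ.
interface EmbeddingConfig {
  apiKey: string;
  model: string;
  dimensions: number;
}

// Reads settings from an env-like record (pass process.env in Node).
function loadEmbeddingConfig(env: Record<string, string | undefined>): EmbeddingConfig {
  const apiKey = env["GEMINI_API_KEY"];
  if (!apiKey) {
    // GEMINI_API_KEY has no default, so fail fast instead of at the first embed call.
    throw new Error("GEMINI_API_KEY is required for embeddings");
  }
  return {
    apiKey,
    model: "gemini-embedding-2-preview", // default model from the table above
    dimensions: 3072, // must match the vector(3072) storage columns
  };
}
```

In Node this would be called as `loadEmbeddingConfig(process.env)`; taking the record as a parameter keeps the function easy to test.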

Migration note (Mar 2026): ExoVault migrated from OpenAI text-embedding-3-small (1,536 dimensions) to Gemini gemini-embedding-2-preview (3,072 dimensions). Migration 0045 resized all vector columns to 3,072 dimensions. HNSW indexes were removed because pgvector's HNSW index supports at most 2,000 dimensions for the vector type; exact kNN search is used instead.

How Embeddings Are Used#

Memory Search (search_memories)#

The primary search pipeline uses a 4-signal hybrid approach:

  1. Vector similarity — Query is embedded and compared against memory embeddings using cosine distance
  2. BM25 keyword scoring — Deterministic full-text scoring inspired by QMD. Catches exact terms that vectors miss
  3. Blind index — HMAC-based encrypted token matching for keyword-level recall on E2E encrypted data
  4. Knowledge graph expansion — Graph neighbors of top results are included

These signals are fused using Reciprocal Rank Fusion (RRF), then refined with temporal decay and MMR diversity re-ranking.
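
The blind index signal can be illustrated with a minimal sketch, assuming each keyword is normalized and replaced by an HMAC-SHA256 digest computed with a secret key, so the server compares opaque tokens instead of plaintext words. The ASCII-only lowercase tokenizer, the key handling, and the names `blindTokens` and `blindMatchCount` are illustrative assumptions; ExoVault's actual normalization and key derivation are not documented here.

```typescript
import { createHmac } from "node:crypto";

// Illustrative tokenizer: lowercase, split on anything that is not an ASCII letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((token) => token.length > 0);
}

// Replaces each distinct token with its HMAC-SHA256 digest. Without the key,
// the stored digests do not reveal the words, but equal words map to equal digests.
function blindTokens(text: string, key: string): Set<string> {
  const digests = new Set<string>();
  for (const token of tokenize(text)) {
    digests.add(createHmac("sha256", key).update(token).digest("hex"));
  }
  return digests;
}

// Number of distinct query tokens that also appear in a stored item's blind index.
function blindMatchCount(query: Set<string>, stored: Set<string>): number {
  let hits = 0;
  query.forEach((digest) => {
    if (stored.has(digest)) hits++;
  });
  return hits;
}
```

Because the same key produces the same digest for the same word, exact keyword matches survive encryption, which is the recall that pure vector search can miss.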

Note Search (semantic_search)#

Notes use pure vector similarity search:

  1. Query is embedded using the same model
  2. Note embeddings are compared using cosine distance
  3. Results are deduplicated across chunks (keeping the highest similarity per note)
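
The chunk-to-note deduplication in step 3 can be sketched as follows, assuming each search hit carries its note ID, chunk index, and similarity. The `ChunkHit` shape and `dedupeByNote` name are hypothetical.

```typescript
// Hypothetical shape of a chunk-level search hit.
interface ChunkHit {
  noteId: string;
  chunkIndex: number;
  similarity: number;
}

// Collapses chunk hits to one result per note, keeping the best-scoring chunk,
// then sorts the surviving hits by similarity, highest first.
function dedupeByNote(hits: ChunkHit[]): ChunkHit[] {
  const best = new Map<string, ChunkHit>();
  for (const hit of hits) {
    const current = best.get(hit.noteId);
    if (current === undefined || hit.similarity > current.similarity) {
      best.set(hit.noteId, hit);
    }
  }
  return Array.from(best.values()).sort((a, b) => b.similarity - a.similarity);
}
```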

Media Search#

Media files (video, audio, images, PDFs) are embedded using Gemini's native multimodal embedding:

  1. File is uploaded and encrypted
  2. Gemini extracts text content (audio transcription, visual description)
  3. Both the extracted text and the raw media are embedded in the same 3,072d vector space
  4. Agents can search for spoken words in videos or content in images using the same search_memories tool

Session Start (query parameter)#

When session_start includes a query parameter, semantic search is used to find additional relevant memories beyond the standard context profile retrieval.

Graph Exploration (explore_graph)#

The explore_graph tool can use semantic search to find entry points for graph traversal. Both memory and note embeddings are searched.

Embedding Generation#

When Embeddings Are Created#

Embeddings are generated asynchronously after content is written:

  1. Memory or note is written to the database with indexingStatus: "pending"
  2. A background job (via Inngest) picks up pending items
  3. Content is sent to the Gemini embedding API
  4. The resulting vector is stored in memory_embeddings or note_embeddings
  5. indexingStatus is updated to "indexed"
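
The lifecycle above can be sketched as a plain async loop, with the Inngest job, the Gemini call, and the database replaced by hypothetical stand-ins: `processPending`, an injected `embed` function, and an in-memory array. It shows only the status transitions, not Inngest's API.

```typescript
// Status values as described above.
type IndexingStatus = "pending" | "indexed";

interface Item {
  id: string;
  content: string;
  indexingStatus: IndexingStatus;
}

// Stand-in for a memory_embeddings / note_embeddings row.
interface EmbeddingRow {
  sourceId: string;
  embedding: number[];
}

// Processes every pending item in order and returns how many were indexed.
// If embed() throws, the current item stays "pending" and the promise rejects,
// leaving that item for a later run to retry.
async function processPending(
  items: Item[],
  embed: (text: string) => Promise<number[]>,
  store: EmbeddingRow[],
): Promise<number> {
  let indexed = 0;
  for (const item of items) {
    if (item.indexingStatus !== "pending") continue; // step 2: pick up pending items only
    const embedding = await embed(item.content); // step 3: call the embedding API
    store.push({ sourceId: item.id, embedding }); // step 4: persist the vector
    item.indexingStatus = "indexed"; // step 5: mark as indexed
    indexed++;
  }
  return indexed;
}
```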

Chunking#

Long content is split into chunks before embedding:

  • Each chunk is embedded independently
  • Chunk metadata (index, source ID) is stored alongside the embedding
  • During search, results are deduplicated to the source level (keeping the highest similarity per memory/note)
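
A minimal chunking sketch, assuming fixed-size character windows with a small overlap. The window size (2,000 characters), the overlap (200), and the `chunkContent` name are illustrative assumptions; the documentation does not state ExoVault's chunk size or splitting strategy. Each chunk carries the chunk index and source ID that are stored alongside its embedding.

```typescript
interface Chunk {
  sourceId: string;
  chunkIndex: number;
  text: string;
}

// Splits content into overlapping fixed-size character windows.
// maxChars = 2000 and overlap = 200 are illustrative defaults, not ExoVault's settings.
function chunkContent(sourceId: string, content: string, maxChars = 2000, overlap = 200): Chunk[] {
  if (overlap < 0 || overlap >= maxChars) {
    throw new RangeError("overlap must be in [0, maxChars)");
  }
  const chunks: Chunk[] = [];
  const step = maxChars - overlap; // how far each window advances
  for (let start = 0; start < content.length; start += step) {
    chunks.push({ sourceId, chunkIndex: chunks.length, text: content.slice(start, start + maxChars) });
    if (start + maxChars >= content.length) break; // this window already reaches the end
  }
  return chunks;
}
```

With these defaults, a 5,000-character memory yields three chunks starting at offsets 0, 1,800, and 3,600.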

Token Estimation#

Embedding token usage is estimated as:

tokens ≈ ceil(content_length / 4)

This is used for quota tracking and usage billing.
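
The estimate translates directly to code. This sketch assumes `content_length` means string length as JavaScript measures it (UTF-16 code units); the documentation does not say whether characters or bytes are counted, and both function names are hypothetical.

```typescript
// tokens ≈ ceil(content_length / 4), a rough 4-characters-per-token heuristic.
function estimateEmbeddingTokens(content: string): number {
  return Math.ceil(content.length / 4);
}

// Hypothetical quota helper: sums per-chunk estimates for a multi-chunk item.
function estimateBatchTokens(chunks: string[]): number {
  return chunks.reduce((sum, chunk) => sum + estimateEmbeddingTokens(chunk), 0);
}
```

For example, a 10,000-character note is estimated at 2,500 tokens.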

Similarity Scoring#

Cosine Similarity#

ExoVault uses cosine similarity (1 - cosine distance) for all vector comparisons:

```sql
-- <=> is pgvector's cosine distance operator; query_embedding is the embedded search query
1 - (embedding <=> query_embedding) AS similarity
```
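
For checking scores outside the database (for example in tests), the same quantity can be computed in application code. This sketch mirrors `1 - (a <=> b)`, the dot product divided by the product of the vector norms; `cosineSimilarity` is a hypothetical helper name.

```typescript
// Cosine similarity = dot(a, b) / (|a| * |b|), i.e. 1 - cosine distance.
// A zero vector yields NaN here (0 / 0), so zero vectors should not be stored.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const similarity = dot / (Math.sqrt(normA) * Math.sqrt(normB));
  // Clamp to [-1, 1] to absorb floating-point error (NaN passes through unchanged).
  return Math.min(1, Math.max(-1, similarity));
}
```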

Threshold Defaults#

| Context | Default threshold | Meaning |
|---|---|---|
| search_memories | 0.4 | Broad search; catches partial matches |
| semantic_search (notes) | 0.5 | Moderate threshold for notes |
| session_start (query) | 0.4 | Same as memory search |
| explore_graph | 0.4 | Same as memory search |

Thresholds can be overridden per-request.
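
A minimal sketch of per-request threshold resolution using the defaults from the table above. The context keys, `resolveThreshold`, `applyThreshold`, and the inclusive `>=` comparison are assumptions for illustration; the documentation does not say whether a score exactly at the threshold is kept.

```typescript
// Defaults from the threshold table; the keys are hypothetical labels for each context.
const DEFAULT_THRESHOLDS = {
  search_memories: 0.4,
  semantic_search: 0.5,
  session_start: 0.4,
  explore_graph: 0.4,
} as const;

type SearchContext = keyof typeof DEFAULT_THRESHOLDS;

// A per-request override wins; `??` also keeps an explicit override of 0.
function resolveThreshold(context: SearchContext, override?: number): number {
  return override ?? DEFAULT_THRESHOLDS[context];
}

// Keeps hits at or above the threshold (inclusive comparison is an assumption).
function applyThreshold<T extends { similarity: number }>(hits: T[], threshold: number): T[] {
  return hits.filter((hit) => hit.similarity >= threshold);
}
```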

Similarity Interpretation#

| Score | Interpretation |
|---|---|
| 0.8 - 1.0 | Very high similarity (near-duplicate content) |
| 0.6 - 0.8 | Strong similarity (same topic, different wording) |
| 0.4 - 0.6 | Moderate similarity (related content) |
| 0.2 - 0.4 | Weak similarity (loosely related) |
| 0.0 - 0.2 | Minimal similarity (likely unrelated) |

Search Ranking Pipeline#

The full ranking pipeline for search_memories:

Query → Embed → Vector Search (top K*4)
                BM25 Keyword Search
                Blind Index Search (top K*2)
                        ↓
              RRF Fusion (weighted)
                        ↓
              Fetch candidate metadata
                        ↓
              Temporal Decay (half-life)
                        ↓
              MMR Diversity Re-ranking
                        ↓
              Final top K results

RRF Weights#

| Signal | Weight | Purpose |
|---|---|---|
| Semantic (vector) | 1.0 | Primary relevance signal |
| BM25 (keyword) | 0.9 | Exact term matching |
| Blind index | 0.8 | Encrypted keyword-level recall |
| Knowledge graph | 0.6 | Contextual expansion |
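
A minimal sketch of weighted RRF using the weights above: each signal contributes `weight / (k + rank)` for every result it ranks, and contributions are summed per result. The smoothing constant `k = 60` is the value commonly used with RRF and is an assumption here, as are the signal keys and the `weightedRrf` name.

```typescript
// Weights from the RRF table; keys are hypothetical signal names.
const RRF_WEIGHTS: Record<string, number> = {
  semantic: 1.0,
  bm25: 0.9,
  blindIndex: 0.8,
  graph: 0.6,
};

// rankings maps each signal to its result IDs, best first.
// Returns [id, fusedScore] pairs, highest score first.
function weightedRrf(rankings: Record<string, string[]>, k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const [signal, ids] of Object.entries(rankings)) {
    const weight = RRF_WEIGHTS[signal] ?? 0; // unrecognized signals get weight 0
    ids.forEach((id, index) => {
      const rank = index + 1; // RRF ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  }
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}
```

A result ranked moderately well by several signals can outrank one ranked first by a single signal, which is why RRF suits mixing vector, keyword, and graph results whose raw scores are not on comparable scales.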

Temporal Decay#

Recent memories are boosted relative to older ones. The decayHalfLife parameter (default: 30 days) controls the decay rate. A memory loses half its recency boost every decayHalfLife days.
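
With a half-life, the recency factor is `0.5^(age / decayHalfLife)`: 1.0 for a new memory, 0.5 after 30 days, and 0.25 after 60 days with the default. The documentation does not say how this factor is combined with the fused score, so `applyDecay` below blends them with a hypothetical `recencyWeight`; both function names are illustrative.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Exponential decay: the factor halves every decayHalfLifeDays.
// Future-dated items are treated as age 0.
function recencyFactor(createdAt: Date, now: Date, decayHalfLifeDays = 30): number {
  const ageDays = Math.max(0, (now.getTime() - createdAt.getTime()) / MS_PER_DAY);
  return Math.pow(0.5, ageDays / decayHalfLifeDays);
}

// Hypothetical blend: (1 - recencyWeight) of the score is untouched and the
// recencyWeight portion is the recency boost that decays. recencyWeight is an assumption.
function applyDecay(fusedScore: number, factor: number, recencyWeight = 0.2): number {
  return fusedScore * (1 - recencyWeight + recencyWeight * factor);
}
```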

MMR Diversity#

Maximal Marginal Relevance (MMR) re-ranking reduces redundancy in results. The diversity parameter (default: 0.7) controls the balance:

  • 0.0 — Pure relevance (no diversity penalty)
  • 0.5 — Balanced
  • 1.0 — Maximum diversity (heavily penalizes similar results)
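
A greedy MMR sketch. Standard MMR scores each candidate as `lambda * relevance - (1 - lambda) * maxSimilarityToSelected`; this sketch assumes ExoVault's `diversity` corresponds to `1 - lambda`, which matches the scale above (0.0 is pure relevance). It also assumes relevance scores are normalized to [0, 1] so they are comparable with cosine similarity. `mmrRerank` and the `Candidate` shape are hypothetical.

```typescript
interface Candidate {
  id: string;
  relevance: number; // assumed normalized to [0, 1]
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    na += x * x;
    nb += y * y;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Greedily picks topK candidates, trading relevance against similarity to
// results already picked. diversity = 0 reduces to sorting by relevance.
function mmrRerank(candidates: Candidate[], topK: number, diversity = 0.7): Candidate[] {
  const pool = [...candidates];
  const selected: Candidate[] = [];
  while (selected.length < topK && pool.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    pool.forEach((candidate, i) => {
      // Redundancy = highest similarity to anything already selected, floored at 0.
      const redundancy = selected.reduce(
        (max, s) => Math.max(max, cosine(candidate.embedding, s.embedding)),
        0,
      );
      const score = (1 - diversity) * candidate.relevance - diversity * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = i;
      }
    });
    const [picked] = pool.splice(bestIndex, 1);
    if (picked) selected.push(picked);
  }
  return selected;
}
```

With the default 0.7, a near-duplicate of the top result is skipped in favor of a less relevant but distinct one; with 0.0 the two most relevant results are returned even if they are identical.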

Storage#

Embeddings are stored in PostgreSQL using pgvector:

```sql
-- Memory embeddings
CREATE TABLE memory_embeddings (
  id UUID PRIMARY KEY,
  memory_id UUID REFERENCES memories(id),
  user_id UUID NOT NULL,
  embedding vector(3072),
  chunk_index INTEGER DEFAULT 0,
  ...
);

-- Note embeddings
CREATE TABLE note_embeddings (
  id UUID PRIMARY KEY,
  note_id UUID REFERENCES notes(id),
  user_id UUID NOT NULL,
  embedding vector(3072),
  chunk_index INTEGER DEFAULT 0,
  ...
);
```

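Note: HNSW indexes are not used because pgvector's HNSW index supports at most 2,000 dimensions for the vector type. ExoVault uses exact kNN search instead, which always returns the true nearest neighbors at any dimensionality but must compute the distance to every candidate row, so query cost grows linearly with the number of stored embeddings. For most vault sizes (under 100K memories), exact search performs well.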
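
Without a vector index, a query ordered by `embedding <=> query` makes Postgres compute the distance for every row that passes the `WHERE` filter, which is what makes the search exact. A minimal sketch of building such a query for a node-postgres-style client with `$1`, `$2`, ... placeholders; `exactKnnQuery` and `toVectorLiteral` are hypothetical helpers, and the query relies on pgvector accepting the text form `'[x,y,z]'` for a vector.

```typescript
interface ParameterizedQuery {
  text: string;
  values: unknown[];
}

// pgvector parses vectors from the text form '[1,2,3]'.
function toVectorLiteral(v: number[]): string {
  return `[${v.join(",")}]`;
}

// Exact kNN over one user's memory embeddings: no vector index is used, so every
// row for the user is scored and the true top `limit` rows are returned.
function exactKnnQuery(
  userId: string,
  queryEmbedding: number[],
  threshold: number,
  limit: number,
): ParameterizedQuery {
  return {
    text: `
      SELECT memory_id, chunk_index, 1 - (embedding <=> $1::vector) AS similarity
      FROM memory_embeddings
      WHERE user_id = $2
        AND 1 - (embedding <=> $1::vector) >= $3
      ORDER BY embedding <=> $1::vector
      LIMIT $4`,
    values: [toVectorLiteral(queryEmbedding), userId, threshold, limit],
  };
}
```

With node-postgres, the result would be passed as `client.query(q.text, q.values)`.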