Encryption

ExoVault encrypts all user content before it is stored in the database, an architecture it describes as zero-knowledge at rest. The server encrypts and decrypts on behalf of agents, so it handles plaintext in memory while processing a request, but it never stores plaintext content at rest.

Algorithm#

AES-256-GCM (Advanced Encryption Standard with 256-bit keys in Galois/Counter Mode)

  • Key size: 256 bits (32 bytes)
  • IV size: 96 bits (12 bytes), randomly generated per encryption operation
  • Authentication tag: 128 bits (16 bytes), included in ciphertext
  • Mode: GCM provides both confidentiality and authentication (AEAD)
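
The parameters above can be sketched with Node's built-in `node:crypto` module. This is a minimal illustration, not ExoVault's actual code: the helper names `encryptField` and `decryptField` are hypothetical, and appending the tag to the end of the ciphertext is an assumption about what "included in ciphertext" means.

```typescript
import { Buffer } from "node:buffer";
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_BYTES = 12;  // 96-bit IV
const TAG_BYTES = 16; // 128-bit authentication tag

// Hypothetical helper: encrypts UTF-8 text with AES-256-GCM.
// Assumption: the tag is appended to the end of the ciphertext.
function encryptField(plaintext: string, key: Buffer): { ciphertext: Buffer; iv: Buffer } {
  const iv = randomBytes(IV_BYTES); // fresh random IV per operation
  const cipher = createCipheriv("aes-256-gcm", key, iv, { authTagLength: TAG_BYTES });
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { ciphertext: Buffer.concat([body, cipher.getAuthTag()]), iv };
}

// Hypothetical helper: splits off the tag, verifies it, and returns the plaintext.
function decryptField(ciphertext: Buffer, key: Buffer, iv: Buffer): string {
  const split = ciphertext.length - TAG_BYTES;
  const decipher = createDecipheriv("aes-256-gcm", key, iv, { authTagLength: TAG_BYTES });
  decipher.setAuthTag(ciphertext.subarray(split));
  // final() throws if the tag does not match (wrong key or modified data).
  return Buffer.concat([decipher.update(ciphertext.subarray(0, split)), decipher.final()]).toString("utf8");
}

const key = randomBytes(32); // 256-bit key
const { ciphertext, iv } = encryptField("hello vault", key);
const roundTrip = decryptField(ciphertext, key, iv);
```

Because GCM is an AEAD mode, decrypting with the wrong key or a modified ciphertext raises an error rather than returning corrupted plaintext.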

Key Hierarchy#

ExoVault uses a three-level key hierarchy:

Server Encryption Key (SEK)
    │
    ├── User Master Encryption Key (MEK)
    │       │
    │       ├── Per-operation IV + AES-256-GCM
    │       │       → Encrypted memory content
    │       │       → Encrypted note content
    │       │       → Encrypted vault names
    │       │       → Encrypted message content
    │       │       → Encrypted link labels
    │       │       → Encrypted summaries
    │       │       ...

Server Encryption Key (SEK)#

  • Stored as the ENCRYPTION_KEY environment variable
  • 32 bytes, encoded as a 64-character hex string
  • Used to wrap/unwrap per-integration MEKs
  • If lost, all encrypted data is unrecoverable
  • Must be backed up securely and never exposed
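
As a rough sketch of how the SEK format above could be validated at startup: the function name `loadSek` and the error message are hypothetical, and the real server may parse the variable differently.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical loader: reads ENCRYPTION_KEY from an environment map and
// rejects anything that is not exactly 64 hex characters (32 bytes).
function loadSek(env: Record<string, string | undefined>): Buffer {
  const hex = env["ENCRYPTION_KEY"];
  if (hex === undefined || !/^[0-9a-fA-F]{64}$/.test(hex)) {
    throw new Error("ENCRYPTION_KEY must be 64 hex characters (32 bytes)");
  }
  return Buffer.from(hex, "hex");
}

// Example with a placeholder value; never hard-code a real key.
const sek = loadSek({ ENCRYPTION_KEY: "ab".repeat(32) });
```

In a running server the argument would typically be `process.env`. Failing fast on a malformed key avoids silently wrapping MEKs with a truncated or invalid SEK.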

Master Encryption Key (MEK)#

  • Generated per agent integration
  • Wrapped (encrypted) with the SEK and stored as wrappedMek + wrappedMekIv on the integration
  • Unwrapped at request time to encrypt/decrypt content
  • Each user can have multiple MEKs (one per integration)
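
A minimal sketch of MEK creation and wrapping, assuming the MEK is 32 random bytes and the wrapped value carries the GCM tag at its end. The helper name `createWrappedMek` is hypothetical; only `wrappedMek` and `wrappedMekIv` come from the description above.

```typescript
import { Buffer } from "node:buffer";
import { createCipheriv, randomBytes } from "node:crypto";

// Hypothetical helper: generates a fresh MEK and wraps it under the SEK.
function createWrappedMek(sek: Buffer): { mek: Buffer; wrappedMek: Buffer; wrappedMekIv: Buffer } {
  const mek = randomBytes(32);          // new 256-bit key for this integration
  const wrappedMekIv = randomBytes(12); // IV used only for this wrap operation
  const cipher = createCipheriv("aes-256-gcm", sek, wrappedMekIv);
  const body = Buffer.concat([cipher.update(mek), cipher.final()]);
  // Assumption: the 16-byte tag is appended to the wrapped key.
  const wrappedMek = Buffer.concat([body, cipher.getAuthTag()]);
  return { mek, wrappedMek, wrappedMekIv };
}

const sek = randomBytes(32); // stand-in for the real SEK
const integrationA = createWrappedMek(sek);
const integrationB = createWrappedMek(sek);
```

Only `wrappedMek` and `wrappedMekIv` would be persisted on the integration; the plaintext `mek` is returned here purely so the example can be inspected.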

Per-Operation IV#

  • A fresh 12-byte random IV is generated for every encryption operation
  • Stored alongside the ciphertext (e.g., contentIv, titleIv, summaryIv)
  • Ensures that encrypting the same plaintext twice produces different ciphertext
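
The effect of a per-operation IV can be checked directly. The sketch below encrypts the same string twice under one key; `encryptOnce` is a hypothetical helper that omits the authentication tag for brevity.

```typescript
import { Buffer } from "node:buffer";
import { createCipheriv, randomBytes } from "node:crypto";

// Hypothetical helper: AES-256-GCM with a fresh 12-byte random IV on every call.
function encryptOnce(plaintext: string, key: Buffer): { ciphertext: Buffer; iv: Buffer } {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { ciphertext, iv };
}

const key = randomBytes(32);
const first = encryptOnce("same note title", key);
const second = encryptOnce("same note title", key);

// Identical plaintext and key, but different IVs give different ciphertexts.
const ivsDiffer = !first.iv.equals(second.iv);
const ciphertextsDiffer = !first.ciphertext.equals(second.ciphertext);
```

This is why each encrypted field needs its own IV column: the IV is not secret, but decryption is impossible without it.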

Key Derivation Flow#

When an agent makes a request:

1. Agent sends: Authorization: Bearer exv_key_here
2. Server looks up the integration via key hash
3. Server retrieves wrappedMek and wrappedMekIv from the integration
4. Server unwraps MEK using SEK:
   - Decrypt wrappedMek using AES-256-GCM with SEK and wrappedMekIv
   - Result: plaintext MEK (32 bytes)
5. Server uses MEK to encrypt/decrypt content:
   - Encrypt: generate random IV, AES-256-GCM-encrypt(plaintext, MEK, IV)
   - Decrypt: AES-256-GCM-decrypt(ciphertext, MEK, storedIv)
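
The five steps above can be sketched end to end as follows. This is an illustration under several assumptions: the API key is hashed with SHA-256, integrations live in an in-memory `Map`, and the tag is appended to each ciphertext. The names `seal`, `unseal`, `hashApiKey`, and `handleRead` are hypothetical.

```typescript
import { Buffer } from "node:buffer";
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

interface Sealed { data: Buffer; iv: Buffer }
interface Integration { wrappedMek: Buffer; wrappedMekIv: Buffer }

// AES-256-GCM with a fresh 12-byte IV; the 16-byte tag is appended (assumption).
function seal(plaintext: Buffer, key: Buffer): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { data: Buffer.concat([body, cipher.getAuthTag()]), iv };
}

function unseal(sealed: Sealed, key: Buffer): Buffer {
  const split = sealed.data.length - 16;
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.data.subarray(split));
  return Buffer.concat([decipher.update(sealed.data.subarray(0, split)), decipher.final()]);
}

// Assumption: integrations are indexed by the SHA-256 hex digest of the API key.
const hashApiKey = (apiKey: string): string => createHash("sha256").update(apiKey).digest("hex");

// Demo setup: one integration and one stored record (encryptedContent + contentIv).
const sek = randomBytes(32);
const mek = randomBytes(32);
const wrapped = seal(mek, sek);
const integrations = new Map<string, Integration>([
  [hashApiKey("exv_demo_key"), { wrappedMek: wrapped.data, wrappedMekIv: wrapped.iv }],
]);
const stored = seal(Buffer.from("remember the milk", "utf8"), mek);

function handleRead(authorization: string): string {
  const apiKey = authorization.replace(/^Bearer /, "");     // step 1
  const integration = integrations.get(hashApiKey(apiKey)); // steps 2-3
  if (integration === undefined) throw new Error("unknown API key");
  const requestMek = unseal({ data: integration.wrappedMek, iv: integration.wrappedMekIv }, sek); // step 4
  return unseal(stored, requestMek).toString("utf8");       // step 5
}

const result = handleRead("Bearer exv_demo_key");
```

The unwrapped MEK exists only inside `handleRead`, which mirrors the idea that keys and plaintext live in memory for the duration of a request.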

What Gets Encrypted#

Memory Content#

Field             Encrypted  IV Field
encryptedContent  Yes        contentIv
encryptedSummary  Yes        summaryIv
memoryType        No         --
importance        No         --
confidence        No         --
entities          No         --
metadata          No         --

Note Content#

Field             Encrypted  IV Field
encryptedTitle    Yes        titleIv
encryptedContent  Yes        contentIv
encryptedTags     Yes        tagsIv

Vault Names#

Field          Encrypted  IV Field
encryptedName  Yes        nameIv

Messages#

Field             Encrypted  IV Field
encryptedContent  Yes        contentIv

Field           Encrypted       IV Field
encryptedLabel  Yes (optional)  labelIv

Conversation Turns#

Field             Encrypted  IV Field
encryptedContent  Yes        contentIv

Vault Documents (Settings)#

Field             Encrypted  IV Field
encryptedContent  Yes        contentIv

What Is NOT Encrypted#

The following fields are stored in plaintext for indexing, filtering, and querying:

  • IDs -- All UUIDs (memory IDs, note IDs, vault IDs)
  • Types -- Memory types, relation types
  • Scores -- Importance, confidence, signal scores
  • Entities -- Entity arrays (for search filtering)
  • Metadata -- JSON metadata (task status, assigned agent, etc.)
  • Timestamps -- Created/updated dates
  • Blind index tokens -- HMAC-based tokens for privacy-preserving search
  • Content hashes -- SHA-256 hashes for deduplication
  • Embeddings -- Vector embeddings (derived from plaintext; not trivially reversible to exact content, but they do carry semantic information)

Blind Index Tokens#

For privacy-preserving keyword search without decrypting content:

  1. Content is tokenized into words/n-grams
  2. Each token is HMAC-signed with the MEK as the key
  3. Resulting hashes are stored as blindTokens on the memory
  4. Search queries are tokenized and HMAC-signed the same way
  5. Token overlap determines relevance without revealing plaintext

This enables the match_memories_by_blind_tokens database function to find relevant memories without accessing plaintext content.
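
The token flow above might look like the following sketch. It assumes HMAC-SHA256 and simple lowercase word tokenization; the actual hash function, n-gram scheme, and scoring used by `match_memories_by_blind_tokens` are not specified here, and the helper names are hypothetical.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, randomBytes } from "node:crypto";

// Assumption: tokens are unique lowercase alphanumeric words (no n-grams).
function tokenize(text: string): string[] {
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
  return Array.from(new Set(words));
}

// Each token is HMAC'd with the MEK, so tokens cannot be recomputed without the key.
function blindTokens(text: string, mek: Buffer): string[] {
  return tokenize(text).map((token) => createHmac("sha256", mek).update(token).digest("hex"));
}

// Relevance here is simply the number of query tokens found among the stored tokens.
function overlap(queryTokens: string[], storedTokens: string[]): number {
  const stored = new Set(storedTokens);
  return queryTokens.filter((t) => stored.has(t)).length;
}

const mek = randomBytes(32);
const storedTokens = blindTokens("Deploy the staging server on Friday", mek);
const score = overlap(blindTokens("staging deploy", mek), storedTokens);

// The same query under a different key produces tokens that match nothing.
const otherKeyScore = overlap(blindTokens("staging deploy", randomBytes(32)), storedTokens);
```

Since the tokens are keyed by the MEK, a search only matches memories written under the same integration key. Equal words still produce equal tokens, so token frequency remains visible to anyone who can read the database.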

Content Hash Deduplication#

For exact-match deduplication:

  1. Content + memory type are concatenated
  2. SHA-256 hash is computed
  3. Hash is stored as contentHash on the memory
  4. New writes check for an existing matching hash before inserting

This catches exact duplicates without comparing encrypted content.
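
A small sketch of the deduplication check, assuming content and memory type are joined with a newline before hashing (the actual concatenation format is not documented here). The `insertIfNew` helper, the in-memory `Set`, and the example memory types are hypothetical stand-ins for a database lookup and real data.

```typescript
import { createHash } from "node:crypto";

// Assumption: content and memory type are joined with "\n" before hashing.
function contentHash(content: string, memoryType: string): string {
  return createHash("sha256").update(`${content}\n${memoryType}`, "utf8").digest("hex");
}

// Stand-in for a lookup on the stored contentHash values.
const seenHashes = new Set<string>();

// Returns true if the memory is new, false if an identical one already exists.
function insertIfNew(content: string, memoryType: string): boolean {
  const hash = contentHash(content, memoryType);
  if (seenHashes.has(hash)) return false;
  seenHashes.add(hash);
  return true;
}

const first = insertIfNew("User prefers dark mode", "preference");     // new
const duplicate = insertIfNew("User prefers dark mode", "preference"); // exact duplicate
const differentType = insertIfNew("User prefers dark mode", "fact");   // same text, other type
```

Hashing the plaintext is necessary because two encryptions of the same content differ (each uses a fresh IV), so ciphertexts cannot be compared for equality.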

Security Properties#

Zero-Knowledge at Rest#

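The database never contains plaintext for encrypted fields. All user-authored text fields (content, titles, tags, names, messages) are encrypted; metadata, entities, and embeddings remain in plaintext, as described under Limitations.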

Per-Request Decryption#

Content is decrypted in memory during request processing and never written to disk in plaintext.

Key Isolation#

Each integration has its own wrapped MEK. Compromising one integration's key does not expose data from other integrations (even for the same user, if they have multiple integrations).

Forward Secrecy (Limited)#

If the SEK is rotated, existing wrapped MEKs can be re-wrapped with the new key; the MEKs themselves do not change, so stored content does not need to be re-encrypted. However, ExoVault does not currently implement automatic key rotation.
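
Since rotation is not automated, a manual re-wrap could look roughly like this. It is a sketch under the same assumptions as the wrapping format (AES-256-GCM, tag appended); `wrap`, `unwrap`, and `rewrap` are hypothetical names, not ExoVault functions.

```typescript
import { Buffer } from "node:buffer";
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

interface WrappedKey { wrappedMek: Buffer; wrappedMekIv: Buffer }

function wrap(mek: Buffer, sek: Buffer): WrappedKey {
  const wrappedMekIv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", sek, wrappedMekIv);
  const body = Buffer.concat([cipher.update(mek), cipher.final()]);
  return { wrappedMek: Buffer.concat([body, cipher.getAuthTag()]), wrappedMekIv };
}

function unwrap(key: WrappedKey, sek: Buffer): Buffer {
  const split = key.wrappedMek.length - 16;
  const decipher = createDecipheriv("aes-256-gcm", sek, key.wrappedMekIv);
  decipher.setAuthTag(key.wrappedMek.subarray(split));
  return Buffer.concat([decipher.update(key.wrappedMek.subarray(0, split)), decipher.final()]);
}

// Re-wrap: unwrap with the old SEK, wrap again with the new SEK and a new IV.
function rewrap(key: WrappedKey, oldSek: Buffer, newSek: Buffer): WrappedKey {
  return wrap(unwrap(key, oldSek), newSek);
}

const oldSek = randomBytes(32);
const newSek = randomBytes(32);
const mek = randomBytes(32);
const before = wrap(mek, oldSek);
const after = rewrap(before, oldSek, newSek);
const sameMek = unwrap(after, newSek).equals(mek);
```

Only the small wrapped-key records are rewritten. Content encrypted under each MEK stays untouched, which also means rotating the SEK does not protect content if an MEK itself was exposed.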

Limitations#

  • Embeddings are not encrypted -- Vector embeddings are stored in plaintext because pgvector requires raw vectors for similarity search. While embeddings cannot be trivially reversed to exact text, they do carry semantic information.
  • Metadata is not encrypted -- Task statuses, entities, and other metadata fields are stored in plaintext for filtering and indexing.
  • No end-to-end encryption -- The server has access to plaintext during request processing. This is a trade-off for server-side embedding generation and search.
  • Single SEK -- All data is protected by one server encryption key. If compromised, all data is at risk.