Encryption

ExoVault uses an encryption architecture that is zero-knowledge at rest: all user content is encrypted before it is stored in the database. The server encrypts and decrypts on behalf of agents, so it handles plaintext while processing a request, but it never stores plaintext content at rest.

Algorithm

AES-256-GCM (Advanced Encryption Standard with 256-bit keys in Galois/Counter Mode)

Key size: 256 bits (32 bytes)
IV size: 96 bits (12 bytes), randomly generated per encryption operation
Authentication tag: 128 bits (16 bytes), included in ciphertext
Mode: GCM provides both confidentiality and authentication (AEAD)
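
As a minimal sketch of these parameters, the following uses Node's built-in `crypto` module. The helper names `encryptGcm` and `decryptGcm` are hypothetical, and appending the tag to the end of the ciphertext is an assumption about what "included in ciphertext" means; ExoVault's actual layout may differ.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_BYTES = 12;  // 96-bit IV, random per operation
const TAG_BYTES = 16; // 128-bit authentication tag

// Encrypts UTF-8 text with AES-256-GCM and appends the tag to the ciphertext (assumed layout).
function encryptGcm(plaintext: string, key: Buffer): { iv: Buffer; ciphertext: Buffer } {
  const iv = randomBytes(IV_BYTES);
  const cipher = createCipheriv("aes-256-gcm", key, iv, { authTagLength: TAG_BYTES });
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, ciphertext: Buffer.concat([body, cipher.getAuthTag()]) };
}

// Splits off the trailing tag, verifies it, and returns the plaintext.
// decipher.final() throws if the ciphertext or tag was modified.
function decryptGcm(ciphertext: Buffer, key: Buffer, iv: Buffer): string {
  const body = ciphertext.subarray(0, ciphertext.length - TAG_BYTES);
  const tag = ciphertext.subarray(ciphertext.length - TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv, { authTagLength: TAG_BYTES });
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(body), decipher.final()]).toString("utf8");
}
```

Because GCM is an AEAD mode, a single flipped bit in the stored ciphertext causes decryption to fail rather than return corrupted plaintext.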

Key Hierarchy

ExoVault uses a three-level hierarchy: the SEK wraps each MEK, and each MEK, combined with a fresh per-operation IV, encrypts content:

Server Encryption Key (SEK)
    │
    ├── User Master Encryption Key (MEK)
    │       │
    │       ├── Per-operation IV + AES-256-GCM
    │       │       → Encrypted memory content
    │       │       → Encrypted note content
    │       │       → Encrypted vault names
    │       │       → Encrypted message content
    │       │       → Encrypted link labels
    │       │       → Encrypted summaries
    │       │       ...

Server Encryption Key (SEK)

Stored as the ENCRYPTION_KEY environment variable
32-byte hex string (64 hex characters)
Used to wrap/unwrap per-integration MEKs
If lost, all encrypted data is unrecoverable
Must be backed up securely and never exposed
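
A startup check along these lines can catch a malformed key early. This is only a sketch: `parseSek` and `generateSekHex` are hypothetical helper names, not part of ExoVault.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper: validates a 64-character hex string and returns the 32-byte key.
function parseSek(hex: string | undefined): Buffer {
  if (!hex || !/^[0-9a-fA-F]{64}$/.test(hex)) {
    throw new Error("ENCRYPTION_KEY must be exactly 64 hex characters (32 bytes)");
  }
  return Buffer.from(hex, "hex");
}

// Hypothetical helper: produces a new random key in the same format.
function generateSekHex(): string {
  return randomBytes(32).toString("hex");
}

// Typical use at startup:
//   const sek = parseSek(process.env.ENCRYPTION_KEY);
```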

Master Encryption Key (MEK)

Generated per agent integration
Wrapped (encrypted) with the SEK and stored as wrappedMek + wrappedMekIv on the integration
Unwrapped at request time to encrypt/decrypt content
Each user can have multiple MEKs (one per integration)
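
Creating and wrapping a MEK might look like the sketch below. The field names `wrappedMek` and `wrappedMekIv` come from the text above; the function name, the base64 encoding, and appending the GCM tag to `wrappedMek` are assumptions made for illustration.

```typescript
import { createCipheriv, randomBytes } from "node:crypto";

// Generates a fresh 32-byte MEK and wraps it under the SEK with AES-256-GCM.
function createWrappedMek(sek: Buffer): { mek: Buffer; wrappedMek: string; wrappedMekIv: string } {
  const mek = randomBytes(32);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", sek, iv);
  const body = Buffer.concat([cipher.update(mek), cipher.final()]);
  // Assumed layout: encrypted MEK followed by the 16-byte authentication tag.
  const wrapped = Buffer.concat([body, cipher.getAuthTag()]);
  return {
    mek, // kept in memory only; never persisted
    wrappedMek: wrapped.toString("base64"),
    wrappedMekIv: iv.toString("base64"),
  };
}
```

Only `wrappedMek` and `wrappedMekIv` would be stored on the integration; the plaintext MEK exists only while a request is being handled.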

Per-Operation IV

A fresh 12-byte random IV is generated for every encryption operation
Stored alongside the ciphertext (e.g., contentIv, titleIv, summaryIv)
Ensures that encrypting the same plaintext twice produces different ciphertext

Key Derivation Flow

When an agent makes a request:

1. Agent sends: Authorization: Bearer exv_key_here
2. Server looks up the integration via key hash
3. Server retrieves wrappedMek and wrappedMekIv from the integration
4. Server unwraps MEK using SEK:
   - Decrypt wrappedMek using AES-256-GCM with SEK and wrappedMekIv
   - Result: plaintext MEK (32 bytes)
5. Server uses MEK to encrypt/decrypt content:
   - Encrypt: generate random IV, AES-256-GCM-encrypt(plaintext, MEK, IV)
   - Decrypt: AES-256-GCM-decrypt(ciphertext, MEK, storedIv)
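
The flow above can be sketched end to end as follows. Everything here is illustrative: the `Integration` shape, the in-memory `Map` standing in for the database, the use of SHA-256 for the key hash, and the tag-appended ciphertext layout are all assumptions, since the text does not specify them.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

// Hypothetical shape of a stored integration row; field names follow the text above.
interface Integration {
  wrappedMek: Buffer;   // encrypted MEK with the 16-byte GCM tag appended (assumed layout)
  wrappedMekIv: Buffer; // 12-byte IV used when the MEK was wrapped
}

const TAG_BYTES = 16;

// Assumption: agent keys are indexed by their SHA-256 hex digest.
function hashAgentKey(agentKey: string): string {
  return createHash("sha256").update(agentKey, "utf8").digest("hex");
}

function gcmEncrypt(plaintext: Buffer, key: Buffer): { iv: Buffer; data: Buffer } {
  const iv = randomBytes(12); // fresh IV for every operation
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, data: Buffer.concat([body, cipher.getAuthTag()]) };
}

function gcmDecrypt(data: Buffer, key: Buffer, iv: Buffer): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(data.subarray(data.length - TAG_BYTES));
  const body = data.subarray(0, data.length - TAG_BYTES);
  return Buffer.concat([decipher.update(body), decipher.final()]);
}

// Steps 1-4: parse the bearer token, look up the integration, unwrap its MEK with the SEK.
function resolveMek(authHeader: string, sek: Buffer, byKeyHash: Map<string, Integration>): Buffer {
  const agentKey = authHeader.replace(/^Bearer\s+/i, "");
  const integration = byKeyHash.get(hashAgentKey(agentKey));
  if (!integration) throw new Error("Unknown agent key");
  return gcmDecrypt(integration.wrappedMek, sek, integration.wrappedMekIv);
}
```

Step 5 then uses the returned MEK with `gcmEncrypt` and `gcmDecrypt` for content fields, storing each IV next to its ciphertext.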

What Gets Encrypted

Memory Content

Field	Encrypted	IV Field
`encryptedContent`	Yes	`contentIv`
`encryptedSummary`	Yes	`summaryIv`
`memoryType`	No	--
`importance`	No	--
`confidence`	No	--
`entities`	No	--
`metadata`	No	--

Note Content

Field	Encrypted	IV Field
`encryptedTitle`	Yes	`titleIv`
`encryptedContent`	Yes	`contentIv`
`encryptedTags`	Yes	`tagsIv`

Vault Names

Field	Encrypted	IV Field
`encryptedName`	Yes	`nameIv`

Messages

Field	Encrypted	IV Field
`encryptedContent`	Yes	`contentIv`

Knowledge Graph Links

Field	Encrypted	IV Field
`encryptedLabel`	Yes (optional)	`labelIv`

Conversation Turns

Field	Encrypted	IV Field
`encryptedContent`	Yes	`contentIv`

Vault Documents (Settings)

Field	Encrypted	IV Field
`encryptedContent`	Yes	`contentIv`

What Is NOT Encrypted

The following fields are stored in plaintext for indexing, filtering, and querying:

IDs -- All UUIDs (memory IDs, note IDs, vault IDs)
Types -- Memory types, relation types
Scores -- Importance, confidence, signal scores
Entities -- Entity arrays (for search filtering)
Metadata -- JSON metadata (task status, assigned agent, etc.)
Timestamps -- Created/updated dates
Blind index tokens -- HMAC-based tokens for privacy-preserving search
Content hashes -- SHA-256 hashes for deduplication
Embeddings -- Vector embeddings (derived from plaintext; cannot be trivially reversed to exact content, but carry semantic information)

Blind Index Tokens

For privacy-preserving keyword search without decrypting content:

Content is tokenized into words/n-grams
Each token is hashed with HMAC, using the MEK as the key
Resulting hashes are stored as blindTokens on the memory
Search queries are tokenized and HMAC-hashed the same way
Token overlap determines relevance without revealing plaintext

This enables the match_memories_by_blind_tokens database function to find relevant memories without accessing plaintext content.
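
The token scheme can be illustrated with a small sketch. The tokenizer here (lowercased alphanumeric words, deduplicated) and the choice of HMAC-SHA-256 are assumptions; the text does not state which hash or n-gram scheme ExoVault uses, and the overlap count below only stands in for whatever `match_memories_by_blind_tokens` does in the database.

```typescript
import { createHmac } from "node:crypto";

// Assumed tokenizer: lowercase alphanumeric words, duplicates removed.
function tokenize(text: string): string[] {
  return Array.from(new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []));
}

// Maps each token to a keyed hash, so the stored value reveals nothing without the MEK.
function blindTokens(text: string, mek: Buffer): string[] {
  return tokenize(text).map((token) =>
    createHmac("sha256", mek).update(token, "utf8").digest("hex"),
  );
}

// Relevance as the number of query tokens that also appear in the stored token set.
function tokenOverlap(queryTokens: string[], storedTokens: string[]): number {
  const stored = new Set(storedTokens);
  return queryTokens.filter((token) => stored.has(token)).length;
}
```

Because the HMAC key is the integration's MEK, tokens produced under one integration do not match tokens produced under another, even for identical words.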

Content Hash Deduplication

For exact-match deduplication:

Content + memory type are concatenated
SHA-256 hash is computed
Hash is stored as contentHash on the memory
New writes check for an existing record with the same hash before inserting

This catches exact duplicates without comparing encrypted content.
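
A sketch of the deduplication check, assuming the content and memory type are joined directly with no separator (the text says only that they are concatenated) and using an in-memory `Set` in place of the database lookup. Both function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// SHA-256 over content followed by memory type; the exact concatenation format is an assumption.
function computeContentHash(content: string, memoryType: string): string {
  return createHash("sha256").update(content + memoryType, "utf8").digest("hex");
}

// Returns true if the memory is new and was recorded, false if an identical one already exists.
function insertIfNew(knownHashes: Set<string>, content: string, memoryType: string): boolean {
  const hash = computeContentHash(content, memoryType);
  if (knownHashes.has(hash)) return false;
  knownHashes.add(hash);
  return true;
}
```

The same text stored under a different memory type hashes differently, so it is not treated as a duplicate.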

Security Properties

Zero-Knowledge at Rest

The database never contains plaintext user content. All primary text fields (content, titles, tags, names, messages) are encrypted; only the fields listed under What Is NOT Encrypted are stored in plaintext.

Per-Request Decryption

Content is decrypted in memory during request processing and never written to disk in plaintext.

Key Isolation

Each integration has its own wrapped MEK. Compromising one integration's key does not expose data from other integrations (even for the same user, if they have multiple integrations).

Forward Secrecy (Limited)

If the SEK is rotated, existing wrapped MEKs can be re-wrapped with the new key; content encrypted under those MEKs does not need to be re-encrypted. However, ExoVault does not currently implement automatic key rotation.
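
Since ExoVault does not ship a rotation tool, the following is only a sketch of what a manual re-wrap could look like. The function name `rewrapMek` is hypothetical, and the layout (GCM tag appended to the wrapped MEK) is an assumption.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const TAG_BYTES = 16;

// Unwraps a MEK with the old SEK and wraps it again under the new SEK with a fresh IV.
function rewrapMek(
  wrappedMek: Buffer,
  wrappedMekIv: Buffer,
  oldSek: Buffer,
  newSek: Buffer,
): { wrappedMek: Buffer; wrappedMekIv: Buffer } {
  // Unwrap: throws if oldSek is wrong or the stored value was modified.
  const decipher = createDecipheriv("aes-256-gcm", oldSek, wrappedMekIv);
  decipher.setAuthTag(wrappedMek.subarray(wrappedMek.length - TAG_BYTES));
  const body = wrappedMek.subarray(0, wrappedMek.length - TAG_BYTES);
  const mek = Buffer.concat([decipher.update(body), decipher.final()]);

  // Re-wrap: never reuse the old IV with the new key material.
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", newSek, iv);
  const rewrappedBody = Buffer.concat([cipher.update(mek), cipher.final()]);
  return { wrappedMek: Buffer.concat([rewrappedBody, cipher.getAuthTag()]), wrappedMekIv: iv };
}
```

Running this over every integration row replaces the SEK without touching any encrypted content, because content ciphertexts depend only on the MEK.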

Limitations

Embeddings are not encrypted -- Vector embeddings are stored in plaintext because pgvector requires raw vectors for similarity search. While embeddings cannot be trivially reversed to exact text, they do carry semantic information.
Metadata is not encrypted -- Task statuses, entities, and other metadata fields are stored in plaintext for filtering and indexing.
No end-to-end encryption -- The server has access to plaintext during request processing. This is a trade-off for server-side embedding generation and search.
Single SEK -- All data is protected by one server encryption key. If compromised, all data is at risk.

Configuration -- ENCRYPTION_KEY setup
Authentication -- Wrapped MEK and agent keys
Embedding Model -- How embeddings are generated
Database -- Schema details
