Launch

Introducing Suprbox Vault: chat with your secure documents

Suprbox Vault answers natural-language questions across every document in a vault, with citations and the same rule enforcement as a direct read.

Hritvik Aggarwal · May 11, 2026 · 6 min read

Today we are shipping Suprbox Vault.

A vault in Suprbox is the place you put documents that an AI agent should be able to use, under your rules. Until now, an agent had to know which file it wanted, ask for that file by ID, and read the full text back. That is fine when you already know where the answer lives. It falls apart the moment you do not. Questions like "where are the investor details" or "what is the total ARR across these contracts" forced the agent to fetch every document one by one and stitch the answer together itself.

Suprbox Vault replaces that loop. Ask a question in natural language against an entire vault. Receive a single synthesized answer back, with citations pointing to the exact documents and pages the answer came from. Every rule you configured on the vault is still enforced. A document the agent would not have been allowed to read directly cannot contribute its content to the answer either.

The algorithm

The pipeline has six stages. There is no vector database anywhere in it.

1. Chunk on ingest

When a document is uploaded, the existing extractor pulls plain text and page boundaries into Postgres. Suprbox Vault then splits that text into roughly five-hundred-token passages with paragraph-aware boundaries. Tables stay intact as a single block so a cap table row is never severed mid-cell. Each chunk knows which document and which page it came from.
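A minimal sketch of that chunking step. The function name and the 4-characters-per-token estimate are assumptions for illustration (a real implementation would use the model's tokenizer), and the table-preservation logic is omitted:

```python
# Paragraph-aware chunker: packs whole paragraphs into ~500-token passages
# and records which document and page each chunk came from.
TARGET_TOKENS = 500

def estimate_tokens(text: str) -> int:
    # Crude assumption: ~4 characters per token.
    return max(1, len(text) // 4)

def chunk_document(doc_id: str, pages: list[str]) -> list[dict]:
    """Split per-page texts into chunks that never break a paragraph."""
    chunks, current, current_tokens, current_page = [], [], 0, 1
    for page_no, page_text in enumerate(pages, start=1):
        for para in filter(None, (p.strip() for p in page_text.split("\n\n"))):
            para_tokens = estimate_tokens(para)
            # Flush the current chunk when the next paragraph would overflow it.
            if current and current_tokens + para_tokens > TARGET_TOKENS:
                chunks.append({"doc_id": doc_id, "page": current_page,
                               "text": "\n\n".join(current)})
                current, current_tokens = [], 0
            if not current:
                current_page = page_no
            current.append(para)
            current_tokens += para_tokens
    if current:
        chunks.append({"doc_id": doc_id, "page": current_page,
                       "text": "\n\n".join(current)})
    return chunks
```

Because flushing only happens at paragraph boundaries, a paragraph (or an intact table block) is never split mid-way, at the cost of chunks being only approximately 500 tokens.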

2. AI enrichment per chunk

A small batched call to the language model writes a one-sentence summary of every chunk and lists the named entities inside it: people, companies, dollar amounts, dates, project names. Both go into Postgres alongside the chunk text. This happens once at ingest, not at query time.
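One way the batched enrichment call might be shaped. The prompt wording and the JSON response format are assumptions, and the actual model client is left out:

```python
import json

def build_enrichment_prompt(chunks: list[str]) -> str:
    """Pack many chunks into one numbered prompt so a single model call
    enriches the whole batch."""
    numbered = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks))
    return (
        "For each numbered passage, return a JSON array of objects with "
        '"summary" (one sentence) and "entities" (people, companies, '
        "dollar amounts, dates, project names).\n\n" + numbered
    )

def parse_enrichment(response_text: str, n_chunks: int) -> list[dict]:
    """Validate and reshape the model's JSON into per-chunk rows for Postgres."""
    rows = json.loads(response_text)
    assert len(rows) == n_chunks, "model must return one object per chunk"
    return [{"summary": r["summary"], "entities": r["entities"]} for r in rows]
```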

3. Keyword index, not embeddings

Postgres builds a full-text index over the summary and chunk text using its built-in BM25-style ranker (tsvector with a GIN index), and a separate inverted index over the entity list stored as a JSON column.

There is no embedding column anywhere. The choice is deliberate. On the kinds of documents that Suprbox customers actually store, where exact names and numbers matter more than fuzzy paraphrase, keyword recall combined with a strong reranker beats cosine similarity on raw embeddings, and is dramatically simpler to operate. Postgres alone handles the entire lookup. No vector store to manage, no embedding model to keep in sync, no dimension mismatches when a model is upgraded.
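Illustrative DDL for such a table, held here as a Python constant. Table and column names are assumptions, not Suprbox's actual schema; the generated tsvector column requires Postgres 12 or later:

```python
# The tsvector column combines the AI summary with the raw chunk text so one
# GIN index serves full-text lookup; entities get their own GIN index on JSONB.
CHUNKS_DDL = """
CREATE TABLE chunks (
    id        bigserial PRIMARY KEY,
    doc_id    bigint NOT NULL,
    vault_id  bigint NOT NULL,
    page      int    NOT NULL,
    body      text   NOT NULL,
    summary   text   NOT NULL,
    entities  jsonb  NOT NULL DEFAULT '[]',
    tsv       tsvector GENERATED ALWAYS AS (
                to_tsvector('english', summary || ' ' || body)) STORED
);
CREATE INDEX chunks_tsv_gin      ON chunks USING GIN (tsv);
CREATE INDEX chunks_entities_gin ON chunks USING GIN (entities jsonb_path_ops);
"""
```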

4. Hybrid recall at query time

When a question arrives, the language model first extracts entities and keywords from it. Two SQL queries then run in parallel against the index:

The first ranks chunks by BM25 score against the question text. The second ranks chunks by overlap between the question's entities and each chunk's entity list.

Both queries filter by user and by vault membership at the SQL layer, so cross-vault leakage is structurally impossible. The two ranked lists are merged with reciprocal rank fusion, which gives a fair combined ranking without having to tune relative weights between BM25 and entity overlap. The top forty candidates move forward.
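Reciprocal rank fusion itself is a few lines. This is the standard formulation, not Suprbox's code; the damping constant k = 60 comes from the original RRF paper:

```python
def rrf_merge(rankings: list[list[int]], k: int = 60, top_n: int = 40) -> list[int]:
    """Merge ranked lists of chunk ids: each list contributes 1/(k + rank)
    per chunk, so items that rank well in both lists float to the top
    without any weight tuning between BM25 and entity overlap."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)[:top_n]
```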

5. Cross-encoder rerank with Cohere

Those forty candidates pass through Cohere's rerank-4-pro model via OpenRouter. A reranker is a cross-encoder: it reads the original question and each candidate passage together and produces a relevance score that reflects how well the passage actually answers the question. This is fundamentally different from a vector lookup, which can only compare two encodings that were produced independently of each other.

The reranker is where most of the quality comes from. BM25 surfaces the right neighborhood. The reranker picks the actual answer out of that neighborhood. The top eight passages survive.
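The rerank stage, with the API call abstracted away. `score` is a hypothetical stand-in for the rerank endpoint; the point is that it sees question and passage together:

```python
from typing import Callable

def rerank(question: str, candidates: list[dict],
           score: Callable[[str, str], float], keep: int = 8) -> list[dict]:
    """Score each (question, passage) pair jointly, then keep the best few.
    Unlike comparing two precomputed vectors, the scorer reads both texts
    at once, which is what makes a cross-encoder a cross-encoder."""
    scored = [(score(question, c["text"]), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:keep]]
```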

6. Synthesis with citations

The eight surviving chunks are grouped by document and handed to Grok 4.3, also via OpenRouter, with instructions to use only the provided excerpts. Each factual claim in the response carries a citation that points back to the chunk it came from. The citations are parsed out of the streamed response and returned as structured references the user interface renders as clickable chips.

The model is explicitly told not to invent data. If the surviving chunks do not contain an answer, the response says so honestly.
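Pulling structured citations out of the streamed text might look like this. The inline `[chunk:N]` marker format is an illustration, not Suprbox's actual wire format:

```python
import re

CITATION_RE = re.compile(r"\[chunk:(\d+)\]")

def extract_citations(answer: str) -> tuple[str, list[int]]:
    """Return the answer with markers stripped, plus the cited chunk ids
    in the order they appear, ready to render as clickable chips."""
    ids = [int(m) for m in CITATION_RE.findall(answer)]
    clean = CITATION_RE.sub("", answer).replace("  ", " ").strip()
    return clean, ids
```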

Rules still apply

The vault rules engine runs at two layers around the pipeline.

Before retrieval, the vault-level rules decide whether the request can run at all. Deny rules return a denial. Approval rules return a pending state until a human approves. Rate limits and session leases work exactly as they do for a direct document read.

After retrieval but before synthesis, every document that the reranker selected is evaluated individually. Any document that would be denied by a per-document rule, or that requires approval, or that is clamped to metadata-only access, has its chunks dropped from the answer. The agent never sees content from a document it would not have been allowed to read directly.
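The post-retrieval filter reduces to a per-document decision applied to every surviving chunk. A sketch, where `decide` stands in for the per-document rule evaluation and the decision names are illustrative:

```python
ALLOWED = "allow"  # other decisions might be "deny", "pending", "metadata_only"

def filter_chunks(chunks: list[dict], decide) -> list[dict]:
    """Evaluate each distinct document once, then drop every chunk whose
    document is not fully allowed. Denied, pending-approval, and
    metadata-only documents all fall out here."""
    decisions = {d: decide(d) for d in {c["doc_id"] for c in chunks}}
    return [c for c in chunks if decisions[c["doc_id"]] == ALLOWED]
```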

The synthesized answer then passes through the same content-shaping step as a direct read. If a redaction rule masks social-security numbers or credit-card numbers, the detectors run on the answer text and the same masking applies. The policy headers on the response report the combined decision across every cited document.

In practice this means a vault containing a mix of Confidential and Restricted documents will only ever surface Confidential content through Suprbox Vault to an agent whose key is allowed to see Confidential. The Restricted documents, even if they would have been the strongest matches for the question, are removed silently from the candidate set, and the audit log records that a query touched them.

A small example

Upload a Series A term sheet and a cap table to the same vault. Ask: "list every investor across all rounds with their amounts and equity percentages."

The response groups results by source document. From the term sheet you get Sequoia Capital, Andreessen Horowitz, and Index Ventures with their dollar amounts and equity percentages. From the cap table you get the same three on the Series A sheet, plus Y Combinator, Naval Ravikant, and SV Angel from the Seed sheet. Numbers and names match the source exactly. Each line carries a citation back to the chunk it came from.

If you then add a deny rule on the cap table for that agent, the same question returns only the term sheet's three names. The Seed round investors disappear from the answer, because the file that holds them is gated.

Why this design

Three properties drove the design.

Correctness on entity-heavy data. Real documents are full of named things: company names, dollar amounts, dates, contract numbers. Keyword recall plus a cross-encoder reranker preserves those literally. Embeddings tend to blur them. For cap tables, contracts, financial filings, and similar data, the right answer requires the exact name, not a paraphrase.

Operability. A keyword index lives in the same Postgres database as the rest of the application. There is one system to back up, one place to query, one set of migrations. Vector databases are operationally heavier and add a second moving part that has to stay consistent with the source of truth.

Policy as a first-class concern. Because the same retrieval pipeline runs inside the existing rule-evaluation flow, an agent's access to information through Suprbox Vault is bounded by exactly the same rules as its access through direct document reads. There is no shadow path. Anything an agent could not have read directly, it cannot extract through a query either.

Availability

Suprbox Vault is live for every vault now. Owners can ask questions of their own data from inside the dashboard. AI agents reach the same pipeline through the Suprbox SDK. Both paths run the same code, with the same rule enforcement and the same citation format.
