VOXELL ANSWERS · DROP-IN RAG ENGINE

Drop in a document. Ask a question. Get a grounded answer.

Voxell Answers turns a folder of documents into answers you can cite — in one API call. Upload, and it auto-chunks, embeds, and indexes. Then ask a question and get an answer grounded in your own text, with the sources it came from. No pipeline to build, no vector DB to stand up, no framework to wire together.

# upload your docs, then just ask
> Which plan includes the Precision engine?
answer: The Precision engine is available on the Singularity and Enterprise plans. [architecture.md · chunk 2]
retrieved from: 6 documents · 13 chunks · hybrid dense + BM25

What is Voxell Answers

Voxell Answers is the retrieval and question-answering layer of Voxell Forge. You give it documents; it chunks them, embeds them with Forge's models, and stores the vectors. When a question comes in, Voxell Answers retrieves the most relevant passages and — if you ask it to — hands them to Gemini to write a grounded answer with citations back to your source text.

The whole point is that there's nothing to assemble. Most teams build RAG by gluing together an embedding API, a vector database, a chunker, a reranker, and an LLM call — and then spend weeks tuning the seams. With Voxell Answers, you upload a document and immediately query it. The retrieval quality that usually takes a team months to reach is the default.

Try it live

Load a document. Ask it anything.

Sign in once (your free 10M-token grant covers it), then paste some text into the dashboard, such as release notes, a contract, or a wiki page, and ask a question. You'll get an answer grounded in what you loaded, with the passage it came from. This is the real engine, not a canned demo.

Open Voxell Answers in your dashboard at dash.voxell.ai/wield to upload a document and query it against the live engine, free with your 10M-token grant.

How it works — Mine → Forge → Wield
01 / MINE
Bring your knowledge in
Upload text, files, or up to 256 documents in a single batch call. Voxell Answers splits them into clean, paragraph-aware chunks and embeds every one — no pre-chunking, no separate embedding step. Re-upload the same file and it's recognized, not duplicated.
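To make "paragraph-aware" and "recognized, not duplicated" concrete, here is a minimal bash sketch under stated assumptions: it packs whole paragraphs into chunks up to a character budget, and it uses a SHA-256 of the file's bytes as a dedup key. Voxell's actual chunk sizes and dedup rule are not documented on this page; chunk_paragraphs, content_key, and the character budget are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not Voxell's chunker: paragraph-aware packing plus
# a content-hash dedup key. Requires awk and sha256sum (GNU coreutils).

# chunk_paragraphs FILE MAX_CHARS
# Paragraphs are separated by blank lines. Whole paragraphs are packed into a
# chunk until adding the next one would exceed MAX_CHARS; a paragraph longer
# than MAX_CHARS becomes its own chunk instead of being cut mid-sentence.
# Chunks are printed separated by a line containing only "---".
chunk_paragraphs() {
  awk -v max="$2" '
    BEGIN { RS = ""; chunk = "" }   # RS="" puts awk in paragraph mode
    {
      if (chunk != "" && length(chunk) + 2 + length($0) > max) {
        print chunk
        print "---"
        chunk = ""
      }
      chunk = (chunk == "") ? $0 : chunk "\n\n" $0
    }
    END { if (chunk != "") print chunk }
  ' "$1"
}

# content_key FILE
# Identical bytes give an identical key, so a re-upload can be matched to the
# document already stored instead of being chunked and embedded again.
content_key() {
  sha256sum "$1" | cut -d ' ' -f 1
}
```

A real ingester would also need to split oversized paragraphs on sentence boundaries; this sketch deliberately keeps them whole.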
02 / FORGE
Shape what stays
Organize documents into corpora, rename and prune chunks, and score corpus quality with VQS (Voxell Quality Score) to find the documents that hurt retrieval. Curate the knowledge that answers questions; drop the noise.
03 / WIELD
Put it to work
Query in two modes: "vectors" returns the ranked passages; "answer" returns a Gemini-written response grounded in those passages, with citations back to the source chunks. One endpoint, one call.
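A minimal bash sketch of the two request bodies, assuming the query endpoint accepts the same JSON shape as the quickstart on this page, with "mode" set to "vectors" or "answer". The helper names json_escape and build_query_body are hypothetical, and the escaping covers only backslashes and double quotes, not control characters such as newlines.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build the JSON body for a query in either mode.

# json_escape STRING: escape backslashes and double quotes for a JSON string.
# Control characters (newlines, tabs) are NOT escaped by this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_query_body QUERY MODE, where MODE is "vectors" or "answer".
build_query_body() {
  case "$2" in
    vectors|answer) ;;
    *) echo "mode must be vectors or answer" >&2; return 1 ;;
  esac
  printf '{"query":"%s","mode":"%s"}\n' "$(json_escape "$1")" "$2"
}
```

For example, the quickstart's query call could send -d "$(build_query_body 'How many vacation days?' vectors)" to get the ranked passages alone instead of a written answer.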
Two engines. One API.

Same corpus, same endpoints — only ?tier= changes. Start free and fast on Edge; switch to Precision when exact names, codes, and nuance matter.

Edge
Fast & economical: for everyday documents and broad questions. Free on every plan.
  • Search: IVF (Cloudflare Vectorize)
  • Embeddings: Turbo · 1024d
  • Query cost: 1,000 token-equiv
  • Availability: all plans, incl. free

Precision · SOTA
Most accurate: for exact names, codes, and nuance. Hybrid retrieval plus quality scoring.
  • Search: HNSW + BM25 hybrid (RRF)
  • Embeddings: Pro · 2560d (Qwen3-4B)
  • Store: pgvector 0.8 · halfvec
  • Query cost: 8,000 token-equiv
  • Extras: VQS quality scoring
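Because only the tier query parameter differs between engines, one helper can point a request at either. This bash sketch assumes the tier values are the lowercase engine names edge and precision, and reuses the query URL from the quickstart below; query_url is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same corpus, same endpoint, only ?tier= changes.
# The tier values "edge" and "precision" are assumed, not documented here.

# query_url CORPUS TIER -> query endpoint URL for that corpus and engine
query_url() {
  case "$2" in
    edge|precision) ;;
    *) echo "unknown tier: $2" >&2; return 1 ;;
  esac
  printf 'https://api.voxell.ai/v1/wield/%s/query?tier=%s\n' "$1" "$2"
}
```

Moving a caller from Edge to Precision is then a one-word change, query_url handbook edge versus query_url handbook precision, with the per-query toll going from 1,000 to 8,000 token-equivalents.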
Why Voxell Answers
Drop-in
From raw doc to grounded answer in one call
Auto-chunking, embedding, retrieval, and the grounded answer are one product, not five services you integrate. Upload, then query. Citations come back with the answer — full provenance to the source chunk, no post-processing.
Ingest → query, zero glue code
Hybrid retrieval
Semantic when you mean it, exact when you need it
On Precision, Voxell Answers detects identifier-bearing queries — part numbers, error codes, version strings — and fuses dense vector search with BM25 lexical search via reciprocal rank fusion. Paraphrases stay semantic; exact tokens get matched exactly. No tuning, no config.
Dense + BM25 · RRF fusion
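This page does not say how identifier-bearing queries are detected. One plausible heuristic, sketched in bash below purely as an assumption, flags a query when any token mixes digits with letters, dots, hyphens, or underscores (v2.1.4, ERR-404, SKU_88231) while ignoring plain numbers such as 30 or 2024. The function has_identifier is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic, not Voxell's detector: decide whether a query
# carries an identifier that should trigger the lexical (BM25) pass.

# has_identifier QUERY -> exit status 0 if some token looks like a code,
# version string, or part number; 1 otherwise.
has_identifier() {
  printf '%s\n' "$1" | awk '
    {
      for (i = 1; i <= NF; i++) {
        t = $i
        gsub(/[?,;:!()".]+$/, "", t)       # drop trailing punctuation
        # a digit plus a letter or separator: v2.1.4, ERR-404, A113
        if (t ~ /[0-9]/ && t ~ /[A-Za-z._-]/) found = 1
      }
    }
    END { exit !found }'
}
```

Under this heuristic a paraphrase such as "how much time off do I get" stays on the dense-only path, while "what changed in v2.1.4" also gets a lexical pass.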
Quality you can see
A score for how well your corpus will retrieve
VQS (Voxell Quality Score) reads your stored chunks and flags the documents dragging retrieval down — near-duplicates, incoherent splits, junk attractors — with a per-document score and a 2D quality map. Stop guessing why a query missed.
Per-doc score · 2D quality map
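VQS's scoring method is not described on this page. As a rough illustration of just one of the problems it flags, this bash sketch finds chunks that are identical once case and whitespace are ignored. It catches only formatting-level duplicates, not paraphrased near-duplicates (those would need embedding similarity); dup_chunks and the one-chunk-per-line input are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check, not VQS: report chunks that duplicate an earlier chunk
# once case and whitespace differences are ignored.

# dup_chunks FILE  (one chunk per line)
# Prints "FIRST_LINE<TAB>DUPLICATE_LINE" for every repeated chunk.
dup_chunks() {
  awk '
    {
      key = tolower($0)
      gsub(/[ \t]+/, " ", key)   # collapse runs of spaces and tabs
      gsub(/^ | $/, "", key)     # trim one leading/trailing space
      if (key in seen) print seen[key] "\t" NR
      else seen[key] = NR
    }' "$1"
}
```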
Under the hood

Voxell Answers' Precision engine is built on the same embedding models that put Voxell at the top of the MTEB leaderboard. Retrieval quality depends on the embeddings, and these embeddings are publicly benchmarked.

SPEC // 01
MTEB-leading embeddings
Precision embeds with Pro (Qwen3-4B) at 2560 dimensions — from Voxell's hard-negative-tuned embedding lineage that ranks at the top of MTEB. Retrieval quality is bounded by your embedding model; we lead with the part that's benchmarked.
SPEC // 02
pgvector 0.8 · HNSW + halfvec
Vectors live in Postgres on pgvector 0.8, stored as half-precision halfvec values for compact, fast cosine search over an HNSW index. pgvector 0.8's iterative index scans keep searching when tenant and corpus filters discard candidates, so a filtered query still returns a full top-k instead of silently returning too few results.
SPEC // 03
Hybrid dense + lexical fusion
A dense HNSW pass always runs; a BM25 lexical pass kicks in for queries carrying codes or identifiers. The two ranked lists are merged with RRF (reciprocal rank fusion) — the recall of semantics with the precision of exact match.
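Reciprocal rank fusion scores each item by summing 1 / (k + rank) over every ranked list it appears in, so items ranked well by both passes rise to the top without having to reconcile cosine similarities and BM25 scores, which live on different scales. The bash and awk sketch below uses k = 60, the constant from the original RRF paper by Cormack et al.; Voxell's actual k is not stated here, and rrf_fuse is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch of reciprocal rank fusion (RRF); k = 60 is a common default,
# not a documented Voxell setting.

# rrf_fuse K LIST_FILE...
# Each LIST_FILE holds one chunk id per line, best match first.
# Prints "id<TAB>score" for every id, highest fused score first.
rrf_fuse() {
  local k="$1"
  shift
  awk -v k="$k" '
    FNR == 1 { rank = 0 }                     # rank restarts in each list
    NF { rank++; score[$1] += 1 / (k + rank) }
    END { for (id in score) printf "%s\t%.6f\n", id, score[id] }
  ' "$@" | LC_ALL=C sort -t "$(printf '\t')" -k2,2gr -k1,1
}
```

For example, if the dense pass ranks a, b, c and the BM25 pass ranks c, a, d, the fused order is a, c, b, d: a and c appear in both lists, and a edges ahead because 1/61 + 1/62 is slightly larger than 1/63 + 1/61.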
The drop-in

Two calls: load, then ask.

Ingest a document, then query it in answer mode. The response is a grounded answer plus the chunks it cited — no chunking library, no vector store, no orchestration. The corpus is created on first upload.

quickstart.sh
# 1. load a document (auto-chunks + embeds)
curl https://api.voxell.ai/v1/wield/handbook/documents \
  -H "Authorization: Bearer $VOXELL_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name":"policy.md","text":"..."}'

# 2. ask a question, get a grounded answer
curl https://api.voxell.ai/v1/wield/handbook/query \
  -H "Authorization: Bearer $VOXELL_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query":"How many vacation days?","mode":"answer"}'

# => { "answer": "...", "chunks": [{ "doc_id": "...",
#       "chunk_index": 3, "score": 0.87, "text": "..." }] }
Pricing that doesn't tax idle data

You pay to embed your documents once and a small flat toll per query — not a monthly rent on every vector sitting in storage. Sign in and the first 10M tokens are on us.

Free to start
10M tokens on sign-in
Sign in with Google and get a 10M-token grant with no expiry — enough to load a real corpus and run thousands of queries before you pay a cent.
No card · no expiry
Pay per query
A flat toll, not metered reads
Each query is a flat token-equivalent toll — 1,000 on Edge, 8,000 on Precision — covering retrieval and the query embed. Predictable as you scale.
Edge 1k · Precision 8k / query
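Because the toll is flat, budgeting is plain integer arithmetic. This bash sketch uses the per-query tolls stated on this page and takes the tokens you have already spent on embedding as an input you supply, since the embedding rate is not given here; queries_left is a hypothetical name.

```shell
#!/usr/bin/env bash
# Budget arithmetic using the per-query tolls listed on this page:
# 1,000 token-equivalents on Edge, 8,000 on Precision.

# queries_left GRANT EMBED_SPENT TOLL
#   GRANT:       tokens granted (10,000,000 on sign-in)
#   EMBED_SPENT: tokens already used to embed documents (your own figure)
#   TOLL:        per-query toll for the tier
queries_left() {
  local remaining=$(( $1 - $2 ))
  if [ "$remaining" -lt 0 ]; then remaining=0; fi
  echo $(( remaining / $3 ))
}

GRANT=10000000
echo "Edge only:      $(queries_left "$GRANT" 0 1000) queries"
echo "Precision only: $(queries_left "$GRANT" 0 8000) queries"
```

With the full grant that is 10,000 Edge queries or 1,250 Precision queries; if loading your corpus used, say, 2M tokens, the remaining 8M covers 8,000 Edge or 1,000 Precision queries. The sketch counts only the per-query toll, which the page says covers retrieval and the query embed.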
No storage rent
Idle vectors don't bill
There's no per-gigabyte, per-day charge on stored vectors. A large corpus you query occasionally doesn't quietly run up a bill the way per-storage pricing does.
Embed once · keep it
New to RAG?
I can already drop a document into ChatGPT and ask about it. Why do I need this?

For one document and one question, a chat window is fine. It falls apart the moment you have more documents than fit in a prompt — a support team with 50,000 articles, a product that answers from a live knowledge base, a contract set spanning years. You can't paste all of it into a chat.

Voxell Answers pre-processes your whole corpus into searchable meaning once, then retrieves only the few passages that matter when a question comes in. The LLM sees the right context instead of everything: faster, cheaper, and it actually scales. And because retrieval quality is the ceiling on answer quality, Voxell Answers is built on embeddings that rank at the top of the MTEB benchmark.
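The retrieval step can be pictured as ranking every stored chunk vector by cosine similarity to the query vector and keeping the top k. The bash and awk sketch below does exactly that by brute force over toy vectors; Edge and Precision instead use approximate indexes (IVF and HNSW) over 1024- and 2560-dimension embeddings so they avoid scanning every vector. The function topk_cosine and the "id v1 v2 ..." file format are hypothetical.

```shell
#!/usr/bin/env bash
# Brute-force top-k retrieval by cosine similarity (illustration only;
# production indexes such as IVF or HNSW approximate this without a full scan).

# topk_cosine K "QUERY_VECTOR" FILE
# FILE has one chunk per line: "id v1 v2 ... vn". The query vector is a
# space-separated string of the same length n. Prints "id<TAB>cosine".
topk_cosine() {
  awk -v qs="$2" '
    BEGIN {
      n = split(qs, q, " ")
      qnorm = 0
      for (i = 1; i <= n; i++) qnorm += q[i] * q[i]
      qnorm = sqrt(qnorm)
    }
    NF == n + 1 {                               # skip malformed rows
      dot = 0; norm = 0
      for (i = 1; i <= n; i++) {
        dot += q[i] * $(i + 1)
        norm += $(i + 1) * $(i + 1)
      }
      if (norm > 0 && qnorm > 0)
        printf "%s\t%.4f\n", $1, dot / (qnorm * sqrt(norm))
    }' "$3" | LC_ALL=C sort -t "$(printf '\t')" -k2,2gr | head -n "$1"
}
```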

What's the difference between the Edge and Precision engines?

Edge is fast, free, and great for everyday documents and broad questions — dense vector search on 1024-dimension embeddings. Precision is the accurate one: larger 2560-dimension embeddings, an HNSW index, and hybrid search that adds exact lexical matching for codes and identifiers, plus corpus quality scoring.

They share one API and one corpus model — you can start on Edge and move to Precision by changing a single tier parameter, no re-integration.

What does "hybrid retrieval" actually do for me?

Pure semantic (dense) search is great at meaning but can blur exact tokens — it'll happily treat v2.1.4 and v2.4.1 as close. Pure keyword search nails exact strings but misses paraphrases.

Hybrid runs both and fuses the rankings. When your query contains an identifier — a SKU, an error code, a version — Precision adds a lexical pass so the exact match surfaces, while still using semantics for everything else. You get recall and precision without choosing.
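To see why a lexical pass separates v2.1.4 from v2.4.1, here is a compact Okapi BM25 scorer in bash and awk. It uses the textbook defaults k1 = 1.2 and b = 0.75 with the Lucene-style IDF, and a tokenizer that keeps dots and hyphens inside tokens so version strings stay whole. Voxell's actual tokenizer and parameters are not documented here, and bm25_rank is a hypothetical name.

```shell
#!/usr/bin/env bash
# Okapi BM25 over one-chunk-per-line text (sketch; parameters are textbook
# defaults, not Voxell settings). Prints "score<TAB>chunk", best first,
# for chunks sharing at least one token with the query.

# bm25_rank "QUERY" FILE
bm25_rank() {
  awk -v q="$1" -v k1=1.2 -v b=0.75 '
    # Lowercase, split on anything but letters, digits, ".", "_", "-",
    # and trim dots at token edges, so "v2.1.4." becomes "v2.1.4".
    function tok(s, out,    n, i) {
      s = tolower(s)
      gsub(/[^a-z0-9._-]+/, " ", s)
      n = split(s, out, " ")
      for (i = 1; i <= n; i++) gsub(/^[.]+|[.]+$/, "", out[i])
      return n
    }
    BEGIN { split("", t); split("", qt) }        # declare arrays
    {
      N++; text[N] = $0; len[N] = 0
      n = tok($0, t)
      for (i = 1; i <= n; i++) {
        if (t[i] == "") continue
        len[N]++; total++
        if (++tf[N, t[i]] == 1) df[t[i]]++       # document frequency
      }
    }
    END {
      if (N == 0) exit
      avgdl = total / N
      nq = tok(q, qt)
      for (d = 1; d <= N; d++) {
        score = 0
        for (i = 1; i <= nq; i++) {
          w = qt[i]
          if (w == "" || !(w in df) || !((d, w) in tf)) continue
          f = tf[d, w]
          idf = log((N - df[w] + 0.5) / (df[w] + 0.5) + 1)
          score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len[d] / avgdl))
        }
        if (score > 0) printf "%.4f\t%s\n", score, text[d]
      }
    }' "$2" | LC_ALL=C sort -t "$(printf '\t')" -k1,1gr
}
```

A dense embedding may place the two version strings close together; here a chunk that does not contain the exact token v2.1.4 simply scores zero for that query.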

Where does the answer come from — can I trust it?

In answer mode, Voxell Answers retrieves the most relevant chunks from your documents and asks Gemini to answer using only those, returning the source chunks alongside the answer. Every answer is traceable to the passage it came from. If nothing relevant is found, you get told that — not a confident guess.
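A sketch of how answer mode could package retrieved chunks for the model: each chunk gets a citation label in the "[file · chunk N]" style shown at the top of this page, and the instructions restrict the model to those sources and tell it to say so when they don't contain the answer. The prompt wording, the tab-separated input, and build_grounded_prompt are hypothetical; Voxell's actual prompt is not published here.

```shell
#!/usr/bin/env bash
# Hypothetical prompt assembly for a grounded answer; not Voxell's prompt.

# build_grounded_prompt "QUESTION" CHUNKS_TSV
# CHUNKS_TSV lines: "doc_name<TAB>chunk_index<TAB>chunk text" (text without
# tabs), already ranked by retrieval. Prints one labeled source per chunk.
build_grounded_prompt() {
  printf '%s\n' "Answer using only the sources below. Cite each claim with its" \
    "source label. If the sources do not contain the answer, say so." ""
  awk -F '\t' 'NF >= 3 { printf "[%s · chunk %s] %s\n", $1, $2, $3 }' "$2"
  printf '\nQuestion: %s\n' "$1"
}
```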

Load a document. Ask it a question. See for yourself.

Free to start with a 10M-token grant, no card required.
