Drop in a document. Ask a question. Get a grounded answer.
Voxell Answers turns a folder of documents into answers you can cite, in two API calls: one to load, one to ask. Upload a document and it is chunked, embedded, and indexed automatically. Then ask a question and get an answer grounded in your own text, with the sources it came from. No pipeline to build, no vector DB to stand up, no framework to wire together.
Voxell Answers is the retrieval and question-answering layer of Voxell Forge. You give it documents; it chunks them, embeds them with Forge's models, and stores the vectors. When a question comes in, Voxell Answers retrieves the most relevant passages and — if you ask it to — hands them to Gemini to write a grounded answer with citations back to your source text.
The whole point is that there's nothing to assemble. Most teams build RAG by gluing together an embedding API, a vector database, a chunker, a reranker, and an LLM call — and then spend weeks tuning the seams. With Voxell Answers, you upload a document and immediately query it. The retrieval quality that usually takes a team months to reach is the default.
Load a document. Ask it anything.
Sign in once — your free 10M-token grant covers it — then drop in some text in the dashboard (release notes, a contract, a wiki page) and ask a question. You'll get an answer grounded in what you loaded, with the passage it came from. This is the real engine, not a canned demo.
Open Voxell Answers in your dashboard
Upload a document and query it against the live engine — free with your 10M-token grant.
Same corpus, same endpoints — only ?tier= changes.
Start free and fast on Edge; switch to Precision when exact names, codes, and nuance matter.
Edge
- Search: IVF (Cloudflare Vectorize)
- Embeddings: Turbo · 1024d
- Query cost: 1,000 token-equiv
- Availability: All plans, incl. free
Precision
- Search: HNSW + BM25 hybrid (RRF)
- Embeddings: Pro · 2560d (Qwen3-4B)
- Store: pgvector 0.8 · halfvec
- Extras: VQS quality scoring
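Since only ?tier= changes between engines, switching can be as small as one query-string value. Here is a minimal sketch that builds the query URL for each tier. The base URL and the handbook corpus come from the curl example on this page; the tier values "edge" and "precision" and the helper name query_url are assumptions for illustration, so check the API reference for the accepted values.

```python
from typing import Optional
from urllib.parse import urlencode

# Base path taken from the curl example on this page.
BASE_URL = "https://api.voxell.ai/v1/wield"


def query_url(corpus: str, tier: Optional[str] = None) -> str:
    """Build the query endpoint URL, optionally pinning an engine tier."""
    url = f"{BASE_URL}/{corpus}/query"
    if tier is not None:
        # Assumption: the tier names are lowercase "edge" and "precision".
        url += "?" + urlencode({"tier": tier})
    return url


edge_url = query_url("handbook", tier="edge")
precision_url = query_url("handbook", tier="precision")
print(edge_url)
print(precision_url)
```

The corpus name and the rest of the request stay the same; only the tier value differs between the two URLs.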
Voxell Answers' Precision engine is built on the same embedding models that put Voxell at the top of the MTEB leaderboard. Retrieval quality is bounded by the embedding model, and these embeddings are benchmarked.
Embeddings are 2560 dimensions, from Voxell's hard-negative-tuned embedding lineage that ranks at the top of MTEB. Retrieval quality is bounded by your embedding model; we lead with the part that's benchmarked.
Vectors are stored in pgvector 0.8 as half-precision halfvec for compact, fast cosine search over an HNSW index. pgvector 0.8's iterative scan returns a full top-k even under tenant and corpus filters, with no silent under-retrieval.
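To make "compact" concrete, here is a rough sketch of the raw payload size of one 2560-dimension vector at single precision versus half precision. It uses Python's struct module only to count bytes; it is not how pgvector stores data internally, and it ignores any per-value overhead the database adds. The placeholder values are arbitrary.

```python
import struct

DIMS = 2560  # Precision embedding width quoted on this page

# Placeholder values; 0.125 is exactly representable in half precision.
vector = [0.125] * DIMS

as_float32 = struct.pack(f"<{DIMS}f", *vector)  # 4 bytes per dimension
as_float16 = struct.pack(f"<{DIMS}e", *vector)  # 2 bytes per dimension

print(len(as_float32))  # 10240
print(len(as_float16))  # 5120
```

Half precision halves the raw vector payload, which is the kind of saving the halfvec choice is aiming at.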
Dense and BM25 rankings are fused with RRF (reciprocal rank fusion): the recall of semantics with the precision of exact match.
Two calls: load, then ask.
Ingest a document, then query it in answer mode. The response is a grounded answer plus the chunks it cited, with no chunking library, no vector store, and no orchestration. The corpus is created on first upload.
    # 1. load a document (auto-chunks + embeds)
    curl https://api.voxell.ai/v1/wield/handbook/documents \
      -H "Authorization: Bearer $VOXELL_KEY" \
      -d '{"name":"policy.md","text":"..."}'

    # 2. ask a question, get a grounded answer
    curl https://api.voxell.ai/v1/wield/handbook/query \
      -H "Authorization: Bearer $VOXELL_KEY" \
      -d '{"query":"How many vacation days?","mode":"answer"}'

    # => { "answer": "...", "chunks": [{ "doc_id": "...",
    #      "chunk_index": 3, "score": 0.87, "text": "..." }] }
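If you would rather call the API from Python than curl, the same two requests can be sketched with the standard library. This block only builds the requests and does not send them. The endpoints, payload fields, and VOXELL_KEY variable mirror the curl example; the build_request helper is hypothetical, and the JSON Content-Type header is an assumption that the curl example does not show.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.voxell.ai/v1/wield"


def build_request(corpus, action, payload, api_key):
    """Build (but do not send) a POST request mirroring the curl calls."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{corpus}/{action}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Falls back to a dummy key so the sketch runs without credentials.
api_key = os.environ.get("VOXELL_KEY", "test-key")

# 1. load a document
load_req = build_request(
    "handbook", "documents", {"name": "policy.md", "text": "..."}, api_key
)

# 2. ask a question in answer mode
ask_req = build_request(
    "handbook", "query", {"query": "How many vacation days?", "mode": "answer"}, api_key
)

print(load_req.get_method(), load_req.full_url)
print(ask_req.get_method(), ask_req.full_url)
```

To send a request, pass it to urllib.request.urlopen and decode the body with json.load; error handling and retries are left out of this sketch.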
You pay to embed your documents once and a small flat toll per query — not a monthly rent on every vector sitting in storage. Sign in and the first 10M tokens are on us.
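As a back-of-the-envelope illustration of "embed once, flat toll per query", the sketch below estimates how many Edge queries the free grant covers after a corpus is loaded. The 10M-token grant and the 1,000 token-equiv Edge query cost come from this page. The assumption that embedding bills one token-equiv per document token is ours, and the 2M-token corpus is a made-up example size, so treat the result as arithmetic, not a price quote.

```python
GRANT_TOKENS = 10_000_000  # free grant quoted on this page
EDGE_QUERY_COST = 1_000    # token-equiv per Edge query, from the tier list


def queries_left(corpus_tokens, grant=GRANT_TOKENS, query_cost=EDGE_QUERY_COST):
    """Estimate queries covered by the grant after a one-time embed."""
    # Assumption: embedding costs one token-equiv per document token, once.
    remaining = grant - corpus_tokens
    return max(remaining, 0) // query_cost


# Hypothetical corpus of 2M tokens.
print(queries_left(2_000_000))  # 8000
```

Because nothing is charged for vectors sitting in storage, the estimate depends only on corpus size and query count, not on how long the corpus is kept.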
I can already drop a document into ChatGPT and ask about it. Why do I need this?
For one document and one question, a chat window is fine. It falls apart the moment you have more documents than fit in a prompt — a support team with 50,000 articles, a product that answers from a live knowledge base, a contract set spanning years. You can't paste all of it into a chat.
Voxell Answers pre-processes your whole corpus into searchable meaning once, then retrieves only the few passages that matter when a question comes in. The LLM sees the right context instead of everything — faster, cheaper, and it actually scales. And because retrieval quality is the ceiling on answer quality, Voxell Answers is built on embeddings that are benchmarked to be good.
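The "retrieve only the few passages that matter" step can be pictured as a top-k search by cosine similarity. The sketch below is a toy, brute-force version over hand-written 3-dimensional vectors; the real engines use learned embeddings and an approximate index, and the passage labels and vector values here are placeholders invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the k chunk labels most similar to the query vector."""
    scored = [(cosine(query_vec, vec), label) for label, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored[:k]]


# Toy corpus: (label, vector) pairs with made-up values.
chunks = [
    ("vacation policy passage", [0.9, 0.1, 0.0]),
    ("expense policy passage", [0.0, 0.2, 0.9]),
    ("leave carry-over passage", [0.7, 0.6, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # stands in for an embedded question

selected = top_k(query_vec, chunks)
print(selected)
```

Only the selected passages would be handed to the LLM, which is why the prompt stays small no matter how large the corpus grows.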
What's the difference between the Edge and Precision engines?
Edge is fast, free, and great for everyday documents and broad questions — dense vector search on 1024-dimension embeddings. Precision is the accurate one: larger 2560-dimension embeddings, an HNSW index, and hybrid search that adds exact lexical matching for codes and identifiers, plus corpus quality scoring.
They share one API and one corpus model — you can start on Edge and move to Precision by changing a single tier parameter, no re-integration.
What does "hybrid retrieval" actually do for me?
Pure semantic (dense) search is great at meaning but can blur exact tokens — it'll happily treat v2.1.4 and v2.4.1 as close. Pure keyword search nails exact strings but misses paraphrases.
Hybrid runs both and fuses the rankings. When your query contains an identifier — a SKU, an error code, a version — Precision adds a lexical pass so the exact match surfaces, while still using semantics for everything else. You get recall and precision without choosing.
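The fusion step can be sketched with reciprocal rank fusion, the method named in the Precision tier list. Each document scores 1 / (k + rank) in every ranking it appears in, and the scores are summed. The constant k = 60 is the value commonly used in the RRF literature, not a confirmed Voxell setting, and the document IDs below are made up to echo the version-number example.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc IDs with reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for the docs it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Dense search confuses the two versions; lexical search nails the exact one.
dense = ["notes-v2.4.1", "notes-v2.1.4", "upgrade-guide"]
lexical = ["notes-v2.1.4", "changelog"]

fused = rrf_fuse([dense, lexical])
print(fused)
```

The exact-match document rises to the top because it appears in both lists, while documents found by only one method still stay in the results.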
Where does the answer come from — can I trust it?
In answer mode, Voxell Answers retrieves the most relevant chunks from your documents and asks Gemini to answer using only those, returning the source chunks alongside the answer. Every answer is traceable to the passage it came from. If nothing relevant is found, you get told that — not a confident guess.
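One way to surface that traceability in your own UI is to print each cited chunk next to the answer. The sketch below assumes the response shape shown in the curl example (answer plus chunks with doc_id, chunk_index, score, and text); the format_citations helper is hypothetical, and the sample values are the same placeholders used in that example.

```python
def format_citations(response):
    """Render an answer followed by one numbered line per cited chunk."""
    lines = [response["answer"]]
    for n, chunk in enumerate(response.get("chunks", []), start=1):
        lines.append(
            f"[{n}] {chunk['doc_id']} chunk {chunk['chunk_index']} "
            f"(score {chunk['score']:.2f})"
        )
    return "\n".join(lines)


# Placeholder response mirroring the shape in the curl example.
sample = {
    "answer": "...",
    "chunks": [
        {"doc_id": "...", "chunk_index": 3, "score": 0.87, "text": "..."}
    ],
}

report = format_citations(sample)
print(report)
```

How the API signals that nothing relevant was found is not shown on this page, so this sketch simply prints the answer with no citation lines when chunks is empty.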
Load a document. Ask it a question. See for yourself.
Free to start with a 10M-token grant, no card required.
or Try the live demo →