Some months ago I walked away from a stable career and put every dollar I could spare into GPUs. This was not a metaphor. I bought actual hardware with my own money because I believed I could build a better embedding engine than the ones shipped by the largest labs in the world.
Building something extraordinary is not free. You can be the best in the world at balancing the load and a two-hundred-pound pack still changes how you walk. I picked it up anyway, with my eyes open.
Eight weeks and one cluster later, here is where that bet stands: Ingot-8B-R3 is #1 on MTEB(eng, v2). You do not have to take my word for it. The leaderboard is public, and the scores are fully verifiable.
The Landscape of the Frontier
For any developer, enterprise architect, or investor looking at the modern landscape of language representation, the top of the English MTEB leaderboard tells a very specific story. It is dominated by massive, institutional research groups and heavily funded engineering teams.
Before today, the state of the art in English embeddings was almost entirely held by foreign research laboratories with large computing clusters and sprawling teams of PhDs.
Our result changes that map. The model that now tops the leaderboard was built entirely in the United States, by a solo engineer, on a single private cluster.
We achieved this not by out-spending the giants on raw scale, but by out-architecting them on efficiency.
To protect this technological advantage, we have filed utility patent applications covering both our runtime dynamic routing architecture and our offline training data-generation pipeline.
The Core Technical Thesis
Most modern embedding research is chasing a single, expensive variable: scale. The industry assumption is that to capture the semantic nuances of complex human language, you must train increasingly massive dense models on larger clusters with broader datasets.
We believe that approach has run into diminishing returns. The actual leverage point is not raw parameter scale. It is routing.
Ingot-8B-R3 is built on a proprietary routing architecture. Instead of forcing a single monolithic set of parameters to represent every task in English text, our system dynamically activates specialized sub-networks at inference time based entirely on the input content itself. No task-specific metadata or manual indexing is required.
This approach allowed a solo developer on a single cluster to outpace the specialized research divisions of major technology institutions.

                       [Input Content]
                              │
                              ▼
           ┌──────────────────────────────────────┐
           │    Voxell Dynamic Routing Engine     │
           │     (Proprietary, Patents Filed)     │
           └──────────────────┬───────────────────┘
                              │
           ┌──────────────────┼───────────────────┐
           ▼                  ▼                   ▼
    [Specialist A]     [Specialist B]      [Specialist C]
           │                  │                   │
           └──────────────────┼───────────────────┘
                              │
                              ▼
                  [Unified 4096-dim Vector]

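The routing engine itself is proprietary and its internals are not described here, so the following is only a generic illustration of content-based routing in the mixture-of-experts style: a gate scores each specialist from the input alone, the top-scoring specialists run, and their outputs are blended into one vector. Every name in it (`route_and_embed`, `gate_weights`, the toy specialists) is hypothetical, and the 4-dimensional output stands in for the 4096-dimensional vector in the diagram.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def route_and_embed(features, gate_weights, specialists, top_k=2):
    """Generic top-k gating sketch (hypothetical, not the Voxell engine)."""
    # Gate scores depend only on the input features: no task metadata is passed in.
    gate = softmax([dot(w, features) for w in gate_weights])
    # Keep the top_k specialists and renormalize their gate weights to sum to 1.
    chosen = sorted(range(len(gate)), key=lambda i: gate[i], reverse=True)[:top_k]
    norm = sum(gate[i] for i in chosen)
    # Only the chosen specialists are evaluated.
    outputs = {i: specialists[i](features) for i in chosen}
    dim = len(outputs[chosen[0]])
    unified = [
        sum((gate[i] / norm) * outputs[i][d] for i in chosen)
        for d in range(dim)
    ]
    return chosen, unified

# Toy specialists: each writes one input feature into its own output slot.
specialists = [
    lambda x: [x[0], 0.0, 0.0, 0.0],
    lambda x: [0.0, x[1], 0.0, 0.0],
    lambda x: [0.0, 0.0, x[2], 0.0],
]
gate_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

chosen, vec = route_and_embed([3.0, 1.0, 0.0], gate_weights, specialists, top_k=2)
print(chosen, vec)
```

In this toy run the gate selects the first two specialists and the third is never evaluated, which is the property that lets a routed model spend compute only on the sub-networks relevant to a given input.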
Performance Analysis
We evaluated Ingot-8B-R3 on the Massive Text Embedding Benchmark (MTEB) English v2 suite.
Mean (Task) Score: 75.99. That is a +0.76 overall delta over the foundational Qwen3-Embedding-8B model.
Borda Points: 5,567. Borda scoring treats each of the 41 tasks as a voter: on every task, a model earns points according to its rank among the models in the evaluation cohort, and those points are summed across tasks. This metric is a direct test of consistency. It is hard to game by over-optimizing for a handful of retrieval tasks while collapsing on clustering or classification.
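To make the consistency argument concrete, here is a minimal sketch of a textbook Borda count, assuming one point per model outscored on each task. The leaderboard's exact point scheme and tie handling may differ, and the models `A`, `B`, `C` and their scores are invented for illustration.

```python
def borda_points(scores_by_task):
    """scores_by_task: {task: {model: score}} -> {model: total Borda points}."""
    points = {}
    for task_scores in scores_by_task.values():
        for model, score in task_scores.items():
            # One point for every other model this model strictly outscores on the task.
            beaten = sum(
                1 for other, s in task_scores.items()
                if other != model and s < score
            )
            points[model] = points.get(model, 0) + beaten
    return points

# Hypothetical scores: A spikes on retrieval and collapses elsewhere,
# B is strong everywhere without winning retrieval.
toy = {
    "retrieval":      {"A": 0.90, "B": 0.60, "C": 0.55},
    "clustering":     {"A": 0.20, "B": 0.58, "C": 0.50},
    "classification": {"A": 0.30, "B": 0.88, "C": 0.80},
}

result = borda_points(toy)
print(result)
```

In this toy cohort, model A wins retrieval by a wide margin yet finishes with 2 points against B's 5, because a single first place cannot offset last place on the other two tasks.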
We submitted Ingot-8B-R3 as a served API model, aligning with the deployment and registry patterns of other closed-weight platforms on the leaderboard.
Categorical Breakdown
| Category | Tasks | Score |
|---|---|---|
| Classification | 8 | 90.41 |
| STS | 9 | 89.32 |
| Pair Classification | 4 | 87.66 |
| Retrieval | 10 | 70.01 |
| Clustering | 8 | 58.47 |
| Summarization | 1 | 36.96 |
| Reranking | 1 | 32.84 |
Reranking and Summarization are single-task categories in MTEB(eng, v2): those scores reflect one dataset each, not categorical averages.
We take the spread across categories as evidence that structured routing does more than lift peak scores. It raises the floor across the entire task distribution.
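As a quick consistency check, the headline Mean (Task) score can be recovered from the table by weighting each category score by its task count. This sketch assumes each category score is the unweighted mean of its tasks; because the table values are rounded to two decimals, the result only matches to rounding precision.

```python
# (task count, category score) copied from the table above.
categories = {
    "Classification":      (8, 90.41),
    "STS":                 (9, 89.32),
    "Pair Classification": (4, 87.66),
    "Retrieval":           (10, 70.01),
    "Clustering":          (8, 58.47),
    "Summarization":       (1, 36.96),
    "Reranking":           (1, 32.84),
}

total_tasks = sum(n for n, _ in categories.values())

# Task-weighted mean: every one of the 41 tasks counts equally.
task_mean = sum(n * score for n, score in categories.values()) / total_tasks

# Unweighted mean over the 7 categories: single-task categories count as much as Retrieval.
category_mean = sum(score for _, score in categories.values()) / len(categories)

print(total_tasks, round(task_mean, 2), round(category_mean, 2))
```

The task counts sum to 41 and the weighted mean rounds to 75.99, matching the reported figure. The unweighted category mean is much lower because the two single-task categories carry the same weight there as the ten retrieval tasks.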
A Benchmark is Not a Product
Topping MTEB proves that we can build embedding systems at the global frontier. It does not solve real-world retrieval.
In production, retrieval pipelines do not fail because they misinterpret clean, isolated sentences. They fail because enterprise data is messy, deeply nested, and interdependent. The real-world challenges that break systems in production (large corpora, nested tables, source code repositories, abstract syntax trees) are largely unmeasured by standard public benchmarks.
The benchmark is the proof. Forge is the point.
We did not build Ingot to sell it as a production API. We built Ingot as our research instrument to prove the soundness of the core routing and data-generation technologies that we have engineered into Forge, our enterprise retrieval engine.
What Forge Is
Forge is our production embedding engine and retrieval platform. It is engineered to bring the quality of frontier-scale embeddings into production environments under strict latency, cost, and security constraints.
Tiered Efficiency. Forge does not force you to run an expensive, high-latency 8B model. It offers a tiered strategy across Forge Turbo (1024-dim), Forge Pro (2560-dim), and Forge Ultra (4096-dim).
Production Latency. Optimized with a binary Protobuf embedding path designed to support sub-100ms end-to-end agentic retrieval.
Zero-Trust Security. Fully protected by mutual-TLS zero-trust identity architectures for agent and service deployments.
Structure Preservation. Engineered specifically to retain the semantic boundaries of non-linear data structures, including hierarchical financial tables and code syntax trees.
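To give a rough sense of why the tier dimensions and a binary embedding path matter, the sketch below compares the size of one embedding as raw little-endian float32 (the 4-byte-per-value layout that a packed repeated `float` field uses in Protobuf) against the same vector serialized as JSON text. Forge's actual wire schema is not described here, so this is an assumption-laden estimate: it ignores Protobuf's few bytes of tag and length overhead and uses random values in place of real embeddings.

```python
import json
import random
import struct

# Output dimensions per tier, as listed above.
TIERS = {"Forge Turbo": 1024, "Forge Pro": 2560, "Forge Ultra": 4096}

def float32_bytes(vector):
    # Little-endian float32: 4 bytes per component.
    return struct.pack(f"<{len(vector)}f", *vector)

def json_bytes(vector):
    # Typical JSON API payload: decimal text for each component.
    return json.dumps(vector).encode("utf-8")

rng = random.Random(0)  # fixed seed so runs are repeatable
sizes = {}
for name, dim in TIERS.items():
    vec = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    sizes[name] = (len(float32_bytes(vec)), len(json_bytes(vec)))

for name, (binary, text) in sizes.items():
    print(f"{name}: {binary} bytes binary, {text} bytes JSON")
```

The binary size is exactly four bytes per dimension, so the 1024-dim tier needs a quarter of the bytes of the 4096-dim tier, while the JSON form is several times larger at every tier.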
Try It Now
We believe that developers should verify capability using their own data, not a marketing document.
We have launched a completely free, zero-signup playground where you can paste your own text, upload complex unstructured documents, and immediately test retrieval accuracy.
- Try the Playground: no signup required.
- Start Building: new accounts receive a developer grant of 10,000,000 free tokens.
- Drop-In Integration: Forge uses a fully OpenAI-compatible API. Swapping your existing embedding provider requires changing exactly two lines of code.
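For readers unfamiliar with what "OpenAI-compatible" implies, here is a minimal sketch of an embeddings request in that style, built with the Python standard library and not sent anywhere. The base URL and model name are hypothetical placeholders, not documented Forge values, and treating those two constants as the "two lines" to change is an assumption.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute the values from your provider's docs.
BASE_URL = "https://forge.example.invalid/v1"
MODEL = "forge-pro"

def build_embedding_request(texts, api_key):
    """Build (but do not send) a POST to an OpenAI-style /embeddings endpoint."""
    body = json.dumps({"model": MODEL, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_embedding_request(["quarterly revenue by segment"], api_key="YOUR_KEY")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. In the OpenAI response format, each vector is returned under `data[i].embedding`, so downstream code that already parses that shape would not need to change.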