Engineering Insights

The Cache Feedback Gap: Why Your Prefetcher Doesn't Learn

Jonathan Corners | January 2026

Your retrieval cache predicts what to preload and never learns if it was right. The open-loop problem, and what closing the feedback loop unlocks.

The Cache That Guesses and Forgets

Your retrieval pipeline caches embeddings. It has to: recomputing a vector for a document you’ve already indexed is wasteful, and at scale, it’s fatal to latency. So you cache. The cache warms up. You congratulate yourself.

Then you look more carefully. Your cache doesn’t know which of its entries were ever retrieved. It doesn’t know which cached vectors are still semantically correct versus stale from a model update. It doesn’t know which prefetched entries were evicted without being used.

The cache is guessing. And it has no way to find out if its guesses were right.

This is the cache feedback gap. It’s not specific to embeddings. It’s a structural problem that runs across every prefetching system in the stack.

The Prefetch Graveyard

Consider how prefetching fails across the stack:

HTTP/2 Server Push: The server pushes assets it predicts the client will need: JavaScript bundles, CSS, fonts. If the client already has them cached, 500KB is sent and discarded, and the server does the same thing on the next request. Chrome disabled Server Push by default in 2022 (Chrome 106), citing low adoption and little measured performance benefit. The server had no reliable way to learn what the client already held.

CDN prefetching: CDNs analyze access patterns and speculatively warm edge caches. When they prefetch assets that never get requested, the only signal is a cache eviction log that nobody reads. The prefetcher never adapts.

Database query caching: Query caches store results for repeated queries. They don’t anticipate related queries. Fetching a user record doesn’t pre-warm the cache for that user’s recent documents, even if the access pattern is predictable. The cache waits to be asked.

Embedding prefetch: You predict that users searching topic A will also search topic B. You preload B’s embedding vectors. If the prediction was wrong, those vectors occupy cache until eviction. You never know how often you were wrong.

All of these systems share one structural flaw: they’re open-loop. They make predictions. They don’t learn from outcomes.

What’s Missing: The Feedback Signal

Imagine if your prefetcher worked like a recommendation system. Every prediction would have an identifier. Every outcome would be tracked. The system would learn which predictions were useful and which were waste.

The missing piece is a standardized feedback signal, one that can report:

Outcome          Meaning
HIT              The prefetched data was used. Latency was saved.
MISS             The data wasn’t prefetched, but was subsequently needed.
EVICTED_UNUSED   The data was prefetched, cached, and evicted without being accessed.
STALE_HIT        The prefetched data was used but was stale.
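To make the taxonomy concrete, here is a minimal Python sketch of how a read could be classified. The names Outcome, Entry, and classify_read are illustrative, not part of any published API, and "stale" is assumed here to mean that the entry was computed under a different model version than the one currently serving.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    HIT = "HIT"
    MISS = "MISS"
    EVICTED_UNUSED = "EVICTED_UNUSED"  # emitted at eviction time, not on read
    STALE_HIT = "STALE_HIT"


@dataclass
class Entry:
    model_version: str  # assumption: staleness is tracked per model version
    prefetched: bool    # True if the entry was loaded speculatively


def classify_read(entry: Optional[Entry], current_model_version: str) -> Optional[Outcome]:
    """Classify a cache read. Returns None for entries that were not predictions."""
    if entry is None:
        return Outcome.MISS
    if not entry.prefetched:
        return None  # demand-filled entry: no prediction to score
    if entry.model_version != current_model_version:
        return Outcome.STALE_HIT
    return Outcome.HIT
```

EVICTED_UNUSED cannot be produced by a read at all, which is part of why it tends to go unobserved: it has to be emitted by the eviction path.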

The EVICTED_UNUSED signal is the most valuable. It identifies pure waste: bandwidth consumed, cache space occupied, no benefit. A prefetcher that sees repeated EVICTED_UNUSED outcomes for a pattern should stop predicting it. An open-loop prefetcher never receives this signal and keeps making the same mistake indefinitely.
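One way to surface this signal is to have the cache remember, per entry, whether it was prefetched and whether it was ever read, and to report at eviction time. The sketch below is a small LRU cache written for illustration only. TrackingCache and its on_outcome callback are hypothetical names, and a production cache would also need concurrency control and size-based limits.

```python
from collections import OrderedDict


class TrackingCache:
    """LRU cache that reports what happened to each prefetched entry."""

    def __init__(self, capacity, on_outcome):
        self.capacity = capacity
        self.on_outcome = on_outcome   # callback(key, outcome_name)
        self._entries = OrderedDict()  # key -> [value, prefetched, used]

    def prefetch(self, key, value):
        """Store a speculatively loaded value."""
        self._store(key, value, prefetched=True)

    def put(self, key, value):
        """Store a value loaded on demand (not a prediction)."""
        self._store(key, value, prefetched=False)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            self.on_outcome(key, "MISS")
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        value, prefetched, used = entry
        if prefetched and not used:
            self.on_outcome(key, "HIT")  # first read of a prefetched entry
        entry[2] = True
        return value

    def _store(self, key, value, prefetched):
        self._entries[key] = [value, prefetched, False]
        self._entries.move_to_end(key)
        # Evict least recently used entries until back under capacity.
        while len(self._entries) > self.capacity:
            old_key, (_, was_prefetched, was_used) = self._entries.popitem(last=False)
            if was_prefetched and not was_used:
                self.on_outcome(old_key, "EVICTED_UNUSED")
```

Passing a callback such as events.append lets the caller collect (key, outcome) pairs and route them to whatever component made the prediction.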

With this taxonomy, a predictive cache can correlate every proactive decision with its outcome. Patterns that produce HITs get reinforced. Patterns that produce EVICTED_UNUSED get suppressed. The cache improves over time instead of running on static rules.
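The reinforce-and-suppress idea can be sketched as a per-pattern tally. This is a deliberately simple illustration, not a description of any particular product's model: PatternStats, min_samples, and max_waste_ratio are hypothetical names, and the default thresholds are arbitrary.

```python
from collections import defaultdict


class PatternStats:
    """Tracks outcomes per prediction pattern and decides whether to keep prefetching."""

    def __init__(self, min_samples=20, max_waste_ratio=0.8):
        self.min_samples = min_samples          # evidence required before suppressing
        self.max_waste_ratio = max_waste_ratio  # waste share at which a pattern is dropped
        self._hits = defaultdict(int)
        self._wasted = defaultdict(int)

    def record(self, pattern, outcome):
        if outcome == "HIT":
            self._hits[pattern] += 1
        elif outcome == "EVICTED_UNUSED":
            self._wasted[pattern] += 1

    def waste_ratio(self, pattern):
        hits = self._hits.get(pattern, 0)
        wasted = self._wasted.get(pattern, 0)
        total = hits + wasted
        return wasted / total if total else 0.0

    def should_prefetch(self, pattern):
        total = self._hits.get(pattern, 0) + self._wasted.get(pattern, 0)
        if total < self.min_samples:
            return True  # not enough evidence yet; keep predicting
        return self.waste_ratio(pattern) < self.max_waste_ratio
```

A hard cutoff like this has one obvious weakness: a suppressed pattern stops generating outcomes, so it can never recover. A real system would probably re-probe suppressed patterns occasionally or decay old counts.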

Closing the Loop

The open-loop problem has a straightforward structural fix: give every prediction an identifier, and provide a mechanism for the outcome to be reported against that identifier.

This is the core principle behind the HINT protocol: predictions are tagged, outcomes flow back, the prediction model adapts. It’s transport-agnostic: the same pattern works over HTTP headers, GraphQL extensions, or gRPC metadata. The mechanism is simple. The value is in building the feedback path as a first-class concern rather than an afterthought.
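The tag-and-report pattern can be sketched independently of any transport. The code below is not taken from the HINT specification: the header names X-Prediction-Id and X-Prediction-Outcome, and the FeedbackLoop class, are invented here purely to show the shape of the exchange.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Prediction:
    pattern: str  # what rule or model produced the prediction
    key: str      # what was prefetched
    prediction_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class FeedbackLoop:
    """Correlates reported outcomes with the predictions that caused them."""

    def __init__(self):
        self._pending = {}  # prediction_id -> Prediction
        self.outcomes = []  # (pattern, outcome) pairs, ready for a learner

    def predict(self, pattern, key):
        prediction = Prediction(pattern, key)
        self._pending[prediction.prediction_id] = prediction
        return prediction

    def report(self, prediction_id, outcome):
        prediction = self._pending.pop(prediction_id, None)
        if prediction is None:
            return False  # unknown ID, or outcome already reported
        self.outcomes.append((prediction.pattern, outcome))
        return True


def outcome_headers(prediction_id, outcome):
    # Hypothetical header names; the same two fields could travel as
    # GraphQL extensions or gRPC metadata instead.
    return {"X-Prediction-Id": prediction_id, "X-Prediction-Outcome": outcome}
```

The consumer of the prefetched data echoes the identifier back with the outcome, and the predictor resolves it to the originating pattern. Nothing in the transport has to understand the prediction itself.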

ARC

ARC is Voxell’s closed-loop vector cache, built on this model. Every cache operation produces a prediction identifier. Outcomes (HIT, MISS, EVICTED_UNUSED) flow back and feed the prediction model. ARC adapts across requests, not just within them.

There’s a structural prerequisite that most caching systems ignore: the feedback loop only works if your embeddings are reproducible. If the same document produces different vectors across service restarts, library upgrades, or hardware failovers, your prediction identifiers become unreliable. Outcomes recorded against one vector representation no longer describe the vectors the cache currently holds, so a HIT and a MISS for semantically identical queries can be attributed to different representations. The correlation between prediction and outcome is broken before any learning can happen.

This is why ARC is built on deterministic embeddings. Bit-exact vector computation is not a nice-to-have for a closed-loop cache. It’s a correctness requirement. Without it, you’re teaching the prediction model with a corrupted training signal.
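A simple way to test whether an embedding pipeline meets this bar is to fingerprint the raw bytes of each vector and compare fingerprints across restarts, upgrades, and hosts. The sketch below is a generic check, not ARC's mechanism. It assumes vectors are stored as float32, and vector_fingerprint and is_bit_exact are illustrative names.

```python
import hashlib
import struct


def _to_bytes(vector):
    # Serialize as little-endian float32 so the byte layout is the same
    # on every host, regardless of native endianness.
    return struct.pack(f"<{len(vector)}f", *vector)


def vector_fingerprint(vector, model_version):
    """Hash of the model version plus the exact float32 bit pattern of the vector."""
    digest = hashlib.sha256()
    digest.update(model_version.encode("utf-8"))
    digest.update(b"\x00")  # separator between version and payload
    digest.update(_to_bytes(vector))
    return digest.hexdigest()


def is_bit_exact(a, b):
    """True only if both vectors have identical float32 representations."""
    return len(a) == len(b) and _to_bytes(a) == _to_bytes(b)
```

Embedding the same document twice and comparing fingerprints gives a yes-or-no answer. A tolerance-based comparison would hide exactly the drift this check is meant to expose.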

The cache feedback gap has persisted because feedback was treated as optional. ARC closes it by treating feedback as the point.

ARC is part of Voxell’s infrastructure layer. The HINT specification is available at /hint/. ARC and ART implement HINT natively.

Author

Jonathan Corners - Founder, Voxell. I build GPU-native infrastructure for real-time AI systems.

If you're working on latency + consistency problems, I'd like to hear about it.
