TTU — Response-Aware Inference Router

TTU Router is a provider-agnostic AI inference routing proxy developed by Fidelity Horizon that measures model quality on each response and automatically routes queries to the most cost-effective model. Like an automatic gearbox for AI, it picks the right gear for every question, based on each response rather than manual rules.

The right gear for every question

Like an automatic gearbox that selects the right gear for every driving condition: easy queries stay in low gear, handled quickly by an efficient model, while complex queries shift up and are escalated to a powerful model. TTU measures the model's own quality signal on each response to make this decision. No manual rules, no guesswork.

Every existing inference router lets users or developers choose which model handles each query, whether by price, latency, or manual rules. TTU is different: it measures quality after the response, not before.

Response-aware routing

Each response is assessed individually using a proprietary quality estimation method. If quality is high, the efficient model's answer is used. If not, the query is escalated to a more powerful model. No manual rules needed.
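The escalation flow can be sketched as follows. TTU's quality estimation method is proprietary and not described here, so `estimate_quality`, the model callables, and the threshold are illustrative stand-ins, not the real implementation.

```python
# Sketch of response-aware escalation. TTU's quality estimator is
# proprietary; estimate_quality here is an illustrative stand-in.

def estimate_quality(response: str) -> float:
    """Placeholder: score a response in [0, 1] (real method not public)."""
    return 0.9 if response.strip() else 0.0

def route(query, efficient_model, powerful_model, threshold=0.8):
    """Try the efficient model first; escalate only when quality is low."""
    answer = efficient_model(query)
    if estimate_quality(answer) >= threshold:
        return answer, "efficient"            # quality high: keep cheap answer
    return powerful_model(query), "powerful"  # quality low: escalate
```

The key design point is that the decision happens after the efficient model has answered, so routing tracks actual difficulty rather than a prediction made from the prompt.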

Drop-in proxy

Provider-agnostic API proxy. Compatible with OpenAI, Anthropic, and any LLM API. One line change in your application code. Supports streaming and all standard parameters.
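For an OpenAI-compatible client, the one-line change amounts to swapping the API base URL for the proxy's. The URL below is a hypothetical placeholder, not TTU's actual endpoint; with the official OpenAI Python client the same swap would be `OpenAI(base_url=TTU_BASE_URL)`.

```python
# Hypothetical proxy endpoint; TTU's real URL is not given on this page.
TTU_BASE_URL = "https://ttu-proxy.example.com/v1"

# Before: requests go straight to the provider.
PROVIDER_URL = "https://api.openai.com/v1"

def chat_url(base_url: str) -> str:
    """Build the chat-completions endpoint for a given base URL.
    The one-line change is which base_url you pass in; request bodies,
    streaming, and standard parameters pass through unchanged."""
    return base_url.rstrip("/") + "/chat/completions"
```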

Safety routing

For safety-critical domains (medical, legal, financial), TTU includes domain detection and quality monitoring. Queries flagged as safety-critical are routed through additional verification before reaching the user.
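A minimal sketch of that flow, assuming keyword-based domain detection (the page does not describe the real detector, and the keyword lists and `verify` step below are illustrative assumptions):

```python
# Sketch of safety routing: flag safety-critical domains and require an
# extra verification pass before the answer reaches the user.
# The keyword lists and verify() callable are illustrative assumptions.

SAFETY_DOMAINS = {
    "medical": ("diagnosis", "dosage", "symptom"),
    "legal": ("contract", "liability", "lawsuit"),
    "financial": ("investment", "tax", "loan"),
}

def detect_domain(query: str):
    """Return the matched safety-critical domain, or None."""
    q = query.lower()
    for domain, keywords in SAFETY_DOMAINS.items():
        if any(k in q for k in keywords):
            return domain
    return None

def answer_with_safety(query: str, model, verify):
    """Route safety-critical queries through an extra verification step."""
    response = model(query)
    domain = detect_domain(query)
    if domain is not None:
        response = verify(response, domain)  # additional check before delivery
    return response
```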

Routing dashboard

See which model answers each query, why it was routed that way, and cumulative cost savings. Full audit trail for every routing decision.

1,000 queries, verified results

MMLU benchmark. TTU routes queries between a small and a large model. Results are from a single, statistically robust run.

Quality retained
99.8%
Cost reduction
51%
Queries tested
1,000
Routing overhead
0.16μs

Six verified scenarios across different query types and model pairs. Safety routing adds domain detection and quality monitoring. 28 tests passing (9 proxy + 10 safety + 9 validation).

Routing based on the response, not the prompt

Existing inference routers let users or rules decide which model handles each query, whether by price, latency, or provider. TTU instead measures the model's own quality signal on each response.

| Approach | How it works | Assesses response? |
|---|---|---|
| API gateways | Multi-provider proxy, user selects model | No, manual selection |
| Open-source proxies | Unified API layer with routing rules | No, rule-based |
| Observability platforms | Logging, monitoring, cost tracking | No, monitoring only |
| Provider auto-routing | Vendor-native model selection | Partial, limited to own models |
| TTU | Response-aware routing + safety | Yes, proprietary quality assessment |

TTU Router — Common questions

How does TTU Router reduce LLM inference costs?

TTU Router sits as a proxy between your application and LLM providers. It routes each query to the most cost-effective model by measuring the response quality after generation. Simple queries are handled by cheaper models; complex queries are escalated only when needed. Verified: 51% cost reduction at 99.8% quality.
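The cost arithmetic works out as follows. Every query pays for the efficient model, and only escalated queries also pay for the powerful one. All prices and the escalation rate in this sketch are made-up illustrations; only the benchmark figures above come from TTU's own results.

```python
# Illustrative cost model: every query first hits the small model; queries
# whose responses score low are escalated and also pay the large model.
# All numbers below are made-up examples, not TTU benchmark data.

def blended_cost(c_small, c_large, escalation_rate):
    """Average per-query cost under escalate-on-low-quality routing."""
    return c_small + escalation_rate * c_large

small, large = 0.10, 1.00   # assumed relative per-query prices
rate = 0.25                 # assumed fraction of queries escalated
saving = 1.0 - blended_cost(small, large, rate) / large
```

Under these assumed numbers the blended cost is 0.35 per query against 1.00 for always using the large model, a 65% saving; the actual saving depends on the price gap and how often escalation fires.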

What makes TTU different from other inference routers?

Existing routers let users or rules select which model handles each query. TTU is the only router that measures model quality on each individual response to make routing decisions. Routing adapts to actual difficulty, not predicted difficulty.

More questions about TTU and our other products →