TTU — Response-Aware Inference Router
TTU Router is a provider-agnostic AI inference routing proxy developed by Fidelity Horizon that measures model quality on each response and automatically routes queries to the most cost-effective model. Like an automatic gearbox for AI: the right gear for every question, chosen from each response rather than from manual rules.
The right gear for every question
Like an automatic gearbox that selects the right gear for every driving condition: easy queries stay in low gear and are handled quickly by an efficient model, while complex queries shift up and are escalated to a powerful model. TTU measures the model's own quality signal on each response to make this decision. No manual rules, no guesswork.
Existing inference routers let users or developers choose which model handles each query, whether by price, latency, or manual rules. TTU is different: it measures quality after the response is generated, not before.
Response-aware routing
Each response is assessed individually using a proprietary quality estimation method. If quality is high, the efficient model's answer is used. If not, the query is escalated to a more powerful model. No manual rules needed.
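The escalation flow can be sketched as follows. The actual quality estimation method is proprietary, so `estimate_quality` below is a hypothetical stand-in, and the threshold value is made up for illustration:

```python
# Sketch of response-aware routing: answer with the efficient model first,
# then escalate only if the measured quality falls below a threshold.

QUALITY_THRESHOLD = 0.8  # hypothetical cutoff, not TTU's real value

def estimate_quality(response: str) -> float:
    """Placeholder for the proprietary per-response quality estimator."""
    # Illustrative heuristic only: longer, non-empty answers score higher.
    return min(len(response) / 100, 1.0)

def route(query: str, cheap_model, strong_model) -> str:
    draft = cheap_model(query)
    if estimate_quality(draft) >= QUALITY_THRESHOLD:
        return draft                # quality is high: keep the cheap answer
    return strong_model(query)      # otherwise escalate to the powerful model
```

The key design point is that no per-query rules exist anywhere in this loop: the only routing input is the quality signal measured on the response itself.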
Drop-in proxy
Provider-agnostic API proxy. Compatible with OpenAI, Anthropic, and any LLM API. One line change in your application code. Supports streaming and all standard parameters.
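Because the proxy speaks the same API as the upstream providers, integration amounts to changing the client's base URL. The endpoint below is hypothetical; substitute the one from your own deployment:

```python
# A drop-in proxy only needs the client's base URL changed; everything else
# (API key, model names, streaming flags, standard parameters) passes through
# unchanged. The TTU endpoint shown here is hypothetical.

OLD_CONFIG = {"base_url": "https://api.openai.com/v1", "api_key": "sk-..."}
NEW_CONFIG = {**OLD_CONFIG, "base_url": "https://ttu.example.com/v1"}  # the one change

# With an SDK such as the OpenAI Python client, the same idea is a single line:
#   client = OpenAI(base_url="https://ttu.example.com/v1")
```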
Safety routing
For safety-critical domains (medical, legal, financial), TTU includes domain detection and quality monitoring. Queries flagged as safety-critical are routed through additional verification before reaching the user.
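The safety pass can be sketched like this. Keyword matching is a hypothetical stand-in for TTU's domain detection, and `verify` stands in for the additional verification step:

```python
# Sketch of safety routing: queries in safety-critical domains get an extra
# verification step before the answer reaches the user. The keyword lists are
# illustrative placeholders, not TTU's real domain detector.
from typing import Optional

SAFETY_DOMAINS = {
    "medical": ("diagnosis", "dosage", "symptom"),
    "legal": ("contract", "liability", "lawsuit"),
    "financial": ("investment", "tax", "loan"),
}

def detect_domain(query: str) -> Optional[str]:
    q = query.lower()
    for domain, keywords in SAFETY_DOMAINS.items():
        if any(k in q for k in keywords):
            return domain
    return None

def answer(query: str, model, verify) -> str:
    response = model(query)
    if detect_domain(query) is not None:
        return verify(response)  # safety-critical: verify before returning
    return response
```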
Routing dashboard
See which model answers each query, why it was routed that way, and cumulative cost savings. Full audit trail for every routing decision.
1,000 queries, verified results
MMLU benchmark. TTU routes queries between a small and a large model. Results come from a single 1,000-query run, large enough to be statistically robust.
Six verified scenarios across different query types and model pairs. Safety routing adds domain detection and quality monitoring. 28 tests passing (9 proxy + 10 safety + 9 validation).
Routing based on the response, not the prompt
Existing inference routers let users or rules decide which model handles each query, by price, latency, or provider. TTU measures the model's own quality signal on each response.
| Approach | How it works | Assesses response? |
|---|---|---|
| API gateways | Multi-provider proxy, user selects model | No, manual selection |
| Open-source proxies | Unified API layer with routing rules | No, rule-based |
| Observability platforms | Logging, monitoring, cost tracking | No, monitoring only |
| Provider auto-routing | Vendor-native model selection | Partial, limited to own models |
| TTU | Response-aware routing + safety | Yes, proprietary quality assessment |
TTU Router — Common questions
How does TTU Router reduce LLM inference costs?
TTU Router sits as a proxy between your application and LLM providers. It routes each query to the most cost-effective model by measuring response quality after generation. Simple queries are handled by cheaper models; complex queries are escalated only when needed. Verified result: a 51% cost reduction while maintaining 99.8% quality.
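As a back-of-envelope illustration of why routing lowers the blended cost: if most queries resolve on the cheap model, the average per-query cost approaches the cheap model's price. All numbers below are made up for illustration and are not TTU's published figures:

```python
# Hypothetical blended-cost arithmetic; every number here is illustrative.
cheap_price, strong_price = 0.50, 5.00   # $ per 1M tokens (made up)
routed_to_cheap = 0.90                   # fraction resolved by the cheap model

blended = routed_to_cheap * cheap_price + (1 - routed_to_cheap) * strong_price
savings = 1 - blended / strong_price     # versus sending everything to strong
print(f"blended ${blended:.2f}/1M tokens, {savings:.0%} cheaper than strong-only")
```

The actual savings depend on the real price gap between the two models and on how many queries the quality signal keeps on the cheap model.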
What makes TTU different from other inference routers?
Existing routers let users or rules select which model handles each query. TTU is the only router that measures model quality on each individual response to make routing decisions. Routing adapts to actual difficulty, not predicted difficulty.