Model Tiering

Preserve your frontier model quota for the work that actually needs it. Route everything else to cheaper tiers automatically.

Your frontier model is doing janitorial work

The Problem

When a coding agent runs git status, reads a config file, or checks test output, it uses the same frontier model as when it's designing architecture or debugging race conditions. You're paying premium prices for tasks that any model can handle.

The Solution

Tokonomy's Limp-Home mode classifies every request as janitorial or frontier in real time. Janitorial tasks (file reads, git ops, doc lookups, formatting) route to the cheapest tier. Frontier tasks (architecture, debugging, security analysis) stay on your requested model.

Before and After

Without Tiering
Claude Sonnet for everything:
read file → Sonnet
git diff → Sonnet
design schema → Sonnet
check syntax → Sonnet
100% of spend on frontier model

With Model Tiering
Intelligent tiering:
read file → Haiku (85% cheaper)
git diff → Haiku (85% cheaper)
design schema → Sonnet (stays)
check syntax → Haiku (85% cheaper)
~70% of requests on economy tier
Token savings: 60%
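
One way the figures above can line up, as a rough sketch: the page does not say how the 60% is derived, so this assumes every request consumes about the same number of tokens and that "85% cheaper" is the per-token discount on the economy tier. The function name is illustrative only.

```python
def blended_savings_pct(economy_share_pct: int, economy_discount_pct: int) -> float:
    """Overall savings when a share of equally sized requests gets a discount.

    Assumption (not stated on the page): all requests cost the same on the
    frontier model, so savings = share routed to economy x discount.
    """
    return economy_share_pct * economy_discount_pct / 100


# ~70% of requests on the economy tier, each 85% cheaper
print(blended_savings_pct(70, 85))  # 59.5, i.e. roughly 60%
```

Real savings depend on how many tokens the janitorial requests carry compared with the frontier ones, so treat this as a back-of-the-envelope check.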

How It Works

1. Enable Limp-Home mode in your application's profile.

2. Each request is classified as janitorial or frontier in real time.

3. Janitorial tasks route to the cheapest same-provider tier (Haiku, GPT-4o-mini, Gemini Flash).

4. Frontier tasks stay on your requested model. Classification defaults to frontier when uncertain.
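
The steps above can be sketched in a few lines. This is a minimal illustration, not Tokonomy's actual classifier: the regular expressions, the function names `classify` and `route`, and the model labels are all hypothetical. The one behavior taken from the page is that anything uncertain is treated as frontier.

```python
import re

# Hypothetical patterns for illustration; the real classifier's rules are not published here.
FRONTIER_PATTERNS = [
    re.compile(r"\b(architecture|design\w*|race condition|debug\w*|security)\b"),
]
JANITORIAL_PATTERNS = [
    re.compile(r"\bgit (status|diff|log)\b"),
    re.compile(r"\b(read|open) (the )?(file|config)\b"),
    re.compile(r"\b(format\w*|check syntax)\b"),
]


def classify(prompt: str) -> str:
    """Return "janitorial" or "frontier" for a request."""
    text = prompt.lower()
    # Frontier signals win, so a mixed request is never downgraded.
    if any(p.search(text) for p in FRONTIER_PATTERNS):
        return "frontier"
    if any(p.search(text) for p in JANITORIAL_PATTERNS):
        return "janitorial"
    # No match at all: uncertain, so default to frontier.
    return "frontier"


def route(prompt: str, requested_model: str, economy_model: str) -> str:
    """Pick the model that should serve the request."""
    return economy_model if classify(prompt) == "janitorial" else requested_model


# Model labels below are placeholders, not real API model IDs.
print(route("run git status", "claude-sonnet", "claude-haiku"))
print(route("design the billing schema", "claude-sonnet", "claude-haiku"))
print(route("what do you make of this?", "claude-sonnet", "claude-haiku"))
```

Checking frontier patterns first is a deliberate choice in this sketch: a request such as "read the file and debug the race condition" stays on the requested model.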

Frequently Asked Questions

What's the latency impact?
Fast pattern matching handles ~70% of classifications with zero added latency. The LLM fallback for ambiguous cases adds ~150ms.
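
As a rough worked example of what those numbers imply on average: the sketch below assumes the remaining ~30% of classifications all go through the LLM fallback, which the answer above suggests but does not state outright. The function name is illustrative.

```python
def average_added_latency_ms(pattern_share_pct: int, fallback_latency_ms: int) -> float:
    """Mean latency added per request by classification.

    Assumes pattern-matched requests add 0 ms and every other request
    pays the full fallback latency.
    """
    fallback_share_pct = 100 - pattern_share_pct
    return fallback_share_pct * fallback_latency_ms / 100


# ~70% handled by pattern matching, ~150 ms for the fallback
print(average_added_latency_ms(70, 150))  # 45.0
```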
Can I combine tiering with a local model?
Yes. With the Local Gateway, janitorial tasks route to Ollama on your machine for zero cloud cost. Frontier tasks forward to the cloud proxy.
Which models have tiering paths?
Claude Opus/Sonnet → Haiku. GPT-4o/4.5 → GPT-4o-mini. Gemini Pro → Gemini Flash.
Does tiering affect the response format?
No. Tiering stays within the same provider, so the response format is identical. Your tool can't tell the difference.
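
The tiering paths listed above can be pictured as a simple lookup table. This is a sketch under assumptions: the lowercase keys are illustrative family labels rather than real API model IDs, and falling back to the requested model when no path exists is a guess at sensible behavior, not documented Tokonomy behavior.

```python
# Paths as listed in the FAQ; labels are illustrative, not real API model IDs.
TIER_PATHS = {
    "claude-opus": "claude-haiku",
    "claude-sonnet": "claude-haiku",
    "gpt-4o": "gpt-4o-mini",
    "gpt-4.5": "gpt-4o-mini",
    "gemini-pro": "gemini-flash",
}


def economy_tier(requested_model: str) -> str:
    """Return the economy model for a requested model.

    Assumption: a model with no listed path is left unchanged.
    """
    return TIER_PATHS.get(requested_model, requested_model)


def provider(model: str) -> str:
    """Provider prefix of a label, e.g. "claude" from "claude-opus"."""
    return model.split("-", 1)[0]


print(economy_tier("claude-opus"))   # claude-haiku
print(economy_tier("some-other-model"))  # unchanged: some-other-model
```

Every path in the table keeps the same provider prefix on both sides, which is what lets the response format stay the same.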

Ready to start saving?

Create an account, add your first app, and swap one URL. Takes about 5 minutes.