LLM Router

Selects the right model for each task. Routing is the single biggest cost lever in multi-agent systems: intelligent routing cuts LLM spend by 45-85% while preserving 95%+ of top-model quality.


When to Use

Use for:

  • Deciding which model to call for a specific task
  • Assigning models to DAG nodes in agent workflows
  • Optimizing LLM API costs across a system
  • Building cascading try-cheap-first patterns

NOT for:

  • Prompt engineering (use prompt-engineer)
  • Model fine-tuning or training
  • Comparing model architectures (academic research)

Routing Decision Tree

flowchart TD
A{Task type?} -->|Classify / validate / format / extract| T1["Tier 1: Haiku, GPT-4o-mini (~$0.001)"]
A -->|Write / implement / review / synthesize| T2["Tier 2: Sonnet, GPT-4o (~$0.01)"]
A -->|Reason / architect / judge / decompose| T3["Tier 3: Opus, o1 (~$0.10)"]

T1 --> Q1{Quality sufficient?}
Q1 -->|Yes| Done1[Use cheap model]
Q1 -->|No| T2

T2 --> Q2{Quality sufficient?}
Q2 -->|Yes| Done2[Use balanced model]
Q2 -->|No| T3

Tier Assignment Table

| Task Type | Tier | Models | Cost/Call | Why This Tier |
| --- | --- | --- | --- | --- |
| Classify input type | 1 | Haiku, GPT-4o-mini | ~$0.001 | Deterministic categorization |
| Validate schema/format | 1 | Haiku, GPT-4o-mini | ~$0.001 | Mechanical checking |
| Format output / template | 1 | Haiku, GPT-4o-mini | ~$0.001 | Structured transformation |
| Extract structured data | 1 | Haiku, GPT-4o-mini | ~$0.001 | Pattern matching |
| Summarize text | 1-2 | Haiku → Sonnet | ~$0.001-0.01 | Short summaries: Haiku; nuanced: Sonnet |
| Write content/docs | 2 | Sonnet, GPT-4o | ~$0.01 | Creative quality matters |
| Implement code | 2 | Sonnet, GPT-4o | ~$0.01 | Correctness + style |
| Review code/diffs | 2 | Sonnet, GPT-4o | ~$0.01 | Needs judgment, not just pattern matching |
| Research synthesis | 2 | Sonnet, GPT-4o | ~$0.01 | Multi-source reasoning |
| Decompose ambiguous problem | 3 | Opus, o1 | ~$0.10 | Requires deep understanding |
| Design architecture | 3 | Opus, o1 | ~$0.10 | Complex system reasoning |
| Judge output quality | 3 | Opus, o1 | ~$0.10 | Meta-reasoning about quality |
| Plan multi-step strategy | 3 | Opus, o1 | ~$0.10 | Long-horizon planning |

Three Routing Strategies

Strategy 1: Static Tier Assignment (Start Here)

Assign model by task type at DAG design time. No runtime logic. Gets 60-70% of possible savings.

nodes:
  - id: classify
    model: claude-haiku-4-5   # Tier 1: ~$0.001
  - id: implement
    model: claude-sonnet-4-5  # Tier 2: ~$0.01
  - id: evaluate
    model: claude-opus-4-5    # Tier 3: ~$0.10
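The same static assignment can be expressed as a lookup at routing time. A minimal sketch; the task-type keys and the Tier 2 default for unknown types are assumptions, and the model names match the config above.

```python
# Static tier assignment: task type -> tier -> model. No runtime logic.
TIER_MODELS = {
    1: "claude-haiku-4-5",   # ~$0.001/call
    2: "claude-sonnet-4-5",  # ~$0.01/call
    3: "claude-opus-4-5",    # ~$0.10/call
}

TASK_TIERS = {
    "classify": 1, "validate": 1, "format": 1, "extract": 1,
    "write": 2, "implement": 2, "review": 2, "synthesize": 2,
    "decompose": 3, "architect": 3, "judge": 3, "plan": 3,
}

def route(task_type: str) -> str:
    # Unknown task types default to Tier 2 rather than the cheapest tier.
    return TIER_MODELS[TASK_TIERS.get(task_type, 2)]
```

Defaulting unknowns to Tier 2 (not Tier 1) trades a little cost for safety on unclassified tasks.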

Strategy 2: Cascading (Try Cheap First)

Try the cheap model; if quality falls below a threshold, escalate. Adds ~1s of latency for the quality check but saves 50-80% on nodes where the cheap model succeeds.

1. Execute with Tier 1 model
2. Quick quality check (also Tier 1 — costs ~$0.001)
3. If quality ≥ threshold → done
4. If quality < threshold → re-execute with Tier 2

Best for nodes where you're genuinely unsure which tier is needed.
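The four steps above can be sketched as a loop over tiers. This is illustrative, not a specific library's API: `execute(model, task)` and `score(output)` are caller-supplied stand-ins for the model call and the Tier 1 quality check, and the tier names are placeholders.

```python
def cascade(task, execute, score, threshold=0.8,
            tiers=("tier1-model", "tier2-model", "tier3-model")):
    """Run `task` on the cheapest tier first; escalate while the quality
    check fails. `score` should itself use a Tier 1 model (~$0.001)."""
    for model in tiers[:-1]:
        output = execute(model, task)
        if score(output) >= threshold:
            return model, output          # cheap tier was good enough
    # Every cheaper tier missed the threshold: fall back to the top tier.
    return tiers[-1], execute(tiers[-1], task)
```

Note the top tier skips the quality check: once you've escalated all the way, there is nothing left to escalate to.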

Strategy 3: Adaptive (Learn from History)

Record success/failure per task type per model. Over time, the router learns:

  • "Classification nodes always succeed on Haiku" → stay cheap
  • "Code review nodes fail on Haiku 40% of the time" → upgrade to Sonnet
  • "Architecture nodes succeed on Sonnet 90% of the time" → don't need Opus

Reaches 75-85% savings after roughly 100 recorded executions.
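A minimal sketch of the record-and-learn loop. The success-rate bar (90%) and minimum sample count (10) are assumptions, as is the conservative fallback to the strongest model when there is no history yet.

```python
from collections import defaultdict

class AdaptiveRouter:
    """Route each task type to the cheapest model whose observed
    success rate clears a bar; fall back to the strongest model."""

    def __init__(self, models, min_rate=0.9, min_samples=10):
        self.models = list(models)        # ordered cheapest -> strongest
        self.min_rate = min_rate
        self.min_samples = min_samples
        self.stats = defaultdict(lambda: [0, 0])  # (task, model) -> [ok, n]

    def record(self, task_type, model, success):
        s = self.stats[(task_type, model)]
        s[0] += int(success)
        s[1] += 1

    def route(self, task_type):
        for model in self.models:
            ok, n = self.stats[(task_type, model)]
            if n >= self.min_samples and ok / n >= self.min_rate:
                return model              # history says this tier suffices
        return self.models[-1]            # no evidence yet: stay safe
```

With this policy, "classification always succeeds on Haiku" shows up as a high Haiku success rate, so the router stays cheap; a 40% Haiku failure rate on code review never clears the bar, so those nodes escalate.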


Provider Selection

Once model tier is chosen, select the provider:

| Model Class | Provider Options | Selection Criteria |
| --- | --- | --- |
| Haiku-class | Anthropic, AWS Bedrock | Latency, regional availability |
| Sonnet-class | Anthropic, AWS Bedrock, GCP Vertex | Cost, rate limits |
| Opus-class | Anthropic | Only provider |
| GPT-4o-class | OpenAI, Azure OpenAI | Rate limits, compliance |
| Open-source | Ollama (local), Together.ai, Fireworks | Cost ($0), latency, GPU availability |
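Provider selection reduces to an ordered preference list per model class plus an availability check. A sketch under assumptions: the preference orderings below are illustrative, and real selection would also weigh latency, rate limits, and compliance.

```python
# Hypothetical preference lists per model class, cheapest/fastest first.
PROVIDERS = {
    "haiku-class": ["anthropic", "aws-bedrock"],
    "sonnet-class": ["anthropic", "aws-bedrock", "gcp-vertex"],
    "opus-class": ["anthropic"],            # only provider
    "gpt-4o-class": ["openai", "azure-openai"],
}

def pick_provider(model_class, available):
    """Return the first preferred provider that is currently reachable."""
    for provider in PROVIDERS.get(model_class, []):
        if provider in available:
            return provider
    raise LookupError(f"no available provider for {model_class}")
```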

Cost Impact Example

10-node DAG, "refactor a codebase":

| Strategy | Mix | Cost | Savings |
| --- | --- | --- | --- |
| All Opus | 10× $0.10 | $1.00 | baseline |
| All Sonnet | 10× $0.01 | $0.10 | 90% |
| Static tiers | 4× Haiku + 4× Sonnet + 2× Opus | $0.24 | 76% |
| Cascading | 6× Haiku + 3× Sonnet + 1× Opus | $0.14 | 86% |
| Adaptive (trained) | Dynamic | ~$0.08 | 92% |
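The savings figures follow directly from the per-tier prices in the tier table; here's the arithmetic for the static-tiers row:

```python
PRICE = {"haiku": 0.001, "sonnet": 0.01, "opus": 0.10}  # $/call, per the tier table

def dag_cost(mix):
    """mix maps model -> node count; returns total cost in dollars."""
    return sum(PRICE[m] * n for m, n in mix.items())

baseline = dag_cost({"opus": 10})                        # $1.00
static = dag_cost({"haiku": 4, "sonnet": 4, "opus": 2})  # $0.244, rounds to $0.24
print(f"savings: {1 - static / baseline:.0%}")           # savings: 76%
```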

Anti-Patterns

Always Use the Best Model

Wrong: Route everything to Opus/o1 "for quality." Reality: 60%+ of typical DAG nodes are classification, validation, or formatting — tasks where Haiku performs identically to Opus. You're burning money.

Always Use the Cheapest Model

Wrong: Route everything to Haiku "for cost." Reality: Complex reasoning, architecture design, and quality judgment genuinely need stronger models. Haiku will produce plausible-looking but subtly wrong output on hard tasks.

Ignoring Latency

Wrong: Only optimizing for cost, ignoring that Opus takes 5-10x longer than Haiku. Reality: In a 10-node DAG, model choice affects total execution time as much as cost. Route time-critical paths to faster models.

No Feedback Loop

Wrong: Setting model tiers once and never adjusting. Reality: As models improve (Haiku gets smarter every generation), tasks that needed Sonnet last month may work on Haiku today. Record outcomes and adapt.