model-routing-patterns

Model Routing Patterns

Production-ready model routing configurations and strategies for OpenRouter that optimize for cost, speed, quality, or balanced performance with intelligent fallback chains.

Purpose

This skill provides comprehensive templates, scripts, and strategies for implementing sophisticated model routing in OpenRouter-powered applications. It helps you:

Reduce API costs by routing to cheaper models when appropriate
Optimize for speed with fast models and streaming
Maintain quality with premium model fallbacks
Implement intelligent task-based routing
Build reliable multi-tier fallback chains

Activation Triggers

Use this skill when:

Designing model routing strategies
Implementing cost optimization
Setting up fallback chains for reliability
Building task complexity-based routing
Configuring dynamic model selection
Optimizing API performance vs cost tradeoffs
Implementing A/B testing for models
Setting up monitoring and analytics

Available Routing Strategies

1. Cost-Optimized Routing

Goal: Minimize API costs while maintaining acceptable quality

Strategy:

Use free models first (google/gemma-2-9b-it:free, meta-llama/llama-3.2-3b-instruct:free)
Fall back to budget models (openai/gpt-4o-mini, google/gemini-flash-1.5)
Reserve premium models (such as anthropic/claude-4.5-sonnet) for complex tasks that require the highest quality

Template:

templates/cost-optimized-routing.json

Best for:

High-volume applications
Simple tasks (classification, extraction, formatting)
Development/testing environments
Budget-constrained projects

2. Speed-Optimized Routing

Goal: Minimize latency and response time

Strategy:

Prioritize fastest models regardless of cost
Enable streaming for immediate feedback
Use smaller models with quick inference
Geographic routing to nearest endpoints

Template:

templates/speed-optimized-routing.json

Best for:

Real-time chat applications
Interactive user experiences
Low-latency requirements
Streaming responses

3. Quality-Optimized Routing

Goal: Maximize output quality with premium models

Strategy:

Use top-tier models (gpt-4o, claude-4.5-sonnet, gemini-pro)
Fall back to other premium models when the primary is unavailable
Use multi-model voting for critical tasks
Add quality verification layers

Template:

templates/quality-optimized-routing.json

Best for:

Critical business decisions
Content creation
Complex reasoning tasks
Customer-facing applications
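
The multi-model voting step in the strategy above could be sketched as a simple majority vote. This is a minimal illustration, not the skill's actual implementation: `majority_vote` and its `min_agreement` parameter are hypothetical names, and the answers are assumed to have been collected from each model already and to be short, comparable strings.

```python
from collections import Counter
from typing import Optional


def majority_vote(answers: dict[str, str], min_agreement: int = 2) -> Optional[str]:
    """Return the most common answer, or None when too few models agree.

    `answers` maps a model ID to that model's answer.
    """
    if not answers:
        return None
    # Normalize case and whitespace so "Approve" and "approve" count together.
    counts = Counter(a.strip().lower() for a in answers.values())
    winner, votes = counts.most_common(1)[0]
    return winner if votes >= min_agreement else None


# Hypothetical responses to a yes/no classification request.
responses = {
    "openai/gpt-4o": "Approve",
    "anthropic/claude-4.5-sonnet": "approve",
    "google/gemini-pro": "Reject",
}
decision = majority_vote(responses)
```

Returning `None` when the models disagree lets the caller escalate to a verification layer or a human reviewer instead of silently picking one answer.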

4. Balanced Routing

Goal: Dynamically route based on task complexity

Strategy:

Analyze request complexity
Route simple tasks to cheap models
Route complex tasks to premium models
Adapt routing based on success metrics

Template:

templates/balanced-routing.json

Best for:

Mixed workloads
Production applications
General-purpose AI services
Optimizing cost/quality tradeoff
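
One way to adapt routing to success metrics is to track a per-model success rate and pick the cheapest model that stays above a threshold. The sketch below assumes that approach; the class name, the 0.9 threshold, the minimum sample count, and the cheapest-first ordering are illustrative choices and do not come from templates/balanced-routing.json.

```python
from collections import defaultdict


class SuccessTracker:
    """Tracks request outcomes per model and picks the cheapest reliable one."""

    def __init__(self, min_success_rate: float = 0.9, min_samples: int = 5):
        self.min_success_rate = min_success_rate
        self.min_samples = min_samples
        self._ok = defaultdict(int)
        self._total = defaultdict(int)

    def record(self, model: str, success: bool) -> None:
        self._total[model] += 1
        if success:
            self._ok[model] += 1

    def success_rate(self, model: str) -> float:
        # Models with too little data are treated optimistically.
        if self._total[model] < self.min_samples:
            return 1.0
        return self._ok[model] / self._total[model]

    def choose(self, models_cheapest_first: list[str]) -> str:
        for model in models_cheapest_first:
            if self.success_rate(model) >= self.min_success_rate:
                return model
        # Nothing meets the threshold: use the last (most capable) model.
        return models_cheapest_first[-1]


tracker = SuccessTracker()
for outcome in [True, False, False, True, False]:
    tracker.record("google/gemma-2-9b-it:free", outcome)
chosen = tracker.choose(["google/gemma-2-9b-it:free", "openai/gpt-4o-mini"])
```

Here the free model has failed three of five requests, so the router moves up to the next model in the list. What counts as "success" (no error, passing a quality check, user feedback) is left to the application.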

5. Custom Routing

Goal: Implement domain-specific routing logic

Template:

templates/custom-routing-template.json

Customizable factors:

User tier/subscription level
Geographic location
Time-of-day pricing
Model availability
Rate limit status
Historical success rates

Key Resources

Scripts

validate-routing-config.sh

Validates routing configuration syntax
Checks model availability on OpenRouter
Verifies fallback chain logic
Ensures no circular dependencies
Validates model IDs and parameters

test-fallback-chain.sh

Tests fallback chain execution
Simulates model failures
Verifies graceful degradation
Measures latency through chain
Validates error handling

generate-routing-config.sh

Generates routing config from strategy type
Interactive configuration builder
Validates and optimizes settings
Exports to JSON/TypeScript/Python formats

analyze-cost-savings.sh

Analyzes potential cost savings from routing
Compares routing strategies
Projects monthly costs
Generates cost reports
Identifies optimization opportunities

Templates

Configuration Templates (JSON):

- `cost-optimized-routing.json`: Free/cheap models with premium fallback
- `speed-optimized-routing.json`: Fastest models with streaming
- `quality-optimized-routing.json`: Premium models with fallbacks
- `balanced-routing.json`: Task-based dynamic routing
- `custom-routing-template.json`: Template for custom strategies

Code Templates:

- `routing-config.ts`: TypeScript routing configuration
- `routing-config.py`: Python routing configuration

Examples

- `cost-routing-example.md`: Complete cost-optimized routing setup
- `dynamic-routing-example.md`: Task complexity-based routing
- `fallback-chain-example.md`: 3-tier fallback strategy
- `monitoring-example.md`: Cost tracking and analytics setup

Workflow

1. Identify Requirements

Determine your optimization goals:

```bash
# Interactive strategy selector
./scripts/generate-routing-config.sh
```

Answer questions about:
- Primary goal (cost/speed/quality/balanced)
- Budget constraints
- Latency requirements
- Quality thresholds
- Supported model providers

2. Generate Configuration

```bash
# Generate from strategy type
./scripts/generate-routing-config.sh cost-optimized > config.json

# Or copy template
cp templates/cost-optimized-routing.json config.json
```

3. Validate Configuration

```bash
# Validate syntax and model availability
./scripts/validate-routing-config.sh config.json
```

Checks:
- JSON syntax
- Model IDs exist on OpenRouter
- Fallback chain is valid
- No circular references
- Required fields present
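
A few of the checks listed above can be approximated in a short script. This sketch assumes the `primary`/`fallback` layout used in this document's simple fallback pattern and only checks structure; it does not call OpenRouter to confirm that the model IDs exist. The function name and error messages are illustrative and are not taken from validate-routing-config.sh.

```python
import json


def validate_routing_config(raw: str) -> list[str]:
    """Return a list of problems found in a primary/fallback routing config."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top-level value must be an object"]

    errors = []
    primary = config.get("primary")
    fallback = config.get("fallback", [])
    if not isinstance(primary, str) or not primary:
        errors.append("missing required field: primary")
    if not isinstance(fallback, list):
        errors.append("fallback must be a list of model IDs")
        fallback = []

    # A model that appears twice would be retried after it already failed.
    seen = set()
    for model in [primary, *fallback]:
        if not isinstance(model, str):
            continue
        if "/" not in model:
            errors.append(f"model ID should look like provider/name: {model}")
        if model in seen:
            errors.append(f"model appears more than once in chain: {model}")
        seen.add(model)
    return errors


good = '{"primary": "openai/gpt-4o-mini", "fallback": ["openai/gpt-4o"]}'
bad = '{"primary": "openai/gpt-4o", "fallback": ["openai/gpt-4o"]}'
```

An empty list means the config passed these structural checks. Confirming that each model ID is actually available would need a separate lookup against OpenRouter.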

4. Test Fallback Chain

```bash
# Test fallback behavior
./scripts/test-fallback-chain.sh config.json
```

Simulates failures to ensure graceful degradation.

5. Analyze Cost Impact

```bash
# Compare routing strategies
./scripts/analyze-cost-savings.sh config.json baseline-config.json
```

Shows projected savings and performance tradeoffs.
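
The projection behind such a comparison is simple arithmetic: requests per month times tokens per request times the price per million tokens. The sketch below uses assumed prices that fall within the ranges given in this document and a hypothetical traffic profile; it is not the logic of analyze-cost-savings.sh.

```python
def monthly_cost(requests: int, tokens_per_request: int, usd_per_million: float) -> float:
    """Projected monthly spend in USD for one model."""
    return requests * tokens_per_request * usd_per_million / 1_000_000


# Hypothetical workload: 1M requests per month, 1,000 tokens each.
TOKENS = 1_000

# Baseline: all traffic goes to a premium model at an assumed $3 per 1M tokens.
baseline = monthly_cost(1_000_000, TOKENS, 3.0)

# Routed: 800k requests go to a budget model at an assumed $0.50 per 1M tokens,
# the remaining 200k stay on the premium model.
routed = monthly_cost(800_000, TOKENS, 0.5) + monthly_cost(200_000, TOKENS, 3.0)

savings = baseline - routed
print(f"baseline=${baseline:,.2f} routed=${routed:,.2f} savings=${savings:,.2f}")
# prints: baseline=$3,000.00 routed=$1,000.00 savings=$2,000.00
```

Real pricing usually differs for input and output tokens, so a production estimate would track the two separately.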

6. Deploy and Monitor

Deploy configuration to production
Monitor using examples/monitoring-example.md
Track metrics: cost, latency, success rate, quality
Iterate based on real-world performance

Common Routing Patterns

Pattern 1: Simple Fallback Chain

```json
{
  "primary": "meta-llama/llama-3.2-3b-instruct:free",
  "fallback": [
    "anthropic/claude-4.5-sonnet",
    "openai/gpt-4o-mini"
  ]
}
```
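
Applied client-side, this pattern amounts to trying each model in order and returning the first successful response. The sketch below assumes that approach and takes the actual API call as an injected function (`call_model`, a hypothetical name), so it is independent of any particular OpenRouter client.

```python
from typing import Callable


class AllModelsFailed(Exception):
    """Raised when every model in the chain has failed."""


def run_with_fallback(
    config: dict,
    prompt: str,
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try the primary model, then each fallback; return (model, response)."""
    chain = [config["primary"], *config.get("fallback", [])]
    errors = {}
    for model in chain:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors[model] = exc
    raise AllModelsFailed(f"all models failed: {list(errors)}")


config = {
    "primary": "meta-llama/llama-3.2-3b-instruct:free",
    "fallback": ["anthropic/claude-4.5-sonnet", "openai/gpt-4o-mini"],
}


def fake_call(model: str, prompt: str) -> str:
    # Stand-in for a real API call: the free model is simulated as rate limited.
    if model.endswith(":free"):
        raise RuntimeError("rate limited")
    return f"[{model}] {prompt}"


used_model, reply = run_with_fallback(config, "Hello", fake_call)
```

Returning the model ID alongside the response makes it possible to log which tier actually served each request, which matters when tracking cost and quality.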

Pattern 2: Task Complexity Routing

```json
{
  "simple_tasks": {
    "models": ["google/gemma-2-9b-it:free"]
  },
  "medium_tasks": {
    "models": ["anthropic/claude-4.5-sonnet"]
  },
  "complex_tasks": {
    "models": ["openai/gpt-4o"]
  }
}
```
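
Routing by complexity needs some way to classify each request. The heuristic below (word count plus a few keywords) is purely an assumption for illustration; a production system might use a small classifier model instead. The tier names match the JSON above, while `COMPLEX_HINTS` and the word-count thresholds are hypothetical.

```python
ROUTES = {
    "simple_tasks": {"models": ["google/gemma-2-9b-it:free"]},
    "medium_tasks": {"models": ["anthropic/claude-4.5-sonnet"]},
    "complex_tasks": {"models": ["openai/gpt-4o"]},
}

# Hypothetical signals that a request needs multi-step reasoning.
COMPLEX_HINTS = ("prove", "architecture", "step by step", "analyze", "refactor")


def classify(prompt: str) -> str:
    """Very rough complexity heuristic based on keywords and word count."""
    text = prompt.lower()
    words = len(text.split())
    if any(hint in text for hint in COMPLEX_HINTS) or words > 300:
        return "complex_tasks"
    if words > 50:
        return "medium_tasks"
    return "simple_tasks"


def select_model(prompt: str) -> str:
    """Return the first model configured for the prompt's complexity tier."""
    return ROUTES[classify(prompt)]["models"][0]
```

For example, `select_model("Classify this review as positive or negative")` selects the free model, while a prompt that mentions "architecture" is sent to the complex tier.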

Pattern 3: Time-Based Routing

```json
{
  "peak_hours": {
    "models": ["openai/gpt-4o-mini"],
    "max_latency_ms": 1000
  },
  "off_peak": {
    "models": ["google/gemini-pro"],
    "max_latency_ms": 3000
  }
}
```
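
The pattern does not say when peak hours begin or end, so the sketch below assumes a 09:00 to 18:00 local-time window. `PEAK_START`, `PEAK_END`, and `route_for` are hypothetical names introduced for illustration.

```python
from datetime import time

ROUTES = {
    "peak_hours": {"models": ["openai/gpt-4o-mini"], "max_latency_ms": 1000},
    "off_peak": {"models": ["google/gemini-pro"], "max_latency_ms": 3000},
}

# Assumed peak window; the pattern above does not define one.
PEAK_START = time(9, 0)
PEAK_END = time(18, 0)


def route_for(now: time) -> dict:
    """Return the route entry that applies at the given local time."""
    key = "peak_hours" if PEAK_START <= now < PEAK_END else "off_peak"
    return ROUTES[key]


route = route_for(time(14, 30))
```

In an application, the argument would typically be `datetime.now().time()`, and `max_latency_ms` could be passed to the HTTP client as a request timeout.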

Pattern 4: User Tier Routing

```json
{
  "free_tier": {
    "models": ["meta-llama/llama-3.2-3b-instruct:free"],
    "rate_limit": 10
  },
  "premium_tier": {
    "models": ["anthropic/claude-4.5-sonnet"],
    "rate_limit": 1000
  }
}
```
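
The unit of `rate_limit` is not stated here; the sketch below assumes it means requests per user per minute and enforces it with a sliding window. `TierRouter` and `WINDOW_SECONDS` are hypothetical names, and the clock is injectable so the behavior can be tested without waiting.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Optional

TIERS = {
    "free_tier": {"models": ["meta-llama/llama-3.2-3b-instruct:free"], "rate_limit": 10},
    "premium_tier": {"models": ["anthropic/claude-4.5-sonnet"], "rate_limit": 1000},
}

WINDOW_SECONDS = 60.0  # assumption: rate_limit means requests per minute


class TierRouter:
    """Selects a model by user tier and enforces the tier's rate limit."""

    def __init__(self, tiers: dict, clock: Callable[[], float] = time.monotonic):
        self.tiers = tiers
        self.clock = clock
        self._history = defaultdict(deque)  # user_id -> request timestamps

    def select_model(self, user_id: str, tier: str) -> Optional[str]:
        """Return the model for this user, or None if they are rate limited."""
        entry = self.tiers[tier]
        now = self.clock()
        history = self._history[user_id]
        # Drop timestamps that have left the sliding window.
        while history and now - history[0] >= WINDOW_SECONDS:
            history.popleft()
        if len(history) >= entry["rate_limit"]:
            return None
        history.append(now)
        return entry["models"][0]
```

A caller would map a `None` result to an HTTP 429 response or a prompt to upgrade. This in-memory version only works within a single process; a shared store would be needed across multiple workers.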

Model Categories for Routing

Free Models (Cost: $0)

- `google/gemma-2-9b-it:free`
- `meta-llama/llama-3.2-3b-instruct:free`
- `meta-llama/llama-3.2-1b-instruct:free`
- `microsoft/phi-3-mini-128k-instruct:free`

Use for: High-volume, simple tasks, development

Budget Models (Cost: $0.10-0.50/1M tokens)

- `openai/gpt-4o-mini`
- `google/gemini-flash-1.5`

Use for: Production workloads, balanced cost/quality

Premium Models (Cost: $3-15/1M tokens)

- `anthropic/claude-4.5-sonnet`
- `openai/gpt-4o`
- `google/gemini-pro-1.5`

Use for: Complex reasoning, critical tasks, high quality

Specialized Models

- Vision: `openai/gpt-4-vision-preview`
- Code: `anthropic/claude-4.5-sonnet` (code-specific)
- Long Context: `google/gemini-pro-1.5` (1M+ tokens)

Best Practices

Always implement fallback chains - Single points of failure cause downtime
Monitor actual costs - Theoretical savings may differ from real usage
Test quality degradation - Ensure cheaper models meet quality thresholds
Set timeout limits - Prevent slow models from blocking requests
Track model availability - Some models have rate limits or downtime
Use task classification - Route based on complexity, not one-size-fits-all
Implement retry logic - Handle transient failures gracefully
Cache responses - Reduce API calls for repeated queries
A/B test routing strategies - Validate improvements with data
Budget alerting - Get notified before exceeding cost limits
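
The retry logic mentioned above is commonly implemented as exponential backoff with jitter. The following is a minimal sketch under that assumption; `retry_with_backoff` and its parameters are hypothetical names, and a real client should retry only on transient errors (for example timeouts or HTTP 429/5xx responses) rather than on every exception.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on exceptions with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            # 0.5s, 1s, 2s, ... plus up to 10% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

Retries combine naturally with a fallback chain: retry the current model a small number of times, then move to the next model. Keeping `max_attempts` low avoids multiplying latency and cost when a model is down.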

Troubleshooting

All models in fallback chain failing:

Check OpenRouter status page
Verify API key has credits
Test with a known-working model
Review rate limit status

Higher costs than expected:

Analyze actual request distribution
Check if complex tasks are routed to premium models
Verify caching is working
Review retry/fallback patterns

Quality degradation:

Review which model is actually being used
Test free models against quality benchmarks
Consider upgrading routing strategy
Implement quality verification layer

High latency:

Check if using slow models
Enable streaming for faster perceived response
Use geographic routing
Implement parallel model requests with first-response-wins
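
The first-response-wins idea can be sketched with `asyncio`: send the same prompt to several models concurrently, return the first successful result, and cancel the rest. The API call is again an injected coroutine (`call_model`, a hypothetical name), and `fake_call` merely simulates latency.

```python
import asyncio
from typing import Awaitable, Callable


async def first_response_wins(
    models: list[str],
    prompt: str,
    call_model: Callable[[str, str], Awaitable[str]],
) -> tuple[str, str]:
    """Query all models concurrently; return (model, response) of the first success."""
    tasks = {asyncio.create_task(call_model(m, prompt)): m for m in models}
    pending = set(tasks)
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return tasks[task], task.result()
        raise RuntimeError("all models failed")
    finally:
        # Cancel requests that are still in flight once a winner is known.
        for task in pending:
            task.cancel()


async def fake_call(model: str, prompt: str) -> str:
    # Stand-in for a real API call, with simulated per-model latency.
    await asyncio.sleep(0.01 if "mini" in model else 0.2)
    return f"[{model}] {prompt}"


winner, reply = asyncio.run(
    first_response_wins(["openai/gpt-4o", "openai/gpt-4o-mini"], "Hi", fake_call)
)
```

This trades cost for latency: every request that was sent may still be billed even if its result is discarded, so the technique is best reserved for latency-critical paths.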

Integration Examples

See the examples directory for complete implementations:

Cost-optimized chat application
Dynamic routing based on conversation context
Multi-tier fallback with monitoring
Real-time cost tracking dashboard

Skill Location:

plugins/openrouter/skills/model-routing-patterns/

Version: 1.0.0

Supported Frameworks: Node.js, Python, TypeScript, any OpenRouter-compatible client
