model-routing-patterns

Model Routing Patterns

Production-ready model routing configurations and strategies for OpenRouter that optimize for cost, speed, quality, or balanced performance with intelligent fallback chains.

Purpose

This skill provides comprehensive templates, scripts, and strategies for implementing sophisticated model routing in OpenRouter-powered applications. It helps you:
  • Reduce API costs by routing to cheaper models when appropriate
  • Optimize for speed with fast models and streaming
  • Maintain quality with premium model fallbacks
  • Implement intelligent task-based routing
  • Build reliable multi-tier fallback chains

Activation Triggers

Use this skill when:
  • Designing model routing strategies
  • Implementing cost optimization
  • Setting up fallback chains for reliability
  • Building task complexity-based routing
  • Configuring dynamic model selection
  • Optimizing API performance vs cost tradeoffs
  • Implementing A/B testing for models
  • Setting up monitoring and analytics

Available Routing Strategies

1. Cost-Optimized Routing

Goal: Minimize API costs while maintaining acceptable quality
Strategy:
  • Use free models (google/gemma-2-9b-it:free, meta-llama/llama-3.2-3b-instruct:free)
  • Fall back to budget models (openai/gpt-4o-mini, google/gemini-flash-1.5)
  • Premium models only for complex tasks requiring highest quality
Template:
templates/cost-optimized-routing.json
Best for:
  • High-volume applications
  • Simple tasks (classification, extraction, formatting)
  • Development/testing environments
  • Budget-constrained projects

2. Speed-Optimized Routing

Goal: Minimize latency and response time
Strategy:
  • Prioritize fastest models regardless of cost
  • Enable streaming for immediate feedback
  • Use smaller models with quick inference
  • Geographic routing to nearest endpoints
Template:
templates/speed-optimized-routing.json
Best for:
  • Real-time chat applications
  • Interactive user experiences
  • Low-latency requirements
  • Streaming responses

3. Quality-Optimized Routing

Goal: Maximize output quality with premium models
Strategy:
  • Use top-tier models (openai/gpt-4o, anthropic/claude-4.5-sonnet, google/gemini-pro-1.5)
  • Fall back to other premium models when the primary is unavailable
  • Multi-model voting for critical tasks
  • Quality verification layers
Template:
templates/quality-optimized-routing.json
Best for:
  • Critical business decisions
  • Content creation
  • Complex reasoning tasks
  • Customer-facing applications
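The multi-model voting step in the strategy above can be sketched as a simple majority vote. This is a minimal illustration, not the skill's own implementation: `majority_vote` and the `ask` callback are hypothetical names, and `ask(model, prompt)` stands in for whatever client call you use to query a model.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(
    prompt: str,
    models: Sequence[str],
    ask: Callable[[str, str], str],
) -> tuple[str, float]:
    """Ask every model and return (winning answer, share of votes)."""
    if not models:
        raise ValueError("at least one model is required")
    # Normalize lightly so "Yes" and " yes " count as the same vote.
    answers = [ask(model, prompt).strip().lower() for model in models]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / len(answers)
```

Exact-match voting only suits short, closed-form answers such as labels or yes/no decisions. A low vote share could be used as the trigger for a quality verification layer, and every vote is a billed request.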

4. Balanced Routing

Goal: Dynamically route based on task complexity
Strategy:
  • Analyze request complexity
  • Route simple tasks to cheap models
  • Route complex tasks to premium models
  • Adapt routing based on success metrics
Template:
templates/balanced-routing.json
Best for:
  • Mixed workloads
  • Production applications
  • General-purpose AI services
  • Optimizing cost/quality tradeoff

5. Custom Routing

Goal: Implement domain-specific routing logic
Template:
templates/custom-routing-template.json
Customizable factors:
  • User tier/subscription level
  • Geographic location
  • Time-of-day pricing
  • Model availability
  • Rate limit status
  • Historical success rates

Key Resources

Scripts

validate-routing-config.sh
  • Validates routing configuration syntax
  • Checks model availability on OpenRouter
  • Verifies fallback chain logic
  • Ensures no circular dependencies
  • Validates model IDs and parameters
test-fallback-chain.sh
  • Tests fallback chain execution
  • Simulates model failures
  • Verifies graceful degradation
  • Measures latency through chain
  • Validates error handling
generate-routing-config.sh
  • Generates routing config from strategy type
  • Interactive configuration builder
  • Validates and optimizes settings
  • Exports to JSON/TypeScript/Python formats
analyze-cost-savings.sh
  • Analyzes potential cost savings from routing
  • Compares routing strategies
  • Projects monthly costs
  • Generates cost reports
  • Identifies optimization opportunities

Templates

Configuration Templates (JSON):
  • cost-optimized-routing.json
    - Free/cheap models with premium fallback
  • speed-optimized-routing.json
    - Fastest models with streaming
  • quality-optimized-routing.json
    - Premium models with fallbacks
  • balanced-routing.json
    - Task-based dynamic routing
  • custom-routing-template.json
    - Template for custom strategies
Code Templates:
  • routing-config.ts
    - TypeScript routing configuration
  • routing-config.py
    - Python routing configuration

Examples

  • cost-routing-example.md
    - Complete cost-optimized routing setup
  • dynamic-routing-example.md
    - Task complexity-based routing
  • fallback-chain-example.md
    - 3-tier fallback strategy
  • monitoring-example.md
    - Cost tracking and analytics setup

Workflow

1. Identify Requirements

Determine your optimization goals:
```bash
# Interactive strategy selector
./scripts/generate-routing-config.sh

# Answer questions about:
# - Primary goal (cost/speed/quality/balanced)
# - Budget constraints
# - Latency requirements
# - Quality thresholds
# - Supported model providers
```

2. Generate Configuration

```bash
# Generate from strategy type
./scripts/generate-routing-config.sh cost-optimized > config.json

# Or copy template
cp templates/cost-optimized-routing.json config.json
```

3. Validate Configuration

```bash
# Validate syntax and model availability
./scripts/validate-routing-config.sh config.json

# Checks:
# - JSON syntax
# - Model IDs exist on OpenRouter
# - Fallback chain is valid
# - No circular references
# - Required fields present
```
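For intuition, the offline part of these checks could look roughly like the sketch below. It is not the contents of validate-routing-config.sh: it assumes the flat `primary`/`fallback` shape from Pattern 1, treats a repeated model as the flat-chain equivalent of a circular reference, and uses an assumed `provider/model[:variant]` pattern for IDs. It does not contact OpenRouter, so it cannot confirm that a model actually exists.

```python
import json
import re

# Assumed ID shape: "provider/model" with an optional ":variant" suffix.
MODEL_ID = re.compile(r"^[\w.-]+/[\w.:-]+$")


def validate_config(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the config passed."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top-level value must be an object"]

    missing = [f for f in ("primary", "fallback") if f not in config]
    if missing:
        return [f"missing required field: {f}" for f in missing]
    if not isinstance(config["fallback"], list):
        return ["fallback must be a list"]

    chain = [config["primary"], *config["fallback"]]
    problems = [
        f"malformed model ID: {m!r}"
        for m in chain
        if not isinstance(m, str) or not MODEL_ID.match(m)
    ]
    # A model listed twice would be retried after it already failed.
    repeated = sorted({m for m in chain if isinstance(m, str) and chain.count(m) > 1})
    problems += [f"model appears more than once: {m}" for m in repeated]
    return problems
```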

4. Test Fallback Chain

```bash
# Test fallback behavior
./scripts/test-fallback-chain.sh config.json

# Simulates failures to ensure graceful degradation.
```

5. Analyze Cost Impact

```bash
# Compare routing strategies
./scripts/analyze-cost-savings.sh config.json baseline-config.json

# Shows projected savings and performance tradeoffs.
```
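The arithmetic behind such a projection is a traffic-weighted average price. The sketch below is an illustration only, not the script's actual logic; the function names, the traffic split, and the per-tier prices are assumptions chosen from within the price ranges listed under Model Categories.

```python
def blended_price(traffic_share: dict[str, float], price_per_million: dict[str, float]) -> float:
    """Average USD price per 1M tokens, weighted by each tier's share of traffic."""
    if abs(sum(traffic_share.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * price_per_million[tier] for tier, share in traffic_share.items())


def monthly_cost(price: float, requests_per_day: int, tokens_per_request: int, days: int = 30) -> float:
    """Projected USD cost for a month of traffic at a given price per 1M tokens."""
    return requests_per_day * tokens_per_request * days / 1_000_000 * price


# Hypothetical workload: 10,000 requests/day at 1,000 tokens each.
share = {"free": 0.70, "budget": 0.25, "premium": 0.05}
price = {"free": 0.0, "budget": 0.30, "premium": 9.0}
routed = monthly_cost(blended_price(share, price), 10_000, 1_000)
baseline = monthly_cost(price["premium"], 10_000, 1_000)
print(f"routed: ${routed:,.2f}  baseline: ${baseline:,.2f}  saved: ${baseline - routed:,.2f}")
```

With these assumed numbers the routed setup comes to about $157.50 per month against $2,700 for sending everything to the premium tier. Real savings depend on the actual request distribution and on separate input/output token pricing, which this sketch ignores.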

6. Deploy and Monitor

  • Deploy configuration to production
  • Monitor using examples/monitoring-example.md
  • Track metrics: cost, latency, success rate, quality
  • Iterate based on real-world performance
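As a rough idea of what tracking these metrics involves, here is a small in-memory tracker. It is a sketch under assumptions rather than the setup from examples/monitoring-example.md: the class names, the 80% alert threshold, and the idea that the caller supplies the per-request cost are all hypothetical. Quality is left out because it needs a task-specific scoring method.

```python
from dataclasses import dataclass, field


@dataclass
class ModelStats:
    requests: int = 0
    failures: int = 0
    latency_ms: float = 0.0
    cost_usd: float = 0.0


@dataclass
class RoutingMetrics:
    monthly_budget_usd: float
    alert_fraction: float = 0.8  # assumed default: alert at 80% of budget
    per_model: dict = field(default_factory=dict)

    def record(self, model: str, latency_ms: float, cost_usd: float, ok: bool) -> None:
        """Add one finished request to the running totals for its model."""
        stats = self.per_model.setdefault(model, ModelStats())
        stats.requests += 1
        stats.failures += 0 if ok else 1
        stats.latency_ms += latency_ms
        stats.cost_usd += cost_usd

    def success_rate(self, model: str) -> float:
        stats = self.per_model[model]
        return 1 - stats.failures / stats.requests

    def mean_latency_ms(self, model: str) -> float:
        stats = self.per_model[model]
        return stats.latency_ms / stats.requests

    def total_cost(self) -> float:
        return sum(s.cost_usd for s in self.per_model.values())

    def budget_alert(self) -> bool:
        """True once spend reaches the alert fraction of the monthly budget."""
        return self.total_cost() >= self.alert_fraction * self.monthly_budget_usd
```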

Common Routing Patterns

Pattern 1: Simple Fallback Chain

```json
{
  "primary": "meta-llama/llama-3.2-3b-instruct:free",
  "fallback": [
    "anthropic/claude-4.5-sonnet",
    "openai/gpt-4o-mini"
  ]
}
```
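One way to act on this config client-side is to try each model in order and stop at the first success. The sketch below assumes a caller-supplied `call_model(model, prompt)` function (hypothetical, standing in for your OpenRouter client call) that raises on failure; `complete_with_fallback` and `AllModelsFailed` are likewise illustrative names.

```python
from typing import Callable


class AllModelsFailed(RuntimeError):
    """Raised when every model in the chain has failed."""


def complete_with_fallback(
    config: dict,
    prompt: str,
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Return (model used, response text) from the first model that succeeds."""
    chain = [config["primary"], *config["fallback"]]
    errors = []
    for model in chain:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed("; ".join(errors))
```

Returning the model name alongside the response makes it possible to see which tier actually served a request, which helps when costs or quality differ from expectations.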

Pattern 2: Task Complexity Routing

```json
{
  "simple_tasks": {
    "models": ["google/gemma-2-9b-it:free"]
  },
  "medium_tasks": {
    "models": ["anthropic/claude-4.5-sonnet"]
  },
  "complex_tasks": {
    "models": ["openai/gpt-4o"]
  }
}
```
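The config above leaves open how a request gets labeled simple, medium, or complex. A deliberately naive heuristic is sketched below; the word-count thresholds and the keyword set are arbitrary assumptions for illustration, and a production classifier would more likely be trained or tuned against real traffic.

```python
import re

ROUTES = {
    "simple_tasks": ["google/gemma-2-9b-it:free"],
    "medium_tasks": ["anthropic/claude-4.5-sonnet"],
    "complex_tasks": ["openai/gpt-4o"],
}

# Assumed signals of a hard task; matched as whole words, not substrings.
COMPLEX_WORDS = {"prove", "analyze", "refactor", "derive", "architecture"}


def classify(prompt: str) -> str:
    """Bucket a prompt by length and by the presence of 'hard task' keywords."""
    words = re.findall(r"[a-z]+", prompt.lower())
    if len(words) > 300 or COMPLEX_WORDS & set(words):
        return "complex_tasks"
    if len(words) > 50:
        return "medium_tasks"
    return "simple_tasks"


def pick_model(prompt: str) -> str:
    """Return the first model configured for the prompt's complexity bucket."""
    return ROUTES[classify(prompt)][0]
```

Whole-word matching avoids false positives such as "improve" triggering on "prove".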

Pattern 3: Time-Based Routing

```json
{
  "peak_hours": {
    "models": ["openai/gpt-4o-mini"],
    "max_latency_ms": 1000
  },
  "off_peak": {
    "models": ["google/gemini-pro"],
    "max_latency_ms": 3000
  }
}
```
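Selecting between the two windows could be as simple as the sketch below. The pattern does not say when peak hours are, so the 09:00 to 18:00 window is an assumption, as are the names `route_for` and `TIME_ROUTES`. The timestamp is treated as already being in whatever timezone defines your peak period.

```python
from datetime import datetime, time

# Assumed peak window; adjust to your own traffic profile.
PEAK_START = time(9, 0)
PEAK_END = time(18, 0)

TIME_ROUTES = {
    "peak_hours": {"models": ["openai/gpt-4o-mini"], "max_latency_ms": 1000},
    "off_peak": {"models": ["google/gemini-pro"], "max_latency_ms": 3000},
}


def route_for(now: datetime, routes: dict = TIME_ROUTES) -> dict:
    """Return the routing rule (models and latency budget) for a timestamp."""
    in_peak = PEAK_START <= now.time() < PEAK_END
    return routes["peak_hours" if in_peak else "off_peak"]
```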

Pattern 4: User Tier Routing

```json
{
  "free_tier": {
    "models": ["meta-llama/llama-3.2-3b-instruct:free"],
    "rate_limit": 10
  },
  "premium_tier": {
    "models": ["anthropic/claude-4.5-sonnet"],
    "rate_limit": 1000
  }
}
```
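The sketch below shows one way such a config might be enforced. The pattern does not state the unit of `rate_limit`, so this assumes requests per user per 60-second sliding window; `TierRouter` and its injectable `clock` parameter are hypothetical names introduced for the example.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Optional


class TierRouter:
    """Pick a model by user tier and enforce a per-user sliding-window limit."""

    def __init__(self, config: dict, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.config = config
        self.window_s = window_s
        self.clock = clock
        self.history = defaultdict(deque)  # user_id -> timestamps of recent requests

    def route(self, user_id: str, tier: str) -> Optional[str]:
        """Return the model to use, or None if the user is over their limit."""
        rule = self.config[tier]
        now = self.clock()
        recent = self.history[user_id]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        if len(recent) >= rule["rate_limit"]:
            return None
        recent.append(now)
        return rule["models"][0]
```

This keeps state in process memory, so it only works for a single instance; a multi-instance deployment would need a shared store for the counters.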

Model Categories for Routing

Free Models (Cost: $0)

  • google/gemma-2-9b-it:free
  • meta-llama/llama-3.2-3b-instruct:free
  • meta-llama/llama-3.2-1b-instruct:free
  • microsoft/phi-3-mini-128k-instruct:free
Use for: High-volume, simple tasks, development

Budget Models (Cost: $0.10-0.50/1M tokens)

  • openai/gpt-4o-mini
  • google/gemini-flash-1.5
Use for: Production workloads, balanced cost/quality

Premium Models (Cost: $3-15/1M tokens)

  • anthropic/claude-4.5-sonnet
  • openai/gpt-4o
  • google/gemini-pro-1.5
Use for: Complex reasoning, critical tasks, high quality

Specialized Models

  • Vision: openai/gpt-4-vision-preview
  • Code: anthropic/claude-4.5-sonnet (code-specific)
  • Long Context: google/gemini-pro-1.5 (1M+ tokens)

Best Practices

  1. Always implement fallback chains - Single points of failure cause downtime
  2. Monitor actual costs - Theoretical savings may differ from real usage
  3. Test quality degradation - Ensure cheaper models meet quality thresholds
  4. Set timeout limits - Prevent slow models from blocking requests
  5. Track model availability - Some models have rate limits or downtime
  6. Use task classification - Route based on complexity, not one-size-fits-all
  7. Implement retry logic - Handle transient failures gracefully
  8. Cache responses - Reduce API calls for repeated queries
  9. A/B test routing strategies - Validate improvements with data
  10. Budget alerting - Get notified before exceeding cost limits
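Practice 7 (retry logic) is commonly implemented as exponential backoff with jitter. The sketch below is one minimal way to do it, with assumed defaults for attempt count and delay; `with_retries` is a hypothetical helper, and for brevity it retries on any exception, whereas a real client would retry only on transient errors such as timeouts or rate-limit responses.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying failures with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays grow 0.5s, 1s, 2s, ...; jitter avoids synchronized retries.
            sleep(base_delay_s * 2 ** attempt + random.uniform(0, 0.1))
    raise AssertionError("unreachable")
```

Retries pair naturally with a fallback chain: retry the current model a small number of times, then move on to the next one rather than waiting longer.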

Troubleshooting

All models in fallback chain failing:
  • Check OpenRouter status page
  • Verify API key has credits
  • Test with a known-working model
  • Review rate limit status
Higher costs than expected:
  • Analyze actual request distribution
  • Check if complex tasks are routed to premium models
  • Verify caching is working
  • Review retry/fallback patterns
Quality degradation:
  • Review which model is actually being used
  • Test free models against quality benchmarks
  • Consider upgrading routing strategy
  • Implement quality verification layer
High latency:
  • Check if using slow models
  • Enable streaming for faster perceived response
  • Use geographic routing
  • Implement parallel model requests with first-response-wins
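The "parallel model requests with first-response-wins" suggestion can be sketched with a thread pool, as below. `first_response` is a hypothetical helper and `call_model(model, prompt)` again stands in for your own client call. Requires Python 3.9+ for `cancel_futures`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Sequence


def first_response(
    models: Sequence[str],
    prompt: str,
    call_model: Callable[[str, str], str],
    timeout_s: float = 10.0,
) -> tuple[str, str]:
    """Query all models at once; return (model, text) from the first success."""
    pool = ThreadPoolExecutor(max_workers=len(models))
    futures = {pool.submit(call_model, m, prompt): m for m in models}
    try:
        # as_completed yields futures in the order they finish.
        for future in as_completed(futures, timeout=timeout_s):
            if future.exception() is None:
                return futures[future], future.result()
        raise RuntimeError("all models failed")
    finally:
        # Do not wait for slower calls; requests already in flight still run.
        pool.shutdown(wait=False, cancel_futures=True)
```

This trades cost for latency: every model that starts processing is billed even though only one answer is used, so it fits latency-critical paths better than high-volume ones.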

Integration Examples

See examples directory for complete implementations:
  • Cost-optimized chat application
  • Dynamic routing based on conversation context
  • Multi-tier fallback with monitoring
  • Real-time cost tracking dashboard

Skill Location:
plugins/openrouter/skills/model-routing-patterns/
Version: 1.0.0
Supported Frameworks: Node.js, Python, TypeScript, any OpenRouter-compatible client