model-routing-patterns
Model Routing Patterns
Production-ready model routing configurations and strategies for OpenRouter that optimize for cost, speed, quality, or balanced performance with intelligent fallback chains.
Purpose
This skill provides comprehensive templates, scripts, and strategies for implementing sophisticated model routing in OpenRouter-powered applications. It helps you:
- Reduce API costs by routing to cheaper models when appropriate
- Optimize for speed with fast models and streaming
- Maintain quality with premium model fallbacks
- Implement intelligent task-based routing
- Build reliable multi-tier fallback chains
Activation Triggers
Use this skill when:
- Designing model routing strategies
- Implementing cost optimization
- Setting up fallback chains for reliability
- Building task complexity-based routing
- Configuring dynamic model selection
- Optimizing API performance vs cost tradeoffs
- Implementing A/B testing for models
- Setting up monitoring and analytics
Available Routing Strategies
1. Cost-Optimized Routing
Goal: Minimize API costs while maintaining acceptable quality
Strategy:
- Use free models (google/gemma-2-9b-it:free, meta-llama/llama-3.2-3b-instruct:free)
- Fall back to budget models (anthropic/claude-4.5-sonnet, openai/gpt-4o-mini)
- Premium models only for complex tasks requiring highest quality
Template: templates/cost-optimized-routing.json
Best for:
- High-volume applications
- Simple tasks (classification, extraction, formatting)
- Development/testing environments
- Budget-constrained projects
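The strategy above amounts to trying models from cheapest to most expensive and moving on when one fails. A minimal Python sketch, assuming a hypothetical `call_model(model_id, prompt)` callable that wraps your OpenRouter client and raises an exception on failure; `COST_OPTIMIZED_CHAIN` and `route_with_fallback` are illustrative names, not part of the skill's templates:

```python
from typing import Callable, Sequence

# Illustrative chain: free models first, then the fallbacks named above.
COST_OPTIMIZED_CHAIN = [
    "google/gemma-2-9b-it:free",
    "meta-llama/llama-3.2-3b-instruct:free",
    "anthropic/claude-4.5-sonnet",
    "openai/gpt-4o-mini",
]


def route_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],
    chain: Sequence[str] = COST_OPTIMIZED_CHAIN,
) -> tuple[str, str]:
    """Try each model in order; return (model_id, response) from the first success."""
    errors = []
    for model_id in chain:
        try:
            return model_id, call_model(model_id, prompt)
        except Exception as exc:  # assumption: any exception means "try the next model"
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Returning the model ID alongside the response makes it easy to log which tier actually served each request.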
2. Speed-Optimized Routing
Goal: Minimize latency and response time
Strategy:
- Prioritize fastest models regardless of cost
- Enable streaming for immediate feedback
- Use smaller models with quick inference
- Geographic routing to nearest endpoints
Template: templates/speed-optimized-routing.json
Best for:
- Real-time chat applications
- Interactive user experiences
- Low-latency requirements
- Streaming responses
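One way to prioritize the fastest models is to order candidates by recently observed latency. The sketch below assumes you record per-request latencies yourself; `LatencyTracker` and its window size are hypothetical and not taken from the speed-optimized template:

```python
from collections import defaultdict, deque
from statistics import median


class LatencyTracker:
    """Keeps a rolling window of observed latencies (seconds) per model."""

    def __init__(self, window: int = 50) -> None:
        self._samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, model_id: str, seconds: float) -> None:
        self._samples[model_id].append(seconds)

    def fastest_first(self, candidates: list[str]) -> list[str]:
        """Order candidates by median latency; unmeasured models go last."""
        def key(model_id: str) -> float:
            samples = self._samples.get(model_id)
            return median(samples) if samples else float("inf")
        return sorted(candidates, key=key)
```

The median is used rather than the mean so that a single slow request does not push an otherwise fast model down the list.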
3. Quality-Optimized Routing
Goal: Maximize output quality with premium models
Strategy:
- Use top-tier models (gpt-4o, claude-4.5-sonnet, gemini-pro)
- Fall back to other premium models when the primary is unavailable
- Multi-model voting for critical tasks
- Quality verification layers
Template: templates/quality-optimized-routing.json
Best for:
- Critical business decisions
- Content creation
- Complex reasoning tasks
- Customer-facing applications
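Multi-model voting can be sketched as a simple majority over normalized answers. This works best for short, comparable outputs such as labels or yes/no decisions. `call_model` is a hypothetical wrapper around your OpenRouter client, and the `min_agreement` threshold is an assumption:

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def majority_vote(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
    min_agreement: int = 2,
) -> Optional[str]:
    """Ask every model; return the most common answer if enough models agree."""
    answers = []
    for model_id in models:
        try:
            answers.append(call_model(model_id, prompt).strip().lower())
        except Exception:
            continue  # an unavailable model simply does not vote
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count >= min_agreement else None
```

A `None` result signals disagreement, which is a natural point to escalate to a verification layer or human review.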
4. Balanced Routing
Goal: Dynamically route based on task complexity
Strategy:
- Analyze request complexity
- Route simple tasks to cheap models
- Route complex tasks to premium models
- Adapt routing based on success metrics
Template: templates/balanced-routing.json
Best for:
- Mixed workloads
- Production applications
- General-purpose AI services
- Optimizing cost/quality tradeoff
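Complexity analysis can start as a crude heuristic and be refined later. The sketch below is one possible classifier, not the logic in balanced-routing.json. The word-count thresholds and keyword list are assumptions to tune against your own traffic, and the tier-to-model mapping reuses model IDs from this document:

```python
import re

# Illustrative tier-to-model mapping.
TIER_MODELS = {
    "simple": "google/gemma-2-9b-it:free",
    "medium": "anthropic/claude-4.5-sonnet",
    "complex": "openai/gpt-4o",
}

# Hypothetical keyword hints that suggest multi-step reasoning.
COMPLEX_HINTS = re.compile(
    r"\b(prove|derive|architect|refactor|multi-step|analy[sz]e)\b", re.I
)


def classify_complexity(prompt: str) -> str:
    """Long prompts or reasoning keywords raise the tier."""
    words = len(prompt.split())
    if COMPLEX_HINTS.search(prompt) or words > 400:
        return "complex"
    if words > 60:
        return "medium"
    return "simple"


def pick_model(prompt: str) -> str:
    return TIER_MODELS[classify_complexity(prompt)]
```

Logging the chosen tier next to each request's outcome gives you the success metrics needed to adjust the thresholds over time.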
5. Custom Routing
Goal: Implement domain-specific routing logic
Template: templates/custom-routing-template.json
Customizable factors:
- User tier/subscription level
- Geographic location
- Time-of-day pricing
- Model availability
- Rate limit status
- Historical success rates
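Several of these factors can be combined in a single selection function. The sketch below covers user tier, rate limit status, and model availability. `RequestContext`, `TIER_RULES`, and the per-minute interpretation of `rate_limit` are all assumptions made for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestContext:
    """Hypothetical per-request facts a custom router might consult."""
    user_tier: str                         # "free" or "premium"
    requests_this_minute: int
    unavailable: frozenset = frozenset()   # model IDs currently known to be down


# Illustrative per-tier settings; rate_limit is assumed to mean requests per minute.
TIER_RULES = {
    "free": {"models": ["meta-llama/llama-3.2-3b-instruct:free"], "rate_limit": 10},
    "premium": {
        "models": ["anthropic/claude-4.5-sonnet", "openai/gpt-4o-mini"],
        "rate_limit": 1000,
    },
}


def select_model(ctx: RequestContext) -> Optional[str]:
    """Return a model ID, or None if the caller is over its limit or nothing is available."""
    rules = TIER_RULES[ctx.user_tier]
    if ctx.requests_this_minute >= rules["rate_limit"]:
        return None
    for model_id in rules["models"]:
        if model_id not in ctx.unavailable:
            return model_id
    return None
```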
Key Resources
Scripts
validate-routing-config.sh
- Validates routing configuration syntax
- Checks model availability on OpenRouter
- Verifies fallback chain logic
- Ensures no circular dependencies
- Validates model IDs and parameters
test-fallback-chain.sh
- Tests fallback chain execution
- Simulates model failures
- Verifies graceful degradation
- Measures latency through chain
- Validates error handling
generate-routing-config.sh
- Generates routing config from strategy type
- Interactive configuration builder
- Validates and optimizes settings
- Exports to JSON/TypeScript/Python formats
analyze-cost-savings.sh
- Analyzes potential cost savings from routing
- Compares routing strategies
- Projects monthly costs
- Generates cost reports
- Identifies optimization opportunities
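To illustrate what a circular dependency check involves, here is a depth-first search over fallback references. This is a sketch, not the implementation of validate-routing-config.sh. It assumes a hypothetical schema in which each model ID maps to the list of models it falls back to:

```python
from typing import Optional


def find_fallback_cycle(fallbacks: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one cycle as a list of model IDs, or None if the graph is acyclic.

    `fallbacks` maps a model ID to the models it falls back to (assumed schema).
    """
    visiting: list[str] = []   # current DFS path
    done: set[str] = set()     # nodes fully explored without finding a cycle

    def visit(node: str) -> Optional[list[str]]:
        if node in done:
            return None
        if node in visiting:
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in fallbacks.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in fallbacks:
        cycle = visit(start)
        if cycle:
            return cycle
    return None
```

Returning the offending path rather than a boolean makes the resulting error message actionable.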
Templates
Configuration Templates (JSON):
- cost-optimized-routing.json - Free/cheap models with premium fallback
- speed-optimized-routing.json - Fastest models with streaming
- quality-optimized-routing.json - Premium models with fallbacks
- balanced-routing.json - Task-based dynamic routing
- custom-routing-template.json - Template for custom strategies
Code Templates:
- routing-config.ts - TypeScript routing configuration
- routing-config.py - Python routing configuration
Examples
- cost-routing-example.md - Complete cost-optimized routing setup
- dynamic-routing-example.md - Task complexity-based routing
- fallback-chain-example.md - 3-tier fallback strategy
- monitoring-example.md - Cost tracking and analytics setup
Workflow
1. Identify Requirements
Determine your optimization goals:
bash
    # Interactive strategy selector
    ./scripts/generate-routing-config.sh
Answer questions about:
- Primary goal (cost/speed/quality/balanced)
- Budget constraints
- Latency requirements
- Quality thresholds
- Supported model providers
2. Generate Configuration
bash
    # Generate from strategy type
    ./scripts/generate-routing-config.sh cost-optimized > config.json
    # Or copy template
    cp templates/cost-optimized-routing.json config.json
3. Validate Configuration
bash
    # Validate syntax and model availability
    ./scripts/validate-routing-config.sh config.json
Checks:
- JSON syntax
- Model IDs exist on OpenRouter
- Fallback chain is valid
- No circular references
- Required fields present
4. Test Fallback Chain
bash
    # Test fallback behavior
    ./scripts/test-fallback-chain.sh config.json
Simulates failures to ensure graceful degradation.
5. Analyze Cost Impact
bash
    # Compare routing strategies
    ./scripts/analyze-cost-savings.sh config.json baseline-config.json
Shows projected savings and performance tradeoffs.
6. Deploy and Monitor
- Deploy configuration to production
- Monitor using examples/monitoring-example.md
- Track metrics: cost, latency, success rate, quality
- Iterate based on real-world performance
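The cost, latency, and success rate metrics listed above can be collected with a few in-memory counters before you wire up a real metrics backend. The sketch below uses hypothetical names (`ModelStats`, `RoutingMetrics`) and is not the code from examples/monitoring-example.md. Quality is left out because it usually needs a separate evaluation step:

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ModelStats:
    requests: int = 0
    failures: int = 0
    cost_usd: float = 0.0
    latencies: list = field(default_factory=list)

    @property
    def success_rate(self) -> float:
        return 1.0 - self.failures / self.requests if self.requests else 0.0

    @property
    def avg_latency(self) -> float:
        return sum(self.latencies) / len(self.latencies) if self.latencies else 0.0


class RoutingMetrics:
    """In-memory counters; a real deployment would export these to a metrics backend."""

    def __init__(self) -> None:
        self.by_model = defaultdict(ModelStats)

    def record(self, model_id: str, latency_s: float, cost_usd: float, ok: bool) -> None:
        stats = self.by_model[model_id]
        stats.requests += 1
        stats.failures += 0 if ok else 1
        stats.cost_usd += cost_usd
        stats.latencies.append(latency_s)
```

Call `record` once per completed request, including failed ones, so that the success rate reflects what users actually experienced.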
Common Routing Patterns
Pattern 1: Simple Fallback Chain
json
    {
      "primary": "meta-llama/llama-3.2-3b-instruct:free",
      "fallback": [
        "anthropic/claude-4.5-sonnet",
        "openai/gpt-4o-mini"
      ]
    }
Pattern 2: Task Complexity Routing
json
    {
      "simple_tasks": {
        "models": ["google/gemma-2-9b-it:free"]
      },
      "medium_tasks": {
        "models": ["anthropic/claude-4.5-sonnet"]
      },
      "complex_tasks": {
        "models": ["openai/gpt-4o"]
      }
    }
Pattern 3: Time-Based Routing
json
    {
      "peak_hours": {
        "models": ["openai/gpt-4o-mini"],
        "max_latency_ms": 1000
      },
      "off_peak": {
        "models": ["google/gemini-pro"],
        "max_latency_ms": 3000
      }
    }
Pattern 4: User Tier Routing
json
    {
      "free_tier": {
        "models": ["meta-llama/llama-3.2-3b-instruct:free"],
        "rate_limit": 10
      },
      "premium_tier": {
        "models": ["anthropic/claude-4.5-sonnet"],
        "rate_limit": 1000
      }
    }
Model Categories for Routing
Free Models (Cost: $0)
- google/gemma-2-9b-it:free
- meta-llama/llama-3.2-3b-instruct:free
- meta-llama/llama-3.2-1b-instruct:free
- microsoft/phi-3-mini-128k-instruct:free
Use for: High-volume, simple tasks, development
Budget Models (Cost: $0.10-0.50/1M tokens)
- openai/gpt-4o-mini
- google/gemini-flash-1.5
Use for: Production workloads, balanced cost/quality
Premium Models (Cost: $3-15/1M tokens)
- anthropic/claude-4.5-sonnet
- openai/gpt-4o
- google/gemini-pro-1.5
Use for: Complex reasoning, critical tasks, high quality
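The per-token price ranges above translate into monthly spend with simple arithmetic. The sketch below assumes a single flat price per 1M tokens, whereas real pricing usually differs for input and output tokens. The function names and traffic figures are illustrative:

```python
def monthly_cost_usd(requests_per_day: int, tokens_per_request: int,
                     price_per_million_tokens: float, days: int = 30) -> float:
    """Project spend from traffic volume and a flat per-token price."""
    total_tokens = requests_per_day * tokens_per_request * days
    return total_tokens / 1_000_000 * price_per_million_tokens


def blended_cost_usd(requests_per_day: int, tokens_per_request: int,
                     share_by_price: dict[float, float]) -> float:
    """`share_by_price` maps price per 1M tokens to the fraction of traffic routed at that price."""
    return sum(
        monthly_cost_usd(requests_per_day, tokens_per_request, price) * share
        for price, share in share_by_price.items()
    )
```

For example, 10,000 requests per day at 1,000 tokens each is 300M tokens per month, which projects to $900 at $3 per 1M tokens. Routing 70% of that traffic to free models, 25% to a $0.50 model, and 5% to the $3 model brings the projection to about $82.50.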
Specialized Models
- Vision: openai/gpt-4-vision-preview
- Code: anthropic/claude-4.5-sonnet (code-specific)
- Long Context: google/gemini-pro-1.5 (1M+ tokens)
Best Practices
- Always implement fallback chains - Single points of failure cause downtime
- Monitor actual costs - Theoretical savings may differ from real usage
- Test quality degradation - Ensure cheaper models meet quality thresholds
- Set timeout limits - Prevent slow models from blocking requests
- Track model availability - Some models have rate limits or downtime
- Use task classification - Route based on complexity, not one-size-fits-all
- Implement retry logic - Handle transient failures gracefully
- Cache responses - Reduce API calls for repeated queries
- A/B test routing strategies - Validate improvements with data
- Budget alerting - Get notified before exceeding cost limits
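Retry logic for transient failures is commonly written as exponential backoff with jitter. In the sketch below the attempt count and base delay are assumptions, and `sleep` is injectable so the function can be tested without waiting:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on any exception with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay_s * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

Retries and fallbacks compose: retry the same model a small number of times for transient errors, then move to the next model in the chain.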
Troubleshooting
All models in fallback chain failing:
- Check OpenRouter status page
- Verify API key has credits
- Test with a known-working model
- Review rate limit status
Higher costs than expected:
- Analyze actual request distribution
- Check if complex tasks are routed to premium models
- Verify caching is working
- Review retry/fallback patterns
Quality degradation:
- Review which model is actually being used
- Test free models against quality benchmarks
- Consider upgrading routing strategy
- Implement quality verification layer
High latency:
- Check if using slow models
- Enable streaming for faster perceived response
- Use geographic routing
- Implement parallel model requests with first-response-wins
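Parallel requests with first-response-wins can be sketched with a thread pool. `call_model` is a hypothetical blocking wrapper around your OpenRouter client. Requests that have already started keep running and are still billed, so this trades cost for latency. The sketch requires Python 3.9 or later for `cancel_futures`:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Sequence


def first_response_wins(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
    timeout_s: float = 10.0,
) -> tuple[str, str]:
    """Query all models concurrently; return (model_id, response) from the first success."""
    pool = ThreadPoolExecutor(max_workers=len(models))
    try:
        futures = {pool.submit(call_model, m, prompt): m for m in models}
        for future in as_completed(futures, timeout=timeout_s):
            if future.exception() is None:
                return futures[future], future.result()
        raise RuntimeError("every model failed")
    finally:
        # Do not block on slower calls; cancel any that have not started yet.
        pool.shutdown(wait=False, cancel_futures=True)
```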
Integration Examples
See the examples directory for complete implementations:
- Cost-optimized chat application
- Dynamic routing based on conversation context
- Multi-tier fallback with monitoring
- Real-time cost tracking dashboard
Skill Location: plugins/openrouter/skills/model-routing-patterns/
Version: 1.0.0
Supported Frameworks: Node.js, Python, TypeScript, any OpenRouter-compatible client