# OpenRouter Skill
Comprehensive assistance with OpenRouter API development, providing unified access to hundreds of AI models through a single endpoint with intelligent routing, automatic fallbacks, and standardized interfaces.
## When to Use This Skill
This skill should be triggered when:
- Making API calls to multiple AI model providers through a unified interface
- Implementing model fallback strategies or auto-routing
- Working with OpenAI-compatible SDKs but targeting multiple providers
- Configuring advanced sampling parameters (temperature, top_p, penalties)
- Setting up streaming responses or structured JSON outputs
- Comparing costs across different AI models
- Building applications that need automatic provider failover
- Implementing function/tool calling across different models
- Questions about OpenRouter-specific features (routing, fallbacks, zero completion insurance)
## Quick Reference
### Basic Chat Completion (Python)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="<OPENROUTER_API_KEY>",
)

completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What is the meaning of life?"}]
)
print(completion.choices[0].message.content)
```

### Basic Chat Completion (JavaScript/TypeScript)
```typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: '<OPENROUTER_API_KEY>',
});

const completion = await openai.chat.completions.create({
  model: 'openai/gpt-4o',
  messages: [{ role: 'user', content: 'What is the meaning of life?' }],
});
console.log(completion.choices[0].message);
```

### cURL Request
```bash
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "What is the meaning of life?"}]
  }'
```

### Model Fallback Configuration (Python)
```python
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    extra_body={
        "models": ["anthropic/claude-3.5-sonnet", "gryphe/mythomax-l2-13b"],
    },
    messages=[{"role": "user", "content": "Your prompt here"}]
)
```

### Model Fallback Configuration (TypeScript)
```typescript
const completion = await openai.chat.completions.create({
  model: 'openai/gpt-4o',
  // OpenRouter-specific field: fallback models tried in order if the primary fails
  models: ['anthropic/claude-3.5-sonnet', 'gryphe/mythomax-l2-13b'],
  messages: [{ role: 'user', content: 'Your prompt here' }],
});
```

### Auto Router (Dynamic Model Selection)
```python
completion = client.chat.completions.create(
    model="openrouter/auto",  # Automatically selects the best model for the prompt
    messages=[{"role": "user", "content": "Your prompt here"}]
)
```

### Advanced Parameters Example
```python
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a creative story"}],
    temperature=0.8,        # Higher for creativity (0.0-2.0)
    max_tokens=500,         # Limit response length
    top_p=0.9,              # Nucleus sampling (0.0-1.0)
    frequency_penalty=0.5,  # Reduce repetition (-2.0 to 2.0)
    presence_penalty=0.3    # Encourage topic diversity (-2.0 to 2.0)
)
```

### Streaming Response
```python
stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='')
```

### JSON Mode (Structured Output)
```python
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": "Extract person's name, age, and city from: John is 30 and lives in NYC"
    }],
    response_format={"type": "json_object"}
)
```

### Deterministic Output with Seed
```python
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Generate a random number"}],
    seed=42,         # Same seed = same output (when supported)
    temperature=0.0  # Deterministic sampling
)
```

## Key Concepts
### Model Routing

OpenRouter provides intelligent routing capabilities:

- Auto Router (`openrouter/auto`): automatically selects the best model for your prompt using NotDiamond
- Fallback Models: specify multiple models that are automatically retried if the primary fails
- Provider Routing: automatically routes across providers for reliability
### Authentication

- Uses Bearer token authentication with API keys
- API keys can be managed programmatically
- Compatible with OpenAI SDK authentication patterns
### Model Naming Convention

Models use the format `provider/model-name`:

- `openai/gpt-4o`: OpenAI's GPT-4o
- `anthropic/claude-3.5-sonnet`: Anthropic's Claude 3.5 Sonnet
- `google/gemini-2.0-flash-exp:free`: Google's free Gemini model
- `openrouter/auto`: the auto-routing system
### Sampling Parameters

**Temperature** (0.0-2.0, default: 1.0)
- Lower = more predictable, focused responses
- Higher = more creative, diverse responses
- Use low values (0.0-0.3) for factual tasks, high values (0.8-1.5) for creative work

**Top P** (0.0-1.0, default: 1.0)
- Nucleus sampling: limits choices to the smallest set of tokens whose cumulative probability reaches the threshold
- Dynamically filters out improbable options
- Balances consistency and variety

**Frequency/Presence Penalties** (-2.0 to 2.0, default: 0.0)
- Frequency: discourages repeated tokens, scaled by how often each token has been used
- Presence: a simpler flat penalty, not scaled by count
- Positive values reduce repetition; negative values encourage reuse

**Max Tokens** (integer)
- Sets maximum response length
- Cannot exceed the context length minus the prompt length
- Use to control costs and enforce concise replies
### Response Formats

- Standard JSON: default chat completion format
- Streaming: Server-Sent Events (SSE) with `stream: true`
- JSON Mode: guaranteed valid JSON with `response_format: {"type": "json_object"}`
- Structured Outputs: schema-validated JSON responses
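For structured outputs, the request carries a JSON Schema the reply must conform to. A minimal sketch, assuming the OpenAI-style `json_schema` envelope (the `person` schema below is a hypothetical example; check OpenRouter's structured-outputs docs for per-model support):

```python
import json

# Hypothetical schema: extract a person's name, age, and city.
person_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "person",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
                "city": {"type": "string"},
            },
            "required": ["name", "age", "city"],
            "additionalProperties": False,
        },
    },
}

def extract_person(client):
    """Ask the model for schema-validated JSON and parse it."""
    completion = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "John is 30 and lives in NYC"}],
        response_format=person_schema,
    )
    return json.loads(completion.choices[0].message.content)
```

With `strict: True`, providers that support structured outputs guarantee the reply parses against the schema, so the `json.loads` result can be used without extra validation.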
### Advanced Features

- Tool/Function Calling: connect models to external APIs
- Multimodal Inputs: support for images, PDFs, and audio
- Prompt Caching: reduce costs for repeated prompts
- Web Search Integration: enhanced responses with web data
- Zero Completion Insurance: protection against failed responses
- Logprobs: access token probabilities for confidence analysis
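Tool calling follows the OpenAI-compatible shape: declare tools, execute whatever the model requests, and send the results back. A minimal sketch (the `get_weather` tool is hypothetical, for illustration only):

```python
import json

# Hypothetical local tool; name and arguments are illustrative only.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

LOCAL_FUNCTIONS = {"get_weather": get_weather}

def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Route a model-requested tool call to the matching local function."""
    args = json.loads(arguments_json)
    return LOCAL_FUNCTIONS[name](**args)

def run_with_tools(client, prompt: str) -> str:
    """One round of tool calling: ask, run requested tools, ask again."""
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model="openai/gpt-4o", messages=messages, tools=TOOLS
    )
    msg = response.choices[0].message
    if msg.tool_calls:
        messages.append(msg)  # keep the assistant turn that requested the tools
        for call in msg.tool_calls:
            result = dispatch_tool_call(call.function.name, call.function.arguments)
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
        response = client.chat.completions.create(
            model="openai/gpt-4o", messages=messages, tools=TOOLS
        )
    return response.choices[0].message.content
```

Tool support varies by model, so pair this with a fallback list of models known to handle function calling.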
## Reference Files

This skill includes comprehensive documentation in `references/`:

- `llms-full.md`: complete list of available models with metadata
- `llms-small.md`: curated subset of popular models
- `llms.md`: standard model listings

Use `view` to read specific reference files when detailed model information is needed.

## Working with This Skill
### For Beginners

- Start with the basic chat completion examples (Python/JavaScript/cURL above)
- Use the standard OpenAI SDK for easy integration
- Try simple model names like `openai/gpt-4o` or `anthropic/claude-3.5-sonnet`
- Keep parameters simple initially (just model and messages)
### For Intermediate Users

- Implement model fallback arrays for reliability
- Experiment with sampling parameters (temperature, top_p)
- Use streaming for better UX in conversational apps
- Try `openrouter/auto` for automatic model selection
- Implement JSON mode for structured data extraction
### For Advanced Users

- Fine-tune multiple sampling parameters together
- Implement custom routing logic with fallback chains
- Use logprobs for confidence scoring
- Leverage tool/function calling capabilities
- Optimize costs by selecting appropriate models per task
- Implement prompt caching strategies
- Use the seed parameter for reproducible testing
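Confidence scoring with logprobs can be sketched as a small helper. Request token probabilities with `logprobs=True` (and optionally `top_logprobs`) in the OpenAI-compatible request; in the raw JSON response they appear as entries with `token` and `logprob` fields under `choices[0].logprobs.content`. Support varies by provider, so treat the field shapes below as an assumption to verify:

```python
import math

def token_confidences(logprob_entries):
    """Convert per-token logprobs into 0-1 probabilities.

    Expects dicts shaped like the raw JSON entries, e.g.
    {"token": "Paris", "logprob": -0.05}.
    """
    return [(e["token"], math.exp(e["logprob"])) for e in logprob_entries]
```

A probability near 1.0 means the model was confident in that token; long runs of low-probability tokens are a useful signal for routing the request to a stronger model or flagging the answer for review.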
## Common Patterns

### Error Handling with Fallbacks
```python
try:
    completion = client.chat.completions.create(
        model="openai/gpt-4o",
        extra_body={
            "models": [
                "anthropic/claude-3.5-sonnet",
                "google/gemini-2.0-flash-exp:free"
            ]
        },
        messages=[{"role": "user", "content": "Your prompt"}]
    )
except Exception as e:
    print(f"All models failed: {e}")
```

### Cost-Optimized Routing
```python
# Use cheaper models for simple tasks
simple_completion = client.chat.completions.create(
    model="google/gemini-2.0-flash-exp:free",
    messages=[{"role": "user", "content": "Simple question"}]
)

# Use premium models for complex tasks
complex_completion = client.chat.completions.create(
    model="openai/o1",
    messages=[{"role": "user", "content": "Complex reasoning task"}]
)
```

### Context-Aware Temperature
```python
# Low temperature for factual responses
factual = client.chat.completions.create(
    model="openai/gpt-4o",
    temperature=0.2,
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)

# High temperature for creative content
creative = client.chat.completions.create(
    model="openai/gpt-4o",
    temperature=1.2,
    messages=[{"role": "user", "content": "Write a unique story opening"}]
)
```

## Resources
### Official Documentation
- API Reference: https://openrouter.ai/docs/api-reference/overview
- Quickstart Guide: https://openrouter.ai/docs/quickstart
- Model List: https://openrouter.ai/docs/models
- Parameters Guide: https://openrouter.ai/docs/api-reference/parameters
### Key Endpoints

- Chat Completions: `POST https://openrouter.ai/api/v1/chat/completions`
- List Models: `GET https://openrouter.ai/api/v1/models`
- Generation Info: `GET https://openrouter.ai/api/v1/generation`
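The models endpoint is useful for cost comparison since each entry carries pricing metadata. A minimal stdlib-only sketch (the helper names are ours, and the `"data"` envelope is assumed from the OpenAI-compatible schema; verify against the API reference):

```python
import json
import urllib.request

def openrouter_request(path: str, api_key: str) -> urllib.request.Request:
    """Build an authenticated request for an OpenRouter API endpoint."""
    return urllib.request.Request(
        f"https://openrouter.ai/api/v1{path}",
        headers={"Authorization": f"Bearer {api_key}"},
    )

def list_models(api_key: str) -> list:
    """Fetch the model catalog; entries include id and pricing metadata."""
    with urllib.request.urlopen(openrouter_request("/models", api_key)) as resp:
        return json.load(resp)["data"]
```

Calling `list_models(key)` and sorting on the pricing fields gives a quick cost ranking before you pick models for a fallback chain.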
## Notes
- OpenRouter normalizes API schemas across all providers
- Uses OpenAI-compatible API format for easy migration
- Automatic provider fallback if models are rate-limited or down
- Pricing based on actual model used (important for fallbacks)
- Response includes metadata about which model processed the request
- All models support streaming via Server-Sent Events
- Compatible with popular frameworks (LangChain, Vercel AI SDK, etc.)
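Because pricing follows the model that actually answered, it is worth logging the response metadata on every call. A small sketch (helper name is ours; field names follow the OpenAI-compatible response schema):

```python
def response_metadata(payload: dict) -> dict:
    """Pull routing/cost metadata from a chat completion response body.

    With fallbacks enabled, `model` reports which model actually
    processed the request, which is what billing is based on.
    """
    usage = payload.get("usage", {})
    return {
        "model": payload.get("model"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
    }
```

Feeding every response body through this before discarding it makes fallback behavior and per-task cost visible in your logs.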
## Best Practices
- Always implement fallbacks for production applications
- Use appropriate temperature based on task type (low for factual, high for creative)
- Set max_tokens to control costs and response length
- Enable streaming for better user experience in chat applications
- Use JSON mode when you need guaranteed structured output
- Test with seed parameter for reproducible results during development
- Monitor costs by selecting appropriate models per task
- Use auto-routing when unsure which model performs best
- Implement proper error handling for rate limits and failures
- Cache prompts for repeated requests to reduce costs
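The rate-limit handling advice above can be sketched as a generic retry wrapper with exponential backoff (the helper name and defaults are ours, not an OpenRouter API):

```python
import time

def with_retries(call, max_attempts=4, base_delay=1.0, retry_on=(Exception,)):
    """Retry a zero-argument callable with exponential backoff.

    Useful around a chat completion call that may hit 429 rate limits
    or transient provider failures; re-raises after the last attempt.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Usage: `with_retries(lambda: client.chat.completions.create(...))`. In production, narrow `retry_on` to the SDK's rate-limit and connection exception types so genuine request errors fail fast.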