OpenRouter Skill

Comprehensive assistance with OpenRouter API development, providing unified access to hundreds of AI models through a single endpoint with intelligent routing, automatic fallbacks, and standardized interfaces.

When to Use This Skill

This skill should be triggered when:
  • Making API calls to multiple AI model providers through a unified interface
  • Implementing model fallback strategies or auto-routing
  • Working with OpenAI-compatible SDKs but targeting multiple providers
  • Configuring advanced sampling parameters (temperature, top_p, penalties)
  • Setting up streaming responses or structured JSON outputs
  • Comparing costs across different AI models
  • Building applications that need automatic provider failover
  • Implementing function/tool calling across different models
  • Questions about OpenRouter-specific features (routing, fallbacks, zero completion insurance)

Quick Reference

Basic Chat Completion (Python)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="<OPENROUTER_API_KEY>",
)

completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What is the meaning of life?"}]
)
print(completion.choices[0].message.content)
```

Basic Chat Completion (JavaScript/TypeScript)

```typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: '<OPENROUTER_API_KEY>',
});

const completion = await openai.chat.completions.create({
  model: 'openai/gpt-4o',
  messages: [{ role: 'user', content: 'What is the meaning of life?' }],
});
console.log(completion.choices[0].message);
```

cURL Request

```bash
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "What is the meaning of life?"}]
  }'
```

Model Fallback Configuration (Python)

```python
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    extra_body={
        "models": ["anthropic/claude-3.5-sonnet", "gryphe/mythomax-l2-13b"],
    },
    messages=[{"role": "user", "content": "Your prompt here"}]
)
```

Model Fallback Configuration (TypeScript)

```typescript
const completion = await openai.chat.completions.create({
    model: 'openai/gpt-4o',
    models: ['anthropic/claude-3.5-sonnet', 'gryphe/mythomax-l2-13b'],
    messages: [{ role: 'user', content: 'Your prompt here' }],
});
```

Auto Router (Dynamic Model Selection)

```python
completion = client.chat.completions.create(
    model="openrouter/auto",  # Automatically selects the best model for the prompt
    messages=[{"role": "user", "content": "Your prompt here"}]
)
```

Advanced Parameters Example

```python
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a creative story"}],
    temperature=0.8,           # Higher for creativity (0.0 to 2.0)
    max_tokens=500,            # Limit response length
    top_p=0.9,                 # Nucleus sampling (0.0 to 1.0)
    frequency_penalty=0.5,     # Reduce repetition (-2.0 to 2.0)
    presence_penalty=0.3       # Encourage topic diversity (-2.0 to 2.0)
)
```

Streaming Response

```python
stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='')
```

JSON Mode (Structured Output)

```python
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": "Extract person's name, age, and city from: John is 30 and lives in NYC"
    }],
    response_format={"type": "json_object"}
)
```

Deterministic Output with Seed

```python
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Generate a random number"}],
    seed=42,            # Same seed = same output (when supported)
    temperature=0.0     # Deterministic sampling
)
```

Key Concepts

Model Routing

OpenRouter provides intelligent routing capabilities:
  • Auto Router (`openrouter/auto`): Automatically selects the best model for your prompt using NotDiamond
  • Fallback Models: Specify multiple models that are automatically retried if the primary fails
  • Provider Routing: Automatically routes across providers for reliability

Authentication

  • Uses Bearer token authentication with API keys
  • API keys can be managed programmatically
  • Compatible with OpenAI SDK authentication patterns
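
As a minimal sketch of the Bearer-token pattern (assuming the key is stored in the `OPENROUTER_API_KEY` environment variable, matching the cURL example above), the headers can be built like this:

```python
import os

def auth_headers() -> dict:
    """Build the Bearer-token headers OpenRouter expects.

    Assumption: the key lives in the OPENROUTER_API_KEY environment
    variable, as in the cURL example above.
    """
    return {
        "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
        "Content-Type": "application/json",
    }
```

The OpenAI SDKs handle this automatically when you pass `api_key`; building headers manually is only needed for raw HTTP clients.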

Model Naming Convention

Models use the format `provider/model-name`:
  • `openai/gpt-4o` - OpenAI's GPT-4 Optimized
  • `anthropic/claude-3.5-sonnet` - Anthropic's Claude 3.5 Sonnet
  • `google/gemini-2.0-flash-exp:free` - Google's free Gemini model
  • `openrouter/auto` - Auto-routing system
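
A tiny helper (hypothetical, for illustration only) shows how these IDs split; note that variant suffixes like `:free` stay attached to the model name:

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    # "provider/model-name" — variant suffixes such as ":free"
    # remain part of the model name, not the provider.
    provider, _, name = model_id.partition("/")
    return provider, name

print(split_model_id("google/gemini-2.0-flash-exp:free"))
# → ('google', 'gemini-2.0-flash-exp:free')
```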

Sampling Parameters

Temperature (0.0 to 2.0, default: 1.0)
  • Lower = more predictable, focused responses
  • Higher = more creative, diverse responses
  • Use low values (0.0-0.3) for factual tasks, high values (0.8-1.5) for creative work
Top P (0.0 to 1.0, default: 1.0)
  • Limits choices to the most likely tokens whose cumulative probability reaches p (nucleus sampling)
  • Dynamically filters out improbable options
  • Balances consistency and variety
Frequency/Presence Penalties (-2.0 to 2.0, default: 0.0)
  • Frequency: discourages repeating tokens in proportion to how often they have already appeared
  • Presence: a simpler flat penalty, not scaled by count
  • Positive values reduce repetition; negative values encourage reuse
Max Tokens (integer)
  • Sets the maximum response length
  • Cannot exceed the context length minus the prompt length
  • Use it to control costs and enforce concise replies
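
The ranges above can be bundled into task presets; the preset names and values below are illustrative choices based on the guidance above, not OpenRouter defaults:

```python
# Illustrative presets derived from the ranges above — not OpenRouter defaults.
SAMPLING_PRESETS = {
    "factual":  {"temperature": 0.2, "top_p": 1.0},
    "balanced": {"temperature": 0.7, "top_p": 0.9},
    "creative": {"temperature": 1.2, "top_p": 0.9, "frequency_penalty": 0.5},
}

def sampling_params(task: str, max_tokens: int = 500) -> dict:
    """Return kwargs suitable for client.chat.completions.create(**...)."""
    return {**SAMPLING_PRESETS[task], "max_tokens": max_tokens}
```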

Response Formats

  • Standard JSON: Default chat completion format
  • Streaming: Server-Sent Events (SSE) with `stream: true`
  • JSON Mode: Guaranteed valid JSON with `response_format: {"type": "json_object"}`
  • Structured Outputs: Schema-validated JSON responses
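
When consuming the stream without an SDK, each SSE event arrives as a `data: ...` line and `data: [DONE]` terminates the stream; a minimal parser sketch:

```python
import json

def parse_sse_line(line: str):
    # SSE events look like "data: {json}"; "[DONE]" marks the end
    # of the stream. Returns the decoded chunk, or None for
    # non-data lines and the [DONE] sentinel.
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    return json.loads(payload)
```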

Advanced Features

  • Tool/Function Calling: Connect models to external APIs
  • Multimodal Inputs: Support for images, PDFs, audio
  • Prompt Caching: Reduce costs for repeated prompts
  • Web Search Integration: Enhanced responses with web data
  • Zero Completion Insurance: Protection against failed responses
  • Logprobs: Access token probabilities for confidence analysis
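
For tool/function calling, the request carries an OpenAI-style tool schema alongside the messages; the `get_weather` function below is a hypothetical example of that shape, not a real OpenRouter tool:

```python
# Hypothetical tool definition in the OpenAI-compatible tool schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# The request body includes the tools alongside the messages;
# the model may respond with a tool_calls entry instead of text.
request_body = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
}
```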

Reference Files

This skill includes comprehensive documentation in `references/`:
  • llms-full.md - Complete list of available models with metadata
  • llms-small.md - Curated subset of popular models
  • llms.md - Standard model listings
Use `view` to read specific reference files when detailed model information is needed.

Working with This Skill

For Beginners

  1. Start with the basic chat completion examples (Python/JavaScript/cURL above)
  2. Use the standard OpenAI SDK for easy integration
  3. Try simple model names like `openai/gpt-4o` or `anthropic/claude-3.5-sonnet`
  4. Keep parameters simple initially (just model and messages)

For Intermediate Users

  1. Implement model fallback arrays for reliability
  2. Experiment with sampling parameters (temperature, top_p)
  3. Use streaming for better UX in conversational apps
  4. Try `openrouter/auto` for automatic model selection
  5. Implement JSON mode for structured data extraction

For Advanced Users

  1. Fine-tune multiple sampling parameters together
  2. Implement custom routing logic with fallback chains
  3. Use logprobs for confidence scoring
  4. Leverage tool/function calling capabilities
  5. Optimize costs by selecting appropriate models per task
  6. Implement prompt caching strategies
  7. Use seed parameter for reproducible testing
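
For confidence scoring with logprobs, one simple heuristic (an illustrative choice, not an official metric) is to convert per-token log-probabilities back to probabilities and average them:

```python
import math

def mean_confidence(token_logprobs: list[float]) -> float:
    # token_logprobs: per-token log-probabilities, as returned when
    # logprobs are enabled on the request. Averaging the exponentiated
    # values gives a rough 0-1 confidence score.
    probs = [math.exp(lp) for lp in token_logprobs]
    return sum(probs) / len(probs)
```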

Common Patterns

Error Handling with Fallbacks

```python
try:
    completion = client.chat.completions.create(
        model="openai/gpt-4o",
        extra_body={
            "models": [
                "anthropic/claude-3.5-sonnet",
                "google/gemini-2.0-flash-exp:free"
            ]
        },
        messages=[{"role": "user", "content": "Your prompt"}]
    )
except Exception as e:
    print(f"All models failed: {e}")
```

Cost-Optimized Routing

```python
# Use a cheaper model for simple tasks
simple_completion = client.chat.completions.create(
    model="google/gemini-2.0-flash-exp:free",
    messages=[{"role": "user", "content": "Simple question"}]
)

# Use a premium model for complex tasks
complex_completion = client.chat.completions.create(
    model="openai/o1",
    messages=[{"role": "user", "content": "Complex reasoning task"}]
)
```

Context-Aware Temperature

```python
# Low temperature for factual responses
factual = client.chat.completions.create(
    model="openai/gpt-4o",
    temperature=0.2,
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)

# High temperature for creative content
creative = client.chat.completions.create(
    model="openai/gpt-4o",
    temperature=1.2,
    messages=[{"role": "user", "content": "Write a unique story opening"}]
)
```

Resources

Official Documentation

Key Endpoints

  • Chat Completions: `POST https://openrouter.ai/api/v1/chat/completions`
  • List Models: `GET https://openrouter.ai/api/v1/models`
  • Generation Info: `GET https://openrouter.ai/api/v1/generation`

Notes

  • OpenRouter normalizes API schemas across all providers
  • Uses OpenAI-compatible API format for easy migration
  • Automatic provider fallback if models are rate-limited or down
  • Pricing based on actual model used (important for fallbacks)
  • Response includes metadata about which model processed the request
  • All models support streaming via Server-Sent Events
  • Compatible with popular frameworks (LangChain, Vercel AI SDK, etc.)
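
The pricing and metadata notes are visible in the response body itself. The payload below is a representative (made-up) example of the OpenAI-compatible shape, showing the `model` field that names what actually served the request:

```python
# Representative (made-up) response payload in the OpenAI-compatible shape.
response = {
    "id": "gen-example",
    "model": "anthropic/claude-3.5-sonnet",  # model that actually ran (matters with fallbacks)
    "choices": [{"message": {"role": "assistant", "content": "..."}}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 34, "total_tokens": 46},
}

served_by = response["model"]                       # which model handled it
billed_tokens = response["usage"]["total_tokens"]   # basis for pricing
```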

Best Practices

  1. Always implement fallbacks for production applications
  2. Use appropriate temperature based on task type (low for factual, high for creative)
  3. Set max_tokens to control costs and response length
  4. Enable streaming for better user experience in chat applications
  5. Use JSON mode when you need guaranteed structured output
  6. Test with seed parameter for reproducible results during development
  7. Monitor costs by selecting appropriate models per task
  8. Use auto-routing when unsure which model performs best
  9. Implement proper error handling for rate limits and failures
  10. Cache prompts for repeated requests to reduce costs
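
For the error-handling practice, a simple exponential-backoff wrapper covers transient failures such as rate limits; this is a generic sketch, not an OpenRouter API:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    # Retry fn() with exponential backoff: base_delay, then 2x, 4x, ...
    # Re-raises the last exception once all attempts are exhausted.
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)
```

Usage: `completion = with_retries(lambda: client.chat.completions.create(...))`. A production version would catch only retryable errors (e.g. HTTP 429/5xx) rather than bare `Exception`.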