# OpenRouter Skill
Comprehensive assistance with OpenRouter API development, providing unified access to hundreds of AI models through a single endpoint with intelligent routing, automatic fallbacks, and standardized interfaces.
## When to Use This Skill
This skill should be triggered when:
- Making API calls to multiple AI model providers through a unified interface
- Implementing model fallback strategies or auto-routing
- Working with OpenAI-compatible SDKs but targeting multiple providers
- Configuring advanced sampling parameters (temperature, top_p, penalties)
- Setting up streaming responses or structured JSON outputs
- Comparing costs across different AI models
- Building applications that need automatic provider failover
- Implementing function/tool calling across different models
- Questions about OpenRouter-specific features (routing, fallbacks, zero completion insurance)
## Quick Reference
### Basic Chat Completion (Python)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="<OPENROUTER_API_KEY>",
)

completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What is the meaning of life?"}]
)
print(completion.choices[0].message.content)
```

### Basic Chat Completion (JavaScript/TypeScript)
```typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: '<OPENROUTER_API_KEY>',
});

const completion = await openai.chat.completions.create({
  model: 'openai/gpt-4o',
  messages: [{ role: 'user', content: 'What is the meaning of life?' }],
});
console.log(completion.choices[0].message);
```

### cURL Request
```bash
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "What is the meaning of life?"}]
  }'
```

### Model Fallback Configuration (Python)
```python
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    extra_body={
        "models": ["anthropic/claude-3.5-sonnet", "gryphe/mythomax-l2-13b"],
    },
    messages=[{"role": "user", "content": "Your prompt here"}]
)
```

### Model Fallback Configuration (TypeScript)
```typescript
const completion = await openai.chat.completions.create({
  model: 'openai/gpt-4o',
  // OpenRouter-specific field: fallback models tried in order if the primary fails
  models: ['anthropic/claude-3.5-sonnet', 'gryphe/mythomax-l2-13b'],
  messages: [{ role: 'user', content: 'Your prompt here' }],
});
```

### Auto Router (Dynamic Model Selection)
```python
completion = client.chat.completions.create(
    model="openrouter/auto",  # Automatically selects the best model for the prompt
    messages=[{"role": "user", "content": "Your prompt here"}]
)
```

### Advanced Parameters Example
```python
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a creative story"}],
    temperature=0.8,        # Higher for creativity (0.0-2.0)
    max_tokens=500,         # Limit response length
    top_p=0.9,              # Nucleus sampling (0.0-1.0)
    frequency_penalty=0.5,  # Reduce repetition (-2.0 to 2.0)
    presence_penalty=0.3    # Encourage topic diversity (-2.0 to 2.0)
)
```

### Streaming Response
```python
stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='')
```

### JSON Mode (Structured Output)
```python
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{
        "role": "user",
        "content": "Extract person's name, age, and city from: John is 30 and lives in NYC"
    }],
    response_format={"type": "json_object"}
)
```

### Deterministic Output with Seed
```python
completion = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Generate a random number"}],
    seed=42,         # Same seed = same output (when supported)
    temperature=0.0  # Deterministic sampling
)
```

## Key Concepts
### Model Routing

OpenRouter provides intelligent routing capabilities:

- Auto Router (`openrouter/auto`): automatically selects the best model for your prompt using NotDiamond
- Fallback Models: specify multiple models that are automatically retried if the primary fails
- Provider Routing: automatically routes across providers for reliability
### Authentication

- Uses Bearer token authentication with API keys
- API keys can be managed programmatically
- Compatible with OpenAI SDK authentication patterns
### Model Naming Convention

Models use the format `provider/model-name`:

- `openai/gpt-4o`: OpenAI's GPT-4o
- `anthropic/claude-3.5-sonnet`: Anthropic's Claude 3.5 Sonnet
- `google/gemini-2.0-flash-exp:free`: Google's free Gemini model
- `openrouter/auto`: the auto-routing system
### Sampling Parameters

**Temperature** (0.0-2.0, default: 1.0)
- Lower = more predictable, focused responses
- Higher = more creative, diverse responses
- Use low values (0.0-0.3) for factual tasks, high values (0.8-1.5) for creative work

**Top P** (0.0-1.0, default: 1.0)
- Nucleus sampling: limits choices to the smallest set of tokens whose cumulative probability reaches the threshold
- Dynamically filters out improbable options
- Balances consistency and variety

**Frequency/Presence Penalties** (-2.0 to 2.0, default: 0.0)
- Frequency: discourages repeated tokens, scaled by how often each token has been used
- Presence: a simpler flat penalty, not scaled by count
- Positive values reduce repetition; negative values encourage reuse

**Max Tokens** (integer)
- Sets maximum response length
- Cannot exceed the context length minus the prompt length
- Use to control costs and enforce concise replies
### Response Formats

- Standard JSON: default chat completion format
- Streaming: Server-Sent Events (SSE) with `stream: true`
- JSON Mode: guaranteed valid JSON with `response_format: {"type": "json_object"}`
- Structured Outputs: schema-validated JSON responses
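For structured outputs, the request carries a JSON Schema the reply must conform to. A minimal sketch, assuming the OpenAI-style `json_schema` envelope (the `person` schema below is a hypothetical example; check OpenRouter's structured-outputs docs for per-model support):

```python
import json

# Hypothetical schema: extract a person's name, age, and city.
person_schema = {
    "type": "json_schema",
    "json_schema": {
        "name": "person",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
                "city": {"type": "string"},
            },
            "required": ["name", "age", "city"],
            "additionalProperties": False,
        },
    },
}

def extract_person(client):
    """Ask the model for schema-validated JSON and parse it."""
    completion = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[{"role": "user", "content": "John is 30 and lives in NYC"}],
        response_format=person_schema,
    )
    return json.loads(completion.choices[0].message.content)
```

With `strict: True`, providers that support structured outputs guarantee the reply parses against the schema, so the `json.loads` result can be used without extra validation.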
### Advanced Features

- Tool/Function Calling: connect models to external APIs
- Multimodal Inputs: support for images, PDFs, and audio
- Prompt Caching: reduce costs for repeated prompts
- Web Search Integration: enhanced responses with web data
- Zero Completion Insurance: protection against failed responses
- Logprobs: access token probabilities for confidence analysis
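Tool calling follows the OpenAI-compatible shape: declare tools, execute whatever the model requests, and send the results back. A minimal sketch (the `get_weather` tool is hypothetical, for illustration only):

```python
import json

# Hypothetical local tool; name and arguments are illustrative only.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

LOCAL_FUNCTIONS = {"get_weather": get_weather}

def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Route a model-requested tool call to the matching local function."""
    args = json.loads(arguments_json)
    return LOCAL_FUNCTIONS[name](**args)

def run_with_tools(client, prompt: str) -> str:
    """One round of tool calling: ask, run requested tools, ask again."""
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model="openai/gpt-4o", messages=messages, tools=TOOLS
    )
    msg = response.choices[0].message
    if msg.tool_calls:
        messages.append(msg)  # keep the assistant turn that requested the tools
        for call in msg.tool_calls:
            result = dispatch_tool_call(call.function.name, call.function.arguments)
            messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
        response = client.chat.completions.create(
            model="openai/gpt-4o", messages=messages, tools=TOOLS
        )
    return response.choices[0].message.content
```

Tool support varies by model, so pair this with a fallback list of models known to handle function calling.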
## Reference Files

This skill includes comprehensive documentation in `references/`:

- `llms-full.md`: complete list of available models with metadata
- `llms-small.md`: curated subset of popular models
- `llms.md`: standard model listings

Use `view` to read specific reference files when detailed model information is needed.

## Working with This Skill
### For Beginners

- Start with the basic chat completion examples (Python/JavaScript/cURL above)
- Use the standard OpenAI SDK for easy integration
- Try simple model names like `openai/gpt-4o` or `anthropic/claude-3.5-sonnet`
- Keep parameters simple initially (just model and messages)
### For Intermediate Users

- Implement model fallback arrays for reliability
- Experiment with sampling parameters (temperature, top_p)
- Use streaming for better UX in conversational apps
- Try `openrouter/auto` for automatic model selection
- Implement JSON mode for structured data extraction
### For Advanced Users

- Fine-tune multiple sampling parameters together
- Implement custom routing logic with fallback chains
- Use logprobs for confidence scoring
- Leverage tool/function calling capabilities
- Optimize costs by selecting appropriate models per task
- Implement prompt caching strategies
- Use the seed parameter for reproducible testing
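Confidence scoring with logprobs can be sketched as a small helper. Request token probabilities with `logprobs=True` (and optionally `top_logprobs`) in the OpenAI-compatible request; in the raw JSON response they appear as entries with `token` and `logprob` fields under `choices[0].logprobs.content`. Support varies by provider, so treat the field shapes below as an assumption to verify:

```python
import math

def token_confidences(logprob_entries):
    """Convert per-token logprobs into 0-1 probabilities.

    Expects dicts shaped like the raw JSON entries, e.g.
    {"token": "Paris", "logprob": -0.05}.
    """
    return [(e["token"], math.exp(e["logprob"])) for e in logprob_entries]
```

A probability near 1.0 means the model was confident in that token; long runs of low-probability tokens are a useful signal for routing the request to a stronger model or flagging the answer for review.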
## Common Patterns

### Error Handling with Fallbacks
```python
try:
    completion = client.chat.completions.create(
        model="openai/gpt-4o",
        extra_body={
            "models": [
                "anthropic/claude-3.5-sonnet",
                "google/gemini-2.0-flash-exp:free"
            ]
        },
        messages=[{"role": "user", "content": "Your prompt"}]
    )
except Exception as e:
    print(f"All models failed: {e}")
```

### Cost-Optimized Routing
```python
# Use cheaper models for simple tasks
simple_completion = client.chat.completions.create(
    model="google/gemini-2.0-flash-exp:free",
    messages=[{"role": "user", "content": "Simple question"}]
)

# Use premium models for complex tasks
complex_completion = client.chat.completions.create(
    model="openai/o1",
    messages=[{"role": "user", "content": "Complex reasoning task"}]
)
```

### Context-Aware Temperature
```python
# Low temperature for factual responses
factual = client.chat.completions.create(
    model="openai/gpt-4o",
    temperature=0.2,
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)

# High temperature for creative content
creative = client.chat.completions.create(
    model="openai/gpt-4o",
    temperature=1.2,
    messages=[{"role": "user", "content": "Write a unique story opening"}]
)
```

## Resources
### Official Documentation
- API Reference: https://openrouter.ai/docs/api-reference/overview
- Quickstart Guide: https://openrouter.ai/docs/quickstart
- Model List: https://openrouter.ai/docs/models
- Parameters Guide: https://openrouter.ai/docs/api-reference/parameters
### Key Endpoints

- Chat Completions: `POST https://openrouter.ai/api/v1/chat/completions`
- List Models: `GET https://openrouter.ai/api/v1/models`
- Generation Info: `GET https://openrouter.ai/api/v1/generation`
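The models endpoint is useful for cost comparison since each entry carries pricing metadata. A minimal stdlib-only sketch (the helper names are ours, and the `"data"` envelope is assumed from the OpenAI-compatible schema; verify against the API reference):

```python
import json
import urllib.request

def openrouter_request(path: str, api_key: str) -> urllib.request.Request:
    """Build an authenticated request for an OpenRouter API endpoint."""
    return urllib.request.Request(
        f"https://openrouter.ai/api/v1{path}",
        headers={"Authorization": f"Bearer {api_key}"},
    )

def list_models(api_key: str) -> list:
    """Fetch the model catalog; entries include id and pricing metadata."""
    with urllib.request.urlopen(openrouter_request("/models", api_key)) as resp:
        return json.load(resp)["data"]
```

Calling `list_models(key)` and sorting on the pricing fields gives a quick cost ranking before you pick models for a fallback chain.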
## Notes
- OpenRouter normalizes API schemas across all providers
- Uses OpenAI-compatible API format for easy migration
- Automatic provider fallback if models are rate-limited or down
- Pricing based on actual model used (important for fallbacks)
- Response includes metadata about which model processed the request
- All models support streaming via Server-Sent Events
- Compatible with popular frameworks (LangChain, Vercel AI SDK, etc.)
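Because pricing follows the model that actually answered, it is worth logging the response metadata on every call. A small sketch (helper name is ours; field names follow the OpenAI-compatible response schema):

```python
def response_metadata(payload: dict) -> dict:
    """Pull routing/cost metadata from a chat completion response body.

    With fallbacks enabled, `model` reports which model actually
    processed the request, which is what billing is based on.
    """
    usage = payload.get("usage", {})
    return {
        "model": payload.get("model"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
    }
```

Feeding every response body through this before discarding it makes fallback behavior and per-task cost visible in your logs.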
## Best Practices
- Always implement fallbacks for production applications
- Use appropriate temperature based on task type (low for factual, high for creative)
- Set max_tokens to control costs and response length
- Enable streaming for better user experience in chat applications
- Use JSON mode when you need guaranteed structured output
- Test with seed parameter for reproducible results during development
- Monitor costs by selecting appropriate models per task
- Use auto-routing when unsure which model performs best
- Implement proper error handling for rate limits and failures
- Cache prompts for repeated requests to reduce costs
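The rate-limit handling advice above can be sketched as a generic retry wrapper with exponential backoff (the helper name and defaults are ours, not an OpenRouter API):

```python
import time

def with_retries(call, max_attempts=4, base_delay=1.0, retry_on=(Exception,)):
    """Retry a zero-argument callable with exponential backoff.

    Useful around a chat completion call that may hit 429 rate limits
    or transient provider failures; re-raises after the last attempt.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Usage: `with_retries(lambda: client.chat.completions.create(...))`. In production, narrow `retry_on` to the SDK's rate-limit and connection exception types so genuine request errors fail fast.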