OpenRouter API for AI Agents
Expert guidance for AI agents integrating with OpenRouter API - unified access to 400+ models from 90+ providers.
When to use this skill:
- Making chat completions via OpenRouter API
- Selecting appropriate models and variants
- Implementing streaming responses
- Using tool/function calling
- Enforcing structured outputs
- Integrating web search
- Handling multimodal inputs (images, audio, video, PDFs)
- Managing model routing and fallbacks
- Handling errors and retries
- Optimizing cost and performance
API Basics
Making a Request
Endpoint: POST https://openrouter.ai/api/v1/chat/completions

Headers (required):

```typescript
{
  'Authorization': `Bearer ${apiKey}`,
  'Content-Type': 'application/json',
  // Optional: for app attribution
  'HTTP-Referer': 'https://your-app.com',
  'X-Title': 'Your App Name'
}
```

Minimal request structure:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'user', content: 'Your prompt here' }
    ]
  })
});
```
Response Structure
Non-streaming response:

```json
{
  "id": "gen-abc123",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Response text here"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  },
  "model": "anthropic/claude-3.5-sonnet"
}
```

Key fields:
- `choices[0].message.content` - The assistant's response
- `choices[0].finish_reason` - Why generation stopped (stop, length, tool_calls, etc.)
- `usage` - Token counts and cost information
- `model` - Actual model used (may differ from requested)
When to Use Streaming vs Non-Streaming
Use streaming (`stream: true`) when:
- Real-time responses needed (chat interfaces, interactive tools)
- Latency matters (user-facing applications)
- Large responses expected (long-form content)
- Want to show progressive output

Use non-streaming when:
- Processing in background (batch jobs, async tasks)
- Need the complete response before processing
- Building to an API/endpoint
- Response is short (few tokens)

Streaming basics:

```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: { /* ... */ },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: '...' }],
    stream: true
  })
});

// Node 18+: response.body is async-iterable
for await (const chunk of response.body) {
  const text = new TextDecoder().decode(chunk);
  const lines = text.split('\n').filter(line => line.startsWith('data: '));
  for (const line of lines) {
    const data = line.slice(6); // Remove 'data: '
    if (data === '[DONE]') break;
    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      // Accumulate or display content
    }
  }
}
```
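One caveat with the per-chunk `split('\n')` in the example above: an SSE event can straddle two network reads, in which case the partial line is dropped. A line-buffered sketch (the sample chunks fed in at the bottom are synthetic):

```typescript
// Line-buffered SSE parsing: carry partial lines between chunks so an
// event split across two reads is still parsed whole.
function createSSEParser(onContent: (text: string) => void) {
  let buffer = '';
  return (chunk: string) => {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // last element may be an incomplete line
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = line.slice(6);
      if (data === '[DONE]') continue;
      const content = JSON.parse(data).choices?.[0]?.delta?.content;
      if (content) onContent(content);
    }
  };
}

// Feed two chunks that split one event in the middle:
let out = '';
const feed = createSSEParser(t => { out += t; });
feed('data: {"choices":[{"delta":{"cont');
feed('ent":"Hello"}}]}\n\ndata: [DONE]\n');
// out === 'Hello'
```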
Model Selection
Model Identifier Format
Format:
provider/model-name[:variant]Examples:
- - Specific model
anthropic/claude-3.5-sonnet - - With web search enabled
openai/gpt-4o:online - - Free tier variant
google/gemini-2.0-flash:free
Model Variants and When to Use Them
| Variant | Use When | Tradeoffs |
|---|---|---|
| `:free` | Cost is primary concern, testing, prototyping | Rate limits, lower quality models |
| `:online` | Need current information, real-time data | Higher cost, web search latency |
| `:extended` | Large context window needed | May be slower, higher cost |
| `:thinking` | Complex reasoning, multi-step problems | Higher token usage, slower |
| `:nitro` | Speed is critical | May have quality tradeoffs |
| Provider pinning (via `provider.order`) | Need specific provider | No fallbacks, may be less available |
Default Model Choices by Task
General purpose: anthropic/claude-3.5-sonnet or openai/gpt-4o
- Balanced quality, speed, cost
- Good for most tasks

Coding: anthropic/claude-3.5-sonnet or openai/gpt-4o
- Strong code generation and understanding
- Good reasoning

Complex reasoning: anthropic/claude-opus-4:thinking or openai/o3
- Deep reasoning capabilities
- Higher cost, slower

Fast responses: openai/gpt-4o-mini:nitro or google/gemini-2.0-flash
- Minimal latency
- Good for real-time applications

Cost-sensitive: google/gemini-2.0-flash:free or meta-llama/llama-3.1-70b:free
- No cost, with limits
- Good for high-volume, lower-complexity tasks

Current information: anthropic/claude-3.5-sonnet:online or google/gemini-2.5-pro:online
- Web search built-in
- Real-time data

Large context: anthropic/claude-3.5-sonnet:extended or google/gemini-2.5-pro:extended
- 200K+ context windows
- Document analysis, codebase understanding
Provider Routing Preferences
Default behavior: OpenRouter automatically selects the best provider.

Explicit provider order:

```typescript
{
  provider: {
    order: ['anthropic', 'openai', 'google'],
    allow_fallbacks: true,
    sort: 'price' // 'price', 'latency', or 'throughput'
  }
}
```

When to set provider order:
- Have preferred provider arrangements
- Need to optimize for a specific metric (cost, speed)
- Want to exclude certain providers
- Have BYOK (Bring Your Own Key) for specific providers
Model Fallbacks
Automatic fallback - try multiple models in order:

```typescript
{
  models: [
    'anthropic/claude-3.5-sonnet',
    'openai/gpt-4o',
    'google/gemini-2.0-flash'
  ]
}
```

When to use fallbacks:
- High reliability required
- Multiple providers acceptable
- Want graceful degradation
- Avoid a single point of failure

Fallback behavior:
- Tries the first model
- Falls to the next on error (5xx, 429, timeout)
- Uses whichever succeeds
- Returns which model was used in the `model` field
Parameters You Need
Core Parameters
model (string, optional)
- Which model to use
- Default: user's default model
- Always specify for consistency
messages (Message[], required)
- Conversation history
- Structure: { role: 'user'|'assistant'|'system', content: string | ContentPart[] }
- For multimodal: content can be an array of text and image_url parts
stream (boolean, default: false)
- Enable Server-Sent Events streaming
- Use for real-time responses
temperature (float, 0.0-2.0, default: 1.0)
- Controls randomness
- 0.0-0.3: Deterministic, factual responses (code, precise answers)
- 0.4-0.7: Balanced (general use)
- 0.8-1.2: Creative (brainstorming, creative writing)
- 1.3-2.0: Highly creative, unpredictable (experimental)
max_tokens (integer, optional)
- Maximum tokens to generate
- Always set to control cost and prevent runaway responses
- Typical: 100-500 for short, 1000-2000 for long responses
- Model limit: context_length - prompt_length
top_p (float, 0.0-1.0, default: 1.0)
- Nucleus sampling - limits to top probability mass
- Use instead of temperature when you want predictable diversity
- 0.9-0.95: Common settings for quality
top_k (integer, 0+, default: 0/disabled)
- Limit to K most likely tokens
- 1: Always most likely (deterministic)
- 40-50: Balanced
- Not available for OpenAI models
Sampling Strategy Guidelines
For code generation: temperature: 0.1-0.3, top_p: 0.95
For factual responses: temperature: 0.0-0.2
For creative writing: temperature: 0.8-1.2
For brainstorming: temperature: 1.0-1.5
For chat: temperature: 0.6-0.8
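These guidelines can be encoded as a small helper, picking one point within each recommended range (the task categories are this document's grouping, not an OpenRouter concept):

```typescript
// Map a task category to the sampling guidelines above.
type Task = 'code' | 'factual' | 'creative' | 'brainstorm' | 'chat';

function samplingFor(task: Task): { temperature: number; top_p?: number } {
  switch (task) {
    case 'code':       return { temperature: 0.2, top_p: 0.95 };
    case 'factual':    return { temperature: 0.1 };
    case 'creative':   return { temperature: 1.0 };
    case 'brainstorm': return { temperature: 1.2 };
    case 'chat':       return { temperature: 0.7 };
  }
}
```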
Tool Calling Parameters
tools (Tool[], default: [])
- Available functions for the model to call
- Structure:

```typescript
{
  type: 'function',
  function: {
    name: 'function_name',
    description: 'What it does',
    parameters: { /* JSON Schema */ }
  }
}
```

tool_choice (string | object, default: 'auto')
- Control when tools are called
- 'auto': Model decides (default)
- 'none': Never call tools
- 'required': Must call a tool
- { type: 'function', function: { name: 'specific_tool' } }: Force a specific tool

parallel_tool_calls (boolean, default: true)
- Allow multiple tools simultaneously
- Set false for sequential execution
When to use tools:
- Need to query external APIs (weather, search, database)
- Need to perform calculations or data processing
- Building agentic systems
- Need structured data extraction
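For illustration, a hypothetical `get_weather` tool definition following the structure above (the name and parameters are invented for the example; only `location` is required):

```typescript
// A hypothetical weather-lookup tool definition in the tools-array shape.
const tools = [{
  type: 'function' as const,
  function: {
    name: 'get_weather',
    description: 'Get the current weather for a city',
    parameters: {
      type: 'object',
      properties: {
        location: { type: 'string', description: 'City name, e.g. "Paris"' },
        unit: { type: 'string', enum: ['celsius', 'fahrenheit'] }
      },
      required: ['location']
    }
  }
}];
// Sent alongside the rest of the body: { model, messages, tools, tool_choice: 'auto' }
```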
Structured Output Parameters
response_format (object, optional)
- Enforce a specific output format

JSON object mode:

```typescript
{ type: 'json_object' }
```
- Model returns valid JSON
- Must also instruct the model in the system message

JSON Schema mode (strict):

```typescript
{
  type: 'json_schema',
  json_schema: {
    name: 'schema_name',
    strict: true,
    schema: { /* JSON Schema */ }
  }
}
```
- Model returns JSON matching the exact schema
- Use when structure is critical (APIs, data processing)
When to use structured outputs:
- Need predictable response format
- Integrating with systems (APIs, databases)
- Data extraction
- Form filling
Web Search Parameters
Enable via model variant (simplest):

```typescript
{ model: 'anthropic/claude-3.5-sonnet:online' }
```

Enable via plugin:

```typescript
{
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5
  }]
}
```

When to use web search:
- Need current information (news, prices, events)
- User asks about recent developments
- Need factual verification
- Topic requires real-time data
Other Important Parameters
user (string, optional)
- Stable identifier for end-user
- Set when you have user IDs
- Helps with abuse detection and caching
session_id (string, optional)
- Group related requests
- Set for conversation tracking
- Improves caching and observability
metadata (Record<string, string>, optional)
- Custom metadata (max 16 key-value pairs)
- Use for analytics and tracking
- Keys: max 64 chars, Values: max 512 chars
stop (string | string[], optional)
- Stop sequences to halt generation
- Common: ['\n\n', '###', 'END']
Handling Responses
Non-Streaming Responses
Extract content:

```typescript
const response = await fetch(/* ... */);
const data = await response.json();
const content = data.choices[0].message.content;
const finishReason = data.choices[0].finish_reason;
const usage = data.usage;
```

Check for tool calls:

```typescript
const toolCalls = data.choices[0].message.tool_calls;
if (toolCalls) {
  // Model wants to call tools
  for (const toolCall of toolCalls) {
    const { name, arguments: args } = toolCall.function;
    const parsedArgs = JSON.parse(args);
    // Execute tool...
  }
}
```
Streaming Responses
Process SSE stream:

```typescript
let fullContent = '';
const response = await fetch(/* ... */);
const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const chunk = decoder.decode(value);
  const lines = chunk.split('\n').filter(line => line.startsWith('data: '));
  for (const line of lines) {
    const data = line.slice(6);
    if (data === '[DONE]') break;
    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      fullContent += content;
      // Process incrementally...
    }
    // Handle usage in final chunk
    if (parsed.usage) {
      console.log('Usage:', parsed.usage);
    }
  }
}
```

Handle streaming tool calls:

```typescript
// Tool calls stream across multiple chunks
let currentToolCall = null;
let toolArgs = '';
for (const parsed of chunks) {
  const toolCallChunk = parsed.choices?.[0]?.delta?.tool_calls?.[0];
  if (toolCallChunk?.function?.name) {
    currentToolCall = { id: toolCallChunk.id, ...toolCallChunk.function };
  }
  if (toolCallChunk?.function?.arguments) {
    toolArgs += toolCallChunk.function.arguments;
  }
  if (parsed.choices?.[0]?.finish_reason === 'tool_calls' && currentToolCall) {
    // Complete tool call
    currentToolCall.arguments = toolArgs;
    // Execute tool...
  }
}
```
Usage and Cost Tracking
```typescript
const { usage } = data;
console.log(`Prompt: ${usage.prompt_tokens}`);
console.log(`Completion: ${usage.completion_tokens}`);
console.log(`Total: ${usage.total_tokens}`);

// Cost (if available)
if (usage.cost) {
  console.log(`Cost: $${usage.cost.toFixed(6)}`);
}

// Detailed breakdown
console.log(usage.prompt_tokens_details);
console.log(usage.completion_tokens_details);
```
Error Handling
Common HTTP Status Codes
400 Bad Request
- Invalid request format
- Missing required fields
- Parameter out of range
- Fix: Validate request structure and parameters
401 Unauthorized
- Missing or invalid API key
- Fix: Check API key format and permissions
403 Forbidden
- Insufficient permissions
- Model not allowed
- Fix: Check guardrails, model access, API key permissions
402 Payment Required
- Insufficient credits
- Fix: Add credits to account
408 Request Timeout
- Request took too long
- Fix: Reduce prompt length, use streaming, try simpler model
429 Rate Limited
- Too many requests
- Fix: Implement exponential backoff, reduce request rate
502 Bad Gateway
- Provider error
- Fix: Use model fallbacks, retry with different model
503 Service Unavailable
- Service overloaded
- Fix: Retry with backoff, use fallbacks
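The codes above reduce to a small predicate, shown here as a sketch (it treats any 5xx as retryable, which matches the retry loop in the next section):

```typescript
// Retry only transient failures: timeouts, rate limits, and server errors.
function isRetryable(status: number): boolean {
  return status === 408 || status === 429 || status >= 500;
}
// isRetryable(429) → true; isRetryable(401) → false
```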
Retry Strategy
Exponential backoff:

```typescript
// `options` is the full fetch init (method, headers, body), not just the body.
async function requestWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch(url, options);
      if (response.ok) {
        return await response.json();
      }
      // Retry on rate limit or server errors
      if (response.status === 429 || response.status >= 500) {
        const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }
      // Don't retry other errors
      return response;
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
      const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
```

Retryable status codes: 408, 429, 502, 503
Do not retry: 400, 401, 402, 403
Graceful Degradation
Use model fallbacks:

```typescript
{
  models: [
    'anthropic/claude-3.5-sonnet', // Primary
    'openai/gpt-4o',               // Fallback 1
    'google/gemini-2.0-flash'      // Fallback 2
  ]
}
```

Handle partial failures:
- Log errors but continue
- Fall back to simpler features
- Use cached responses when available
- Provide a degraded experience rather than failing completely
Advanced Features
When to Use Tool Calling
Good use cases:
- Querying external APIs (weather, stock prices, databases)
- Performing calculations or data processing
- Extracting structured data from unstructured text
- Building agentic systems with multiple steps
- When decisions require external information
Implementation pattern:
- Define tools with clear descriptions and parameters
- Send the request with a tools array
- Check if tool_calls is present in the response
- Execute tools with parsed arguments
- Send tool results back in a new request
- Repeat until the model provides a final answer

See references/ADVANCED_PATTERNS.md for the complete agentic loop implementation.
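That pattern can be condensed into a sketch. Everything here is illustrative: `executeTool`, the trimmed `Msg` shape, and the mocked `chat` function stand in for real tool execution and the OpenRouter endpoint, so only the control flow is shown:

```typescript
// Minimal agentic loop: call the model, execute any requested tools,
// feed results back, and stop when the model answers without tools.
type Msg = {
  role: string;
  content: string | null;
  tool_calls?: { id: string; function: { name: string; arguments: string } }[];
  tool_call_id?: string;
};
type ChatFn = (messages: Msg[]) => Promise<{ message: Msg; finish_reason: string }>;

async function runAgent(
  chat: ChatFn,
  executeTool: (name: string, args: unknown) => Promise<string>,
  messages: Msg[],
  maxSteps = 5
): Promise<string | null> {
  for (let step = 0; step < maxSteps; step++) {
    const { message, finish_reason } = await chat(messages);
    messages.push(message);
    if (finish_reason !== 'tool_calls' || !message.tool_calls) {
      return message.content; // final answer
    }
    for (const call of message.tool_calls) {
      const result = await executeTool(call.function.name, JSON.parse(call.function.arguments));
      messages.push({ role: 'tool', tool_call_id: call.id, content: result });
    }
  }
  return null; // step budget exhausted
}

// Mocked run: the "model" first requests a tool, then answers.
const replies = [
  {
    message: {
      role: 'assistant',
      content: null,
      tool_calls: [{ id: '1', function: { name: 'get_time', arguments: '{}' } }]
    },
    finish_reason: 'tool_calls'
  },
  { message: { role: 'assistant', content: 'It is noon.' }, finish_reason: 'stop' }
];
const answer = runAgent(
  async () => replies.shift()!,
  async () => '12:00',
  [{ role: 'user', content: 'What time is it?' }]
);
```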
When to Use Structured Outputs
Good use cases:
- API responses (need specific schema)
- Data extraction (forms, documents)
- Configuration files (JSON, YAML)
- Database operations (structured queries)
- When downstream processing requires specific format
Implementation pattern:
- Define a JSON Schema for the desired output
- Set response_format: { type: 'json_schema', json_schema: { ... } }
- Instruct the model to produce JSON (system or user message)
- Validate the response against the schema
- Handle parsing errors gracefully

Add response healing for robustness:

```typescript
{
  response_format: { /* ... */ },
  plugins: [{ id: 'response-healing' }]
}
```
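Putting the steps together, a sketch with a hypothetical contact-extraction schema (the schema and `parseContact` helper are invented for the example; even in strict mode, parsing defensively costs little):

```typescript
// Request body enforcing a contact-record schema.
const body = {
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Extract a contact from: "Reach Ada at ada@example.com"' }],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'contact',
      strict: true,
      schema: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          email: { type: 'string' }
        },
        required: ['name', 'email'],
        additionalProperties: false
      }
    }
  }
};

// Validate the returned content before trusting it.
function parseContact(content: string): { name: string; email: string } | null {
  try {
    const obj = JSON.parse(content);
    return typeof obj.name === 'string' && typeof obj.email === 'string' ? obj : null;
  } catch {
    return null;
  }
}
```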
When to Use Web Search
Good use cases:
- User asks about recent events, news, or current data
- Need verification of facts
- Questions with time-sensitive information
- Topic requires up-to-date information
- User explicitly requests current information
Simple implementation (variant):

```typescript
{
  model: 'anthropic/claude-3.5-sonnet:online'
}
```

Advanced implementation (plugin):

```typescript
{
  model: 'openrouter/auto',
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5,
    engine: 'exa' // or 'native'
  }]
}
```
When to Use Multimodal Inputs
Images (vision):
- OCR, image understanding, visual analysis
- Models: openai/gpt-4o, anthropic/claude-3.5-sonnet, google/gemini-2.5-pro

Audio:
- Speech-to-text, audio analysis
- Models with audio support

Video:
- Video understanding, frame analysis
- Models with video support

PDFs:
- Document parsing, content extraction
- Requires the file-parser plugin

Implementation: See references/ADVANCED_PATTERNS.md for multimodal patterns.
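As a sketch of the mixed-content message shape (OpenAI-style text and image_url parts; the URL is a placeholder):

```typescript
// Multimodal user message: a text part plus an image_url part.
const messages = [{
  role: 'user' as const,
  content: [
    { type: 'text', text: 'What is in this image?' },
    { type: 'image_url', image_url: { url: 'https://example.com/photo.jpg' } }
  ]
}];
// Send with a vision-capable model, e.g. openai/gpt-4o.
```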
Best Practices for AI
Default Model Selection
Start with: anthropic/claude-3.5-sonnet or openai/gpt-4o
- Good balance of quality, speed, cost
- Strong at most tasks
- Wide compatibility

Switch based on needs:
- Need speed → openai/gpt-4o-mini:nitro or google/gemini-2.0-flash
- Complex reasoning → anthropic/claude-opus-4:thinking
- Need web search → :online variant
- Large context → :extended variant
- Cost-sensitive → :free variant
Default Parameters
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [...],
  temperature: 0.6, // Balanced creativity
  max_tokens: 1000, // Reasonable length
  top_p: 0.95       // Common for quality
}
```

Adjust based on task:
- Code: temperature: 0.2
- Creative: temperature: 1.0
- Factual: temperature: 0.0-0.3
When to Prefer Streaming
Always prefer streaming when:
- User-facing (chat, interactive tools)
- Response length unknown
- Want progressive feedback
- Latency matters
Use non-streaming when:
- Batch processing
- Need complete response before acting
- Building API endpoints
- Very short responses (< 50 tokens)
When to Enable Specific Features
Tools: Enable when you need external data or actions
Structured outputs: Enable when response format matters
Web search: Enable when current information needed
Streaming: Enable for user-facing, real-time responses
Model fallbacks: Enable when reliability critical
Provider routing: Enable when you have preferences or constraints
Cost Optimization Patterns
Use free models for:
- Testing and prototyping
- Low-complexity tasks
- High-volume, low-value operations

Use routing to optimize:

```typescript
{
  provider: {
    order: ['openai', 'anthropic'],
    sort: 'price', // Optimize for cost
    allow_fallbacks: true
  }
}
```

Set max_tokens to prevent runaway responses.
Use caching via the user and session_id parameters.
Enable prompt caching when supported.
Performance Optimization
Reduce latency:
- Use :nitro variants for speed
- Use streaming for perceived speed
- Set a user ID for caching benefits
- Choose faster models (mini, flash) when quality allows

Increase throughput:
- Use provider routing with sort: 'throughput'
- Parallelize independent requests
- Use streaming to reduce wait time

Optimize for specific metrics:

```typescript
{
  provider: {
    sort: 'latency' // or 'price' or 'throughput'
  }
}
```
Progressive Disclosure
For detailed reference information, consult:
Parameters Reference
File: references/PARAMETERS.md
- Complete parameter reference (50+ parameters)
- Types, ranges, defaults
- Parameter support by model
- Usage examples
Error Codes Reference
File: references/ERROR_CODES.md
- All HTTP status codes
- Error response structure
- Error metadata types
- Native finish reasons
- Retry strategies
Model Selection Guide
File: references/MODEL_SELECTION.md
- Model families and capabilities
- Model variants explained
- Selection criteria by use case
- Model capability matrix
- Provider routing preferences
Routing Strategies
File: references/ROUTING_STRATEGIES.md
- Model fallbacks configuration
- Provider selection patterns
- Auto router setup
- Routing by use case (cost, latency, quality)
Advanced Patterns
File: references/ADVANCED_PATTERNS.md
- Tool calling with agentic loops
- Structured outputs implementation
- Web search integration
- Multimodal handling
- Streaming patterns
- Framework integrations
Working Examples
File: references/EXAMPLES.md
- TypeScript patterns for common tasks
- Python examples
- cURL examples
- Advanced patterns
- Framework integration examples
Ready-to-Use Templates
Directory: templates/
- basic-request.ts - Minimal working request
- streaming-request.ts - SSE streaming with cancellation
- tool-calling.ts - Complete agentic loop with tools
- structured-output.ts - JSON Schema enforcement
- error-handling.ts - Robust retry logic
Quick Reference
Minimal Request
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Your prompt' }]
}
```
With Streaming
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  stream: true
}
```
With Tools
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  tools: [{ type: 'function', function: { name, description, parameters } }],
  tool_choice: 'auto'
}
```
With Structured Output
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'system', content: 'Output JSON only...' }],
  response_format: { type: 'json_object' }
}
```
With Web Search
```typescript
{
  model: 'anthropic/claude-3.5-sonnet:online',
  messages: [{ role: 'user', content: '...' }]
}
```
With Model Fallbacks
```typescript
{
  models: ['anthropic/claude-3.5-sonnet', 'openai/gpt-4o'],
  messages: [{ role: 'user', content: '...' }]
}
```

Remember: OpenRouter is OpenAI-compatible. Use the OpenAI SDK with baseURL: 'https://openrouter.ai/api/v1' for a familiar experience.