
OpenRouter API for AI Agents

Expert guidance for AI agents integrating with the OpenRouter API - unified access to 400+ models from 90+ providers.
When to use this skill:
  • Making chat completions via OpenRouter API
  • Selecting appropriate models and variants
  • Implementing streaming responses
  • Using tool/function calling
  • Enforcing structured outputs
  • Integrating web search
  • Handling multimodal inputs (images, audio, video, PDFs)
  • Managing model routing and fallbacks
  • Handling errors and retries
  • Optimizing cost and performance

API Basics

Making a Request

Endpoint:
POST https://openrouter.ai/api/v1/chat/completions
Headers (required):
typescript
{
  'Authorization': `Bearer ${apiKey}`,
  'Content-Type': 'application/json',
  // Optional: for app attribution
  'HTTP-Referer': 'https://your-app.com',
  'X-Title': 'Your App Name'
}
Minimal request structure:
typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'user', content: 'Your prompt here' }
    ]
  })
});

Response Structure

Non-streaming response:
json
{
  "id": "gen-abc123",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Response text here"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  },
  "model": "anthropic/claude-3.5-sonnet"
}
Key fields:
  • choices[0].message.content
    - The assistant's response
  • choices[0].finish_reason
    - Why generation stopped (stop, length, tool_calls, etc.)
  • usage
    - Token counts and cost information
  • model
    - Actual model used (may differ from requested)

When to Use Streaming vs Non-Streaming

Use streaming (stream: true) when:
  • Real-time responses needed (chat interfaces, interactive tools)
  • Latency matters (user-facing applications)
  • Large responses expected (long-form content)
  • Want to show progressive output
Use non-streaming when:
  • Processing in background (batch jobs, async tasks)
  • Need complete response before processing
  • Building an API endpoint
  • Response is short (few tokens)
Streaming basics:
typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: { /* ... */ },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: '...' }],
    stream: true
  })
});

// SSE events can split across network chunks, so buffer partial lines.
const decoder = new TextDecoder();
let buffer = '';

for await (const chunk of response.body) {
  buffer += decoder.decode(chunk, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // keep any trailing partial line

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6); // Remove 'data: '
    if (data === '[DONE]') break;

    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      // Accumulate or display content
    }
  }
}

Model Selection

Model Identifier Format

Format:
provider/model-name[:variant]
Examples:
  • anthropic/claude-3.5-sonnet
    - Specific model
  • openai/gpt-4o:online
    - With web search enabled
  • google/gemini-2.0-flash:free
    - Free tier variant

Model Variants and When to Use Them

| Variant | Use When | Tradeoffs |
|---|---|---|
| :free | Cost is primary concern, testing, prototyping | Rate limits, lower quality models |
| :online | Need current information, real-time data | Higher cost, web search latency |
| :extended | Large context window needed | May be slower, higher cost |
| :thinking | Complex reasoning, multi-step problems | Higher token usage, slower |
| :nitro | Speed is critical | May have quality tradeoffs |
| :exacto | Need specific provider | No fallbacks, may be less available |

Default Model Choices by Task

General purpose: anthropic/claude-3.5-sonnet or openai/gpt-4o
  • Balanced quality, speed, cost
  • Good for most tasks
Coding: anthropic/claude-3.5-sonnet or openai/gpt-4o
  • Strong code generation and understanding
  • Good reasoning
Complex reasoning: anthropic/claude-opus-4:thinking or openai/o3
  • Deep reasoning capabilities
  • Higher cost, slower
Fast responses: openai/gpt-4o-mini:nitro or google/gemini-2.0-flash
  • Minimal latency
  • Good for real-time applications
Cost-sensitive: google/gemini-2.0-flash:free or meta-llama/llama-3.1-70b:free
  • No cost, with rate limits
  • Good for high-volume, lower-complexity tasks
Current information: anthropic/claude-3.5-sonnet:online or google/gemini-2.5-pro:online
  • Web search built-in
  • Real-time data
Large context: anthropic/claude-3.5-sonnet:extended or google/gemini-2.5-pro:extended
  • 200K+ context windows
  • Document analysis, codebase understanding

Provider Routing Preferences

Default behavior: OpenRouter automatically selects best provider
Explicit provider order:
typescript
{
  provider: {
    order: ['anthropic', 'openai', 'google'],
    allow_fallbacks: true,
    sort: 'price' // 'price', 'latency', or 'throughput'
  }
}
When to set provider order:
  • Have preferred provider arrangements
  • Need to optimize for specific metric (cost, speed)
  • Want to exclude certain providers
  • Have BYOK (Bring Your Own Key) for specific providers

Model Fallbacks

Automatic fallback - try multiple models in order:
typescript
{
  models: [
    'anthropic/claude-3.5-sonnet',
    'openai/gpt-4o',
    'google/gemini-2.0-flash'
  ]
}
When to use fallbacks:
  • High reliability required
  • Multiple providers acceptable
  • Want graceful degradation
  • Avoid single point of failure
Fallback behavior:
  • Tries first model
  • Falls to next on error (5xx, 429, timeout)
  • Uses whichever succeeds
  • Returns which model was used in the model field
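Because routing decides per request, the response's model field is the only reliable record of what ran. A minimal sketch of detecting a fallback (the helper name is ours, not an OpenRouter API):

```typescript
// Compare the requested primary model against the `model` field
// returned in the response (shape documented above).
function usedFallback(requested: string, response: { model: string }): boolean {
  return response.model !== requested;
}

// Example: the primary was unavailable and a fallback answered.
const primary = 'anthropic/claude-3.5-sonnet';
const reply = { model: 'openai/gpt-4o' };
if (usedFallback(primary, reply)) {
  console.log(`Served by fallback: ${reply.model}`);
}
```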

Parameters You Need

Core Parameters

model (string, optional)
  • Which model to use
  • Default: user's default model
  • Always specify for consistency
messages (Message[], required)
  • Conversation history
  • Structure:
    { role: 'user'|'assistant'|'system', content: string | ContentPart[] }
  • For multimodal: content can be array of text and image_url parts
stream (boolean, default: false)
  • Enable Server-Sent Events streaming
  • Use for real-time responses
temperature (float, 0.0-2.0, default: 1.0)
  • Controls randomness
  • 0.0-0.3: Deterministic, factual responses (code, precise answers)
  • 0.4-0.7: Balanced (general use)
  • 0.8-1.2: Creative (brainstorming, creative writing)
  • 1.3-2.0: Highly creative, unpredictable (experimental)
max_tokens (integer, optional)
  • Maximum tokens to generate
  • Always set to control cost and prevent runaway responses
  • Typical: 100-500 for short, 1000-2000 for long responses
  • Model limit: context_length - prompt_length
top_p (float, 0.0-1.0, default: 1.0)
  • Nucleus sampling - limits to top probability mass
  • Use instead of temperature when you want predictable diversity
  • 0.9-0.95: Common settings for quality
top_k (integer, 0+, default: 0/disabled)
  • Limit to K most likely tokens
  • 1: Always most likely (deterministic)
  • 40-50: Balanced
  • Not available for OpenAI models
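Since content can be either a string or an array of parts, here is a hedged sketch of a multimodal user message (the image URL is a placeholder):

```typescript
// A user message whose content mixes text and image_url parts.
// The URL is a placeholder; base64 data: URLs also work for images.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

const message: { role: 'user'; content: ContentPart[] } = {
  role: 'user',
  content: [
    { type: 'text', text: 'What is shown in this image?' },
    { type: 'image_url', image_url: { url: 'https://example.com/photo.jpg' } },
  ],
};
```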

Sampling Strategy Guidelines

For code generation: temperature: 0.1-0.3, top_p: 0.95
For factual responses: temperature: 0.0-0.2
For creative writing: temperature: 0.8-1.2
For brainstorming: temperature: 1.0-1.5
For chat: temperature: 0.6-0.8
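The ranges above can be collapsed into presets; the midpoint values below are our choice, not OpenRouter defaults:

```typescript
// Task-to-sampling presets derived from the guideline ranges above.
// Values are midpoints we picked; tune per model and use case.
const SAMPLING_PRESETS = {
  code: { temperature: 0.2, top_p: 0.95 },
  factual: { temperature: 0.1 },
  creative: { temperature: 1.0 },
  brainstorm: { temperature: 1.2 },
  chat: { temperature: 0.7 },
} as const;

function samplingFor(task: keyof typeof SAMPLING_PRESETS) {
  return SAMPLING_PRESETS[task];
}
```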

Tool Calling Parameters

tools (Tool[], default: [])
  • Available functions for model to call
  • Structure:
typescript
{
  type: 'function',
  function: {
    name: 'function_name',
    description: 'What it does',
    parameters: { /* JSON Schema */ }
  }
}
tool_choice (string | object, default: 'auto')
  • Control when tools are called
  • 'auto': Model decides (default)
  • 'none': Never call tools
  • 'required': Must call a tool
  • { type: 'function', function: { name: 'specific_tool' } }: Force specific tool
parallel_tool_calls (boolean, default: true)
  • Allow multiple tools simultaneously
  • Set false for sequential execution
When to use tools:
  • Need to query external APIs (weather, search, database)
  • Need to perform calculations or data processing
  • Building agentic systems
  • Need structured data extraction
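Putting the pieces together, a sketch of a request body declaring one tool; get_weather and its schema are illustrative, not a built-in:

```typescript
// Request body with a single illustrative tool. If the model decides to
// call it, the response carries tool_calls instead of text content.
const body = {
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'What is the weather in Tokyo?' }],
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather', // hypothetical function
      description: 'Get the current weather for a city',
      parameters: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
    },
  }],
  tool_choice: 'auto',
};
```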

Structured Output Parameters

response_format (object, optional)
  • Enforce specific output format
JSON object mode:
typescript
{ type: 'json_object' }
  • Model returns valid JSON
  • Must also instruct model in system message
JSON Schema mode (strict):
typescript
{
  type: 'json_schema',
  json_schema: {
    name: 'schema_name',
    strict: true,
    schema: { /* JSON Schema */ }
  }
}
  • Model returns JSON matching exact schema
  • Use when structure is critical (APIs, data processing)
When to use structured outputs:
  • Need predictable response format
  • Integrating with systems (APIs, databases)
  • Data extraction
  • Form filling
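A sketch of a full strict-schema request; the person schema is illustrative, and additionalProperties: false follows the common strict-mode requirement:

```typescript
// Strict JSON Schema request: the model must return JSON matching
// the (illustrative) person schema exactly.
const body = {
  model: 'openai/gpt-4o',
  messages: [
    { role: 'system', content: 'Extract the person mentioned as JSON.' },
    { role: 'user', content: 'Ada Lovelace was born in 1815.' },
  ],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'person',
      strict: true,
      schema: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          birth_year: { type: 'integer' },
        },
        required: ['name', 'birth_year'],
        additionalProperties: false,
      },
    },
  },
};
```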

Web Search Parameters

Enable via model variant (simplest):
typescript
{ model: 'anthropic/claude-3.5-sonnet:online' }
Enable via plugin:
typescript
{
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5
  }]
}
When to use web search:
  • Need current information (news, prices, events)
  • User asks about recent developments
  • Need factual verification
  • Topic requires real-time data

Other Important Parameters

user (string, optional)
  • Stable identifier for end-user
  • Set when you have user IDs
  • Helps with abuse detection and caching
session_id (string, optional)
  • Group related requests
  • Set for conversation tracking
  • Improves caching and observability
metadata (Record<string, string>, optional)
  • Custom metadata (max 16 key-value pairs)
  • Use for analytics and tracking
  • Keys: max 64 chars, Values: max 512 chars
stop (string | string[], optional)
  • Stop sequences to halt generation
  • Common: ['\n\n', '###', 'END']
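The tracking parameters above combine like so; all identifier values are placeholders:

```typescript
// Illustrative request fields for attribution and tracking.
// All IDs here are placeholders you would replace with your own.
const tracking = {
  user: 'user-1234',                      // stable end-user identifier
  session_id: 'session-abcd',             // groups one conversation
  metadata: { feature: 'support-chat' },  // up to 16 key-value pairs
  stop: ['###'],                          // halt generation at this sequence
};
```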

Handling Responses

Non-Streaming Responses

Extract content:
typescript
const response = await fetch(/* ... */);
const data = await response.json();

const content = data.choices[0].message.content;
const finishReason = data.choices[0].finish_reason;
const usage = data.usage;
Check for tool calls:
typescript
const toolCalls = data.choices[0].message.tool_calls;
if (toolCalls) {
  // Model wants to call tools
  for (const toolCall of toolCalls) {
    const { name, arguments: args } = toolCall.function;
    const parsedArgs = JSON.parse(args);
    // Execute tool...
  }
}

Streaming Responses

Process SSE stream:
typescript
let fullContent = '';
let buffer = ''; // SSE events can split across chunks
const response = await fetch(/* ... */);

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // keep any trailing partial line

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6);
    if (data === '[DONE]') break;

    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      fullContent += content;
      // Process incrementally...
    }

    // Handle usage in final chunk
    if (parsed.usage) {
      console.log('Usage:', parsed.usage);
    }
  }
}
Handle streaming tool calls:
typescript
// Tool calls stream across multiple chunks; `chunks` below stands for
// the parsed SSE JSON objects produced by a loop like the one above.
let currentToolCall = null;
let toolArgs = '';

for (const parsed of chunks) {
  const toolCallChunk = parsed.choices?.[0]?.delta?.tool_calls?.[0];

  if (toolCallChunk?.function?.name) {
    currentToolCall = { id: toolCallChunk.id, ...toolCallChunk.function };
  }

  if (toolCallChunk?.function?.arguments) {
    toolArgs += toolCallChunk.function.arguments;
  }

  if (parsed.choices?.[0]?.finish_reason === 'tool_calls' && currentToolCall) {
    // Complete tool call
    currentToolCall.arguments = toolArgs;
    // Execute tool...
  }
}

Usage and Cost Tracking

typescript
const { usage } = data;
console.log(`Prompt: ${usage.prompt_tokens}`);
console.log(`Completion: ${usage.completion_tokens}`);
console.log(`Total: ${usage.total_tokens}`);

// Cost (if available)
if (usage.cost) {
  console.log(`Cost: $${usage.cost.toFixed(6)}`);
}

// Detailed breakdown
console.log(usage.prompt_tokens_details);
console.log(usage.completion_tokens_details);

Error Handling

Common HTTP Status Codes

400 Bad Request
  • Invalid request format
  • Missing required fields
  • Parameter out of range
  • Fix: Validate request structure and parameters
401 Unauthorized
  • Missing or invalid API key
  • Fix: Check API key format and permissions
402 Payment Required
  • Insufficient credits
  • Fix: Add credits to account
403 Forbidden
  • Insufficient permissions
  • Model not allowed
  • Fix: Check guardrails, model access, API key permissions
408 Request Timeout
  • Request took too long
  • Fix: Reduce prompt length, use streaming, try simpler model
429 Rate Limited
  • Too many requests
  • Fix: Implement exponential backoff, reduce request rate
502 Bad Gateway
  • Provider error
  • Fix: Use model fallbacks, retry with different model
503 Service Unavailable
  • Service overloaded
  • Fix: Retry with backoff, use fallbacks

Retry Strategy

Exponential backoff:
typescript
async function requestWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch(url, options);

      if (response.ok) {
        return await response.json();
      }

      // Retry on rate limit or server errors
      if (response.status === 429 || response.status >= 500) {
        const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }

      // Don't retry other errors
      return response;
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
      const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  throw new Error('Max retries exceeded');
}
Retryable status codes: 408, 429, 502, 503
Do not retry: 400, 401, 402, 403

Graceful Degradation

Use model fallbacks:
typescript
{
  models: [
    'anthropic/claude-3.5-sonnet',  // Primary
    'openai/gpt-4o',                // Fallback 1
    'google/gemini-2.0-flash'        // Fallback 2
  ]
}
Handle partial failures:
  • Log errors but continue
  • Fall back to simpler features
  • Use cached responses when available
  • Provide degraded experience rather than failing completely

Advanced Features

When to Use Tool Calling

Good use cases:
  • Querying external APIs (weather, stock prices, databases)
  • Performing calculations or data processing
  • Extracting structured data from unstructured text
  • Building agentic systems with multiple steps
  • When decisions require external information
Implementation pattern:
  1. Define tools with clear descriptions and parameters
  2. Send request with tools array
  3. Check if tool_calls present in response
  4. Execute tools with parsed arguments
  5. Send tool results back in a new request
  6. Repeat until model provides final answer
See references/ADVANCED_PATTERNS.md for complete agentic loop implementation
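The six steps above can be sketched as a loop. callModel (your transport to the endpoint) and executeTool (your tool dispatch) are stand-ins, not OpenRouter APIs:

```typescript
// Minimal agentic loop over the documented request/response shapes.
type Msg = {
  role: string;
  content: string | null;
  tool_calls?: { id: string; function: { name: string; arguments: string } }[];
  tool_call_id?: string;
  name?: string;
};

async function agentLoop(
  callModel: (messages: Msg[]) => Promise<Msg>,
  executeTool: (name: string, args: unknown) => Promise<string>,
  messages: Msg[],
  maxTurns = 5,
): Promise<string | null> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callModel(messages);
    messages.push(reply);
    if (!reply.tool_calls?.length) return reply.content; // final answer
    for (const call of reply.tool_calls) {
      const result = await executeTool(call.function.name, JSON.parse(call.function.arguments));
      // Tool results go back as role 'tool' messages keyed by tool_call_id.
      messages.push({ role: 'tool', tool_call_id: call.id, name: call.function.name, content: result });
    }
  }
  return null; // no final answer within maxTurns
}
```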

When to Use Structured Outputs

Good use cases:
  • API responses (need specific schema)
  • Data extraction (forms, documents)
  • Configuration files (JSON, YAML)
  • Database operations (structured queries)
  • When downstream processing requires specific format
Implementation pattern:
  1. Define JSON Schema for desired output
  2. Set response_format: { type: 'json_schema', json_schema: { ... } }
  3. Instruct model to produce JSON (system or user message)
  4. Validate response against schema
  5. Handle parsing errors gracefully
Add response healing for robustness:
typescript
{
  response_format: { /* ... */ },
  plugins: [{ id: 'response-healing' }]
}

When to Use Web Search

Good use cases:
  • User asks about recent events, news, or current data
  • Need verification of facts
  • Questions with time-sensitive information
  • Topic requires up-to-date information
  • User explicitly requests current information
Simple implementation (variant):
typescript
{
  model: 'anthropic/claude-3.5-sonnet:online'
}
Advanced implementation (plugin):
typescript
{
  model: 'openrouter/auto',
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5,
    engine: 'exa' // or 'native'
  }]
}

When to Use Multimodal Inputs

Images (vision):
  • OCR, image understanding, visual analysis
  • Models: openai/gpt-4o, anthropic/claude-3.5-sonnet, google/gemini-2.5-pro
Audio:
  • Speech-to-text, audio analysis
  • Models with audio support
Video:
  • Video understanding, frame analysis
  • Models with video support
PDFs:
  • Document parsing, content extraction
  • Requires file-parser plugin
Implementation: See references/ADVANCED_PATTERNS.md for multimodal patterns

Best Practices for AI

Default Model Selection

Start with: anthropic/claude-3.5-sonnet or openai/gpt-4o
  • Good balance of quality, speed, cost
  • Strong at most tasks
  • Wide compatibility
Switch based on needs:
  • Need speed → openai/gpt-4o-mini:nitro or google/gemini-2.0-flash
  • Complex reasoning → anthropic/claude-opus-4:thinking
  • Need web search → :online variant
  • Large context → :extended variant
  • Cost-sensitive → :free variant

Default Parameters

typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [...],
  temperature: 0.6,  // Balanced creativity
  max_tokens: 1000,   // Reasonable length
  top_p: 0.95        // Common for quality
}
Adjust based on task:
  • Code: temperature: 0.2
  • Creative: temperature: 1.0
  • Factual: temperature: 0.0-0.3

When to Prefer Streaming

Always prefer streaming when:
  • User-facing (chat, interactive tools)
  • Response length unknown
  • Want progressive feedback
  • Latency matters
Use non-streaming when:
  • Batch processing
  • Need complete response before acting
  • Building API endpoints
  • Very short responses (< 50 tokens)

When to Enable Specific Features

Tools: Enable when you need external data or actions
Structured outputs: Enable when response format matters
Web search: Enable when current information needed
Streaming: Enable for user-facing, real-time responses
Model fallbacks: Enable when reliability critical
Provider routing: Enable when you have preferences or constraints

Cost Optimization Patterns

Use free models for:
  • Testing and prototyping
  • Low-complexity tasks
  • High-volume, low-value operations
Use routing to optimize:
typescript
{
  provider: {
    order: ['openai', 'anthropic'],
    sort: 'price',  // Optimize for cost
    allow_fallbacks: true
  }
}
Set max_tokens to prevent runaway responses
Use caching via the user and session_id parameters
Enable prompt caching when supported

Performance Optimization

Reduce latency:
  • Use :nitro variants for speed
  • Use streaming for perceived speed
  • Set a user ID for caching benefits
  • Choose faster models (mini, flash) when quality allows
Increase throughput:
  • Use provider routing with sort: 'throughput'
  • Parallelize independent requests
  • Use streaming to reduce wait time
Optimize for specific metrics:
typescript
{
  provider: {
    sort: 'latency'  // or 'price' or 'throughput'
  }
}

降低延迟
  • 使用:nitro变体提升速度
  • 使用流式响应提升感知速度
  • 设置user ID以获得缓存收益
  • 质量允许时选择更快的模型(mini、flash)
提升吞吐量
  • 使用供应商路由并设置sort: 'throughput'
  • 并行处理独立请求
  • 使用流式响应减少等待时间
针对特定指标优化
typescript
{
  provider: {
    sort: 'latency'  // 或 'price' 或 'throughput'
  }
}
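Parallelizing independent requests is the cheapest throughput win: total latency becomes roughly that of the slowest request rather than the sum of all of them. A sketch, where `complete` is a hypothetical wrapper around the chat completions endpoint:

```typescript
// Hypothetical wrapper around the chat completions endpoint.
async function complete(apiKey: string, prompt: string): Promise<string> {
  const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'anthropic/claude-3.5-sonnet',
      messages: [{ role: 'user', content: prompt }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

// Independent prompts run concurrently; latency ≈ the slowest single request.
async function completeAll(apiKey: string, prompts: string[]): Promise<string[]> {
  return Promise.all(prompts.map((p) => complete(apiKey, p)));
}
```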

Progressive Disclosure

进阶参考

For detailed reference information, consult:
如需详细参考信息,请查阅:

Parameters Reference

参数参考

File:
references/PARAMETERS.md
  • Complete parameter reference (50+ parameters)
  • Types, ranges, defaults
  • Parameter support by model
  • Usage examples
文件
references/PARAMETERS.md
  • 完整的参数参考(50+参数)
  • 类型、范围、默认值
  • 各模型支持的参数
  • 使用示例

Error Codes Reference

错误码参考

File:
references/ERROR_CODES.md
  • All HTTP status codes
  • Error response structure
  • Error metadata types
  • Native finish reasons
  • Retry strategies
文件
references/ERROR_CODES.md
  • 所有HTTP状态码
  • 错误响应结构
  • 错误元数据类型
  • 原生停止原因
  • 重试策略

Model Selection Guide

模型选择指南

File:
references/MODEL_SELECTION.md
  • Model families and capabilities
  • Model variants explained
  • Selection criteria by use case
  • Model capability matrix
  • Provider routing preferences
文件
references/MODEL_SELECTION.md
  • 模型家族与能力
  • 模型变体说明
  • 按使用场景选择的标准
  • 模型能力矩阵
  • 供应商路由偏好

Routing Strategies

路由策略

File:
references/ROUTING_STRATEGIES.md
  • Model fallbacks configuration
  • Provider selection patterns
  • Auto router setup
  • Routing by use case (cost, latency, quality)
文件
references/ROUTING_STRATEGIES.md
  • 模型降级配置
  • 供应商选择模式
  • 自动路由设置
  • 按场景路由(成本、延迟、质量)

Advanced Patterns

高级模式

File:
references/ADVANCED_PATTERNS.md
  • Tool calling with agentic loops
  • Structured outputs implementation
  • Web search integration
  • Multimodal handling
  • Streaming patterns
  • Framework integrations
文件
references/ADVANCED_PATTERNS.md
  • 带Agent循环的工具调用
  • 结构化输出实现
  • 网页搜索集成
  • 多模态处理
  • 流式响应模式
  • 框架集成

Working Examples

实用示例

File:
references/EXAMPLES.md
  • TypeScript patterns for common tasks
  • Python examples
  • cURL examples
  • Advanced patterns
  • Framework integration examples
文件
references/EXAMPLES.md
  • 常见任务的TypeScript模式
  • Python示例
  • cURL示例
  • 高级模式
  • 框架集成示例

Ready-to-Use Templates

即用型模板

Directory:
templates/
  • basic-request.ts
    - Minimal working request
  • streaming-request.ts
    - SSE streaming with cancellation
  • tool-calling.ts
    - Complete agentic loop with tools
  • structured-output.ts
    - JSON Schema enforcement
  • error-handling.ts
    - Robust retry logic

目录
templates/
  • basic-request.ts
    - 最简可用请求
  • streaming-request.ts
    - 带取消功能的SSE流式响应
  • tool-calling.ts
    - 完整的Agent循环与工具调用
  • structured-output.ts
    - JSON Schema强制
  • error-handling.ts
    - 鲁棒的重试逻辑
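The retry idea behind error-handling.ts can be sketched as follows. This is an assumed shape, not the template's actual code: exponential backoff with a cap, retrying only transient statuses (429 rate limits and 5xx upstream errors).

```typescript
// Exponential backoff with a ceiling: 500ms, 1s, 2s, 4s, ... capped at 8s.
function backoffMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Only transient failures are worth retrying; 4xx client errors (except 429)
// will fail the same way every time.
function isRetryable(status: number): boolean {
  return status === 429 || status >= 500;
}

async function withRetry(
  fn: () => Promise<Response>,
  maxAttempts = 4
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fn();
    if (res.ok || !isRetryable(res.status) || attempt + 1 >= maxAttempts) {
      return res;
    }
    await new Promise((resolve) => setTimeout(resolve, backoffMs(attempt)));
  }
}
```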

Quick Reference

快速参考

Minimal Request

最简请求

typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Your prompt' }]
}
typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Your prompt' }]
}

With Streaming

流式请求

typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  stream: true
}
typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  stream: true
}

With Tools

带工具调用的请求

typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  tools: [{ type: 'function', function: { name, description, parameters } }],
  tool_choice: 'auto'
}
typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  tools: [{ type: 'function', function: { name, description, parameters } }],
  tool_choice: 'auto'
}

With Structured Output

带结构化输出的请求

typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'system', content: 'Output JSON only...' }],
  response_format: { type: 'json_object' }
}
typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'system', content: '仅输出JSON...' }],
  response_format: { type: 'json_object' }
}

With Web Search

带网页搜索的请求

typescript
{
  model: 'anthropic/claude-3.5-sonnet:online',
  messages: [{ role: 'user', content: '...' }]
}
typescript
{
  model: 'anthropic/claude-3.5-sonnet:online',
  messages: [{ role: 'user', content: '...' }]
}

With Model Fallbacks

带模型降级的请求

typescript
{
  models: ['anthropic/claude-3.5-sonnet', 'openai/gpt-4o'],
  messages: [{ role: 'user', content: '...' }]
}

Remember: OpenRouter is OpenAI-compatible. Use the OpenAI SDK with baseURL: 'https://openrouter.ai/api/v1' for a familiar experience.
typescript
{
  models: ['anthropic/claude-3.5-sonnet', 'openai/gpt-4o'],
  messages: [{ role: 'user', content: '...' }]
}

注意:OpenRouter兼容OpenAI。你可以使用OpenAI SDK并设置 baseURL: 'https://openrouter.ai/api/v1',获得熟悉的使用体验。
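Concretely, the minimal request from the Quick Reference looks like this through the official OpenAI SDK pointed at OpenRouter (assumes the `openai` npm package is installed and `OPENROUTER_API_KEY` is set):

```typescript
import OpenAI from 'openai';

// Same chat completions API, different baseURL.
const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
});

async function main() {
  const completion = await client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: 'Your prompt' }],
  });
  console.log(completion.choices[0].message.content);
}

main();
```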