
OpenRouter API for AI Agents

Expert guidance for AI agents integrating with the OpenRouter API - unified access to 400+ models from 90+ providers.
When to use this skill:
  • Making chat completions via OpenRouter API
  • Selecting appropriate models and variants
  • Implementing streaming responses
  • Using tool/function calling
  • Enforcing structured outputs
  • Integrating web search
  • Handling multimodal inputs (images, audio, video, PDFs)
  • Managing model routing and fallbacks
  • Handling errors and retries
  • Optimizing cost and performance

API Basics

Making a Request

Endpoint:
POST https://openrouter.ai/api/v1/chat/completions
Headers (required):
typescript
{
  'Authorization': `Bearer ${apiKey}`,
  'Content-Type': 'application/json',
  // Optional: for app attribution
  'HTTP-Referer': 'https://your-app.com',
  'X-Title': 'Your App Name'
}
Minimal request structure:
typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'user', content: 'Your prompt here' }
    ]
  })
});

Response Structure

Non-streaming response:
json
{
  "id": "gen-abc123",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Response text here"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  },
  "model": "anthropic/claude-3.5-sonnet"
}
Key fields:
  • choices[0].message.content
    - The assistant's response
  • choices[0].finish_reason
    - Why generation stopped (stop, length, tool_calls, etc.)
  • usage
    - Token counts and cost information
  • model
    - Actual model used (may differ from requested)

When to Use Streaming vs Non-Streaming

Use streaming (stream: true) when:
  • Real-time responses needed (chat interfaces, interactive tools)
  • Latency matters (user-facing applications)
  • Large responses expected (long-form content)
  • Want to show progressive output
Use non-streaming when:
  • Processing in background (batch jobs, async tasks)
  • Need complete response before processing
  • Building an API endpoint
  • Response is short (few tokens)
Streaming basics:
typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: { /* ... */ },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: '...' }],
    stream: true
  })
});

// SSE events can split across network chunks, so buffer partial lines.
const decoder = new TextDecoder();
let buffer = '';

for await (const chunk of response.body) {
  buffer += decoder.decode(chunk, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // keep any trailing partial line

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6); // Remove 'data: '
    if (data === '[DONE]') break;

    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      // Accumulate or display content
    }
  }
}

Model Selection

Model Identifier Format

Format:
provider/model-name[:variant]
Examples:
  • anthropic/claude-3.5-sonnet
    - Specific model
  • openai/gpt-4o:online
    - With web search enabled
  • google/gemini-2.0-flash:free
    - Free tier variant

Model Variants and When to Use Them

| Variant | Use When | Tradeoffs |
|---|---|---|
| :free | Cost is primary concern, testing, prototyping | Rate limits, lower quality models |
| :online | Need current information, real-time data | Higher cost, web search latency |
| :extended | Large context window needed | May be slower, higher cost |
| :thinking | Complex reasoning, multi-step problems | Higher token usage, slower |
| :nitro | Speed is critical | May have quality tradeoffs |
| :exacto | Need specific provider | No fallbacks, may be less available |

Default Model Choices by Task

General purpose: anthropic/claude-3.5-sonnet or openai/gpt-4o
  • Balanced quality, speed, cost
  • Good for most tasks
Coding: anthropic/claude-3.5-sonnet or openai/gpt-4o
  • Strong code generation and understanding
  • Good reasoning
Complex reasoning: anthropic/claude-opus-4:thinking or openai/o3
  • Deep reasoning capabilities
  • Higher cost, slower
Fast responses: openai/gpt-4o-mini:nitro or google/gemini-2.0-flash
  • Minimal latency
  • Good for real-time applications
Cost-sensitive: google/gemini-2.0-flash:free or meta-llama/llama-3.1-70b:free
  • No cost, with rate limits
  • Good for high-volume, lower-complexity tasks
Current information: anthropic/claude-3.5-sonnet:online or google/gemini-2.5-pro:online
  • Web search built-in
  • Real-time data
Large context: anthropic/claude-3.5-sonnet:extended or google/gemini-2.5-pro:extended
  • 200K+ context windows
  • Document analysis, codebase understanding

Provider Routing Preferences

Default behavior: OpenRouter automatically selects best provider
Explicit provider order:
typescript
{
  provider: {
    order: ['anthropic', 'openai', 'google'],
    allow_fallbacks: true,
    sort: 'price' // 'price', 'latency', or 'throughput'
  }
}
When to set provider order:
  • Have preferred provider arrangements
  • Need to optimize for specific metric (cost, speed)
  • Want to exclude certain providers
  • Have BYOK (Bring Your Own Key) for specific providers

Model Fallbacks

Automatic fallback - try multiple models in order:
typescript
{
  models: [
    'anthropic/claude-3.5-sonnet',
    'openai/gpt-4o',
    'google/gemini-2.0-flash'
  ]
}
When to use fallbacks:
  • High reliability required
  • Multiple providers acceptable
  • Want graceful degradation
  • Avoid single point of failure
Fallback behavior:
  • Tries first model
  • Falls to next on error (5xx, 429, timeout)
  • Uses whichever succeeds
  • Returns which model was used in the model field
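Because routing decides per request, the response's model field is the only reliable record of what ran. A minimal sketch of detecting a fallback (the helper name is ours, not an OpenRouter API):

```typescript
// Compare the requested primary model against the `model` field
// returned in the response (shape documented above).
function usedFallback(requested: string, response: { model: string }): boolean {
  return response.model !== requested;
}

// Example: the primary was unavailable and a fallback answered.
const primary = 'anthropic/claude-3.5-sonnet';
const reply = { model: 'openai/gpt-4o' };
if (usedFallback(primary, reply)) {
  console.log(`Served by fallback: ${reply.model}`);
}
```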

Parameters You Need

Core Parameters

model (string, optional)
  • Which model to use
  • Default: user's default model
  • Always specify for consistency
messages (Message[], required)
  • Conversation history
  • Structure:
    { role: 'user'|'assistant'|'system', content: string | ContentPart[] }
  • For multimodal: content can be array of text and image_url parts
stream (boolean, default: false)
  • Enable Server-Sent Events streaming
  • Use for real-time responses
temperature (float, 0.0-2.0, default: 1.0)
  • Controls randomness
  • 0.0-0.3: Deterministic, factual responses (code, precise answers)
  • 0.4-0.7: Balanced (general use)
  • 0.8-1.2: Creative (brainstorming, creative writing)
  • 1.3-2.0: Highly creative, unpredictable (experimental)
max_tokens (integer, optional)
  • Maximum tokens to generate
  • Always set to control cost and prevent runaway responses
  • Typical: 100-500 for short, 1000-2000 for long responses
  • Model limit: context_length - prompt_length
top_p (float, 0.0-1.0, default: 1.0)
  • Nucleus sampling - limits to top probability mass
  • Use instead of temperature when you want predictable diversity
  • 0.9-0.95: Common settings for quality
top_k (integer, 0+, default: 0/disabled)
  • Limit to K most likely tokens
  • 1: Always most likely (deterministic)
  • 40-50: Balanced
  • Not available for OpenAI models
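Since content can be either a string or an array of parts, here is a hedged sketch of a multimodal user message (the image URL is a placeholder):

```typescript
// A user message whose content mixes text and image_url parts.
// The URL is a placeholder; base64 data: URLs also work for images.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

const message: { role: 'user'; content: ContentPart[] } = {
  role: 'user',
  content: [
    { type: 'text', text: 'What is shown in this image?' },
    { type: 'image_url', image_url: { url: 'https://example.com/photo.jpg' } },
  ],
};
```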

Sampling Strategy Guidelines

For code generation: temperature: 0.1-0.3, top_p: 0.95
For factual responses: temperature: 0.0-0.2
For creative writing: temperature: 0.8-1.2
For brainstorming: temperature: 1.0-1.5
For chat: temperature: 0.6-0.8
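The ranges above can be collapsed into presets; the midpoint values below are our choice, not OpenRouter defaults:

```typescript
// Task-to-sampling presets derived from the guideline ranges above.
// Values are midpoints we picked; tune per model and use case.
const SAMPLING_PRESETS = {
  code: { temperature: 0.2, top_p: 0.95 },
  factual: { temperature: 0.1 },
  creative: { temperature: 1.0 },
  brainstorm: { temperature: 1.2 },
  chat: { temperature: 0.7 },
} as const;

function samplingFor(task: keyof typeof SAMPLING_PRESETS) {
  return SAMPLING_PRESETS[task];
}
```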

Tool Calling Parameters

tools (Tool[], default: [])
  • Available functions for model to call
  • Structure:
typescript
{
  type: 'function',
  function: {
    name: 'function_name',
    description: 'What it does',
    parameters: { /* JSON Schema */ }
  }
}
tool_choice (string | object, default: 'auto')
  • Control when tools are called
  • 'auto': Model decides (default)
  • 'none': Never call tools
  • 'required': Must call a tool
  • { type: 'function', function: { name: 'specific_tool' } }: Force specific tool
parallel_tool_calls (boolean, default: true)
  • Allow multiple tools simultaneously
  • Set false for sequential execution
When to use tools:
  • Need to query external APIs (weather, search, database)
  • Need to perform calculations or data processing
  • Building agentic systems
  • Need structured data extraction
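Putting the pieces together, a sketch of a request body declaring one tool; get_weather and its schema are illustrative, not a built-in:

```typescript
// Request body with a single illustrative tool. If the model decides to
// call it, the response carries tool_calls instead of text content.
const body = {
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'What is the weather in Tokyo?' }],
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather', // hypothetical function
      description: 'Get the current weather for a city',
      parameters: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
    },
  }],
  tool_choice: 'auto',
};
```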

Structured Output Parameters

response_format (object, optional)
  • Enforce specific output format
JSON object mode:
typescript
{ type: 'json_object' }
  • Model returns valid JSON
  • Must also instruct model in system message
JSON Schema mode (strict):
typescript
{
  type: 'json_schema',
  json_schema: {
    name: 'schema_name',
    strict: true,
    schema: { /* JSON Schema */ }
  }
}
  • Model returns JSON matching exact schema
  • Use when structure is critical (APIs, data processing)
When to use structured outputs:
  • Need predictable response format
  • Integrating with systems (APIs, databases)
  • Data extraction
  • Form filling
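A sketch of a full strict-schema request; the person schema is illustrative, and additionalProperties: false follows the common strict-mode requirement:

```typescript
// Strict JSON Schema request: the model must return JSON matching
// the (illustrative) person schema exactly.
const body = {
  model: 'openai/gpt-4o',
  messages: [
    { role: 'system', content: 'Extract the person mentioned as JSON.' },
    { role: 'user', content: 'Ada Lovelace was born in 1815.' },
  ],
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'person',
      strict: true,
      schema: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          birth_year: { type: 'integer' },
        },
        required: ['name', 'birth_year'],
        additionalProperties: false,
      },
    },
  },
};
```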

Web Search Parameters

Enable via model variant (simplest):
typescript
{ model: 'anthropic/claude-3.5-sonnet:online' }
Enable via plugin:
typescript
{
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5
  }]
}
When to use web search:
  • Need current information (news, prices, events)
  • User asks about recent developments
  • Need factual verification
  • Topic requires real-time data

Other Important Parameters

user (string, optional)
  • Stable identifier for end-user
  • Set when you have user IDs
  • Helps with abuse detection and caching
session_id (string, optional)
  • Group related requests
  • Set for conversation tracking
  • Improves caching and observability
metadata (Record<string, string>, optional)
  • Custom metadata (max 16 key-value pairs)
  • Use for analytics and tracking
  • Keys: max 64 chars, Values: max 512 chars
stop (string | string[], optional)
  • Stop sequences to halt generation
  • Common: ['\n\n', '###', 'END']
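The tracking parameters above combine like so; all identifier values are placeholders:

```typescript
// Illustrative request fields for attribution and tracking.
// All IDs here are placeholders you would replace with your own.
const tracking = {
  user: 'user-1234',                      // stable end-user identifier
  session_id: 'session-abcd',             // groups one conversation
  metadata: { feature: 'support-chat' },  // up to 16 key-value pairs
  stop: ['###'],                          // halt generation at this sequence
};
```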

Handling Responses

Non-Streaming Responses

Extract content:
typescript
const response = await fetch(/* ... */);
const data = await response.json();

const content = data.choices[0].message.content;
const finishReason = data.choices[0].finish_reason;
const usage = data.usage;
Check for tool calls:
typescript
const toolCalls = data.choices[0].message.tool_calls;
if (toolCalls) {
  // Model wants to call tools
  for (const toolCall of toolCalls) {
    const { name, arguments: args } = toolCall.function;
    const parsedArgs = JSON.parse(args);
    // Execute tool...
  }
}

Streaming Responses

Process SSE stream:
typescript
let fullContent = '';
let buffer = ''; // SSE events can split across chunks
const response = await fetch(/* ... */);

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // keep any trailing partial line

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6);
    if (data === '[DONE]') break;

    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      fullContent += content;
      // Process incrementally...
    }

    // Handle usage in final chunk
    if (parsed.usage) {
      console.log('Usage:', parsed.usage);
    }
  }
}
Handle streaming tool calls:
typescript
// Tool calls stream across multiple chunks; `chunks` below stands for
// the parsed SSE JSON objects produced by a loop like the one above.
let currentToolCall = null;
let toolArgs = '';

for (const parsed of chunks) {
  const toolCallChunk = parsed.choices?.[0]?.delta?.tool_calls?.[0];

  if (toolCallChunk?.function?.name) {
    currentToolCall = { id: toolCallChunk.id, ...toolCallChunk.function };
  }

  if (toolCallChunk?.function?.arguments) {
    toolArgs += toolCallChunk.function.arguments;
  }

  if (parsed.choices?.[0]?.finish_reason === 'tool_calls' && currentToolCall) {
    // Complete tool call
    currentToolCall.arguments = toolArgs;
    // Execute tool...
  }
}

Usage and Cost Tracking

typescript
const { usage } = data;
console.log(`Prompt: ${usage.prompt_tokens}`);
console.log(`Completion: ${usage.completion_tokens}`);
console.log(`Total: ${usage.total_tokens}`);

// Cost (if available)
if (usage.cost) {
  console.log(`Cost: $${usage.cost.toFixed(6)}`);
}

// Detailed breakdown
console.log(usage.prompt_tokens_details);
console.log(usage.completion_tokens_details);

Error Handling

Common HTTP Status Codes

400 Bad Request
  • Invalid request format
  • Missing required fields
  • Parameter out of range
  • Fix: Validate request structure and parameters
401 Unauthorized
  • Missing or invalid API key
  • Fix: Check API key format and permissions
402 Payment Required
  • Insufficient credits
  • Fix: Add credits to account
403 Forbidden
  • Insufficient permissions
  • Model not allowed
  • Fix: Check guardrails, model access, API key permissions
408 Request Timeout
  • Request took too long
  • Fix: Reduce prompt length, use streaming, try simpler model
429 Rate Limited
  • Too many requests
  • Fix: Implement exponential backoff, reduce request rate
502 Bad Gateway
  • Provider error
  • Fix: Use model fallbacks, retry with different model
503 Service Unavailable
  • Service overloaded
  • Fix: Retry with backoff, use fallbacks

Retry Strategy

Exponential backoff:
typescript
async function requestWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch(url, options);

      if (response.ok) {
        return await response.json();
      }

      // Retry on rate limit or server errors
      if (response.status === 429 || response.status >= 500) {
        const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
        await new Promise(resolve => setTimeout(resolve, delay));
        continue;
      }

      // Don't retry other errors
      return response;
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
      const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  throw new Error('Max retries exceeded');
}
Retryable status codes: 408, 429, 502, 503
Do not retry: 400, 401, 402, 403

Graceful Degradation

Use model fallbacks:
typescript
{
  models: [
    'anthropic/claude-3.5-sonnet',  // Primary
    'openai/gpt-4o',                // Fallback 1
    'google/gemini-2.0-flash'        // Fallback 2
  ]
}
Handle partial failures:
  • Log errors but continue
  • Fall back to simpler features
  • Use cached responses when available
  • Provide degraded experience rather than failing completely

Advanced Features

When to Use Tool Calling

Good use cases:
  • Querying external APIs (weather, stock prices, databases)
  • Performing calculations or data processing
  • Extracting structured data from unstructured text
  • Building agentic systems with multiple steps
  • When decisions require external information
Implementation pattern:
  1. Define tools with clear descriptions and parameters
  2. Send request with tools array
  3. Check if tool_calls present in response
  4. Execute tools with parsed arguments
  5. Send tool results back in a new request
  6. Repeat until model provides final answer
See references/ADVANCED_PATTERNS.md for complete agentic loop implementation
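The six steps above can be sketched as a loop. callModel (your transport to the endpoint) and executeTool (your tool dispatch) are stand-ins, not OpenRouter APIs:

```typescript
// Minimal agentic loop over the documented request/response shapes.
type Msg = {
  role: string;
  content: string | null;
  tool_calls?: { id: string; function: { name: string; arguments: string } }[];
  tool_call_id?: string;
  name?: string;
};

async function agentLoop(
  callModel: (messages: Msg[]) => Promise<Msg>,
  executeTool: (name: string, args: unknown) => Promise<string>,
  messages: Msg[],
  maxTurns = 5,
): Promise<string | null> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callModel(messages);
    messages.push(reply);
    if (!reply.tool_calls?.length) return reply.content; // final answer
    for (const call of reply.tool_calls) {
      const result = await executeTool(call.function.name, JSON.parse(call.function.arguments));
      // Tool results go back as role 'tool' messages keyed by tool_call_id.
      messages.push({ role: 'tool', tool_call_id: call.id, name: call.function.name, content: result });
    }
  }
  return null; // no final answer within maxTurns
}
```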

When to Use Structured Outputs

Good use cases:
  • API responses (need specific schema)
  • Data extraction (forms, documents)
  • Configuration files (JSON, YAML)
  • Database operations (structured queries)
  • When downstream processing requires specific format
Implementation pattern:
  1. Define JSON Schema for desired output
  2. Set response_format: { type: 'json_schema', json_schema: { ... } }
  3. Instruct model to produce JSON (system or user message)
  4. Validate response against schema
  5. Handle parsing errors gracefully
Add response healing for robustness:
typescript
{
  response_format: { /* ... */ },
  plugins: [{ id: 'response-healing' }]
}

When to Use Web Search

Good use cases:
  • User asks about recent events, news, or current data
  • Need verification of facts
  • Questions with time-sensitive information
  • Topic requires up-to-date information
  • User explicitly requests current information
Simple implementation (variant):
typescript
{
  model: 'anthropic/claude-3.5-sonnet:online'
}
Advanced implementation (plugin):
typescript
{
  model: 'openrouter/auto',
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5,
    engine: 'exa' // or 'native'
  }]
}

When to Use Multimodal Inputs

Images (vision):
  • OCR, image understanding, visual analysis
  • Models: openai/gpt-4o, anthropic/claude-3.5-sonnet, google/gemini-2.5-pro
Audio:
  • Speech-to-text, audio analysis
  • Models with audio support
Video:
  • Video understanding, frame analysis
  • Models with video support
PDFs:
  • Document parsing, content extraction
  • Requires file-parser plugin
Implementation: See references/ADVANCED_PATTERNS.md for multimodal patterns

Best Practices for AI

Default Model Selection

Start with: anthropic/claude-3.5-sonnet or openai/gpt-4o
  • Good balance of quality, speed, cost
  • Strong at most tasks
  • Wide compatibility
Switch based on needs:
  • Need speed → openai/gpt-4o-mini:nitro or google/gemini-2.0-flash
  • Complex reasoning → anthropic/claude-opus-4:thinking
  • Need web search → :online variant
  • Large context → :extended variant
  • Cost-sensitive → :free variant

Default Parameters

typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [...],
  temperature: 0.6,  // Balanced creativity
  max_tokens: 1000,   // Reasonable length
  top_p: 0.95        // Common for quality
}
Adjust based on task:
  • Code: temperature: 0.2
  • Creative: temperature: 1.0
  • Factual: temperature: 0.0-0.3

When to Prefer Streaming

Always prefer streaming when:
  • User-facing (chat, interactive tools)
  • Response length unknown
  • Want progressive feedback
  • Latency matters
Use non-streaming when:
  • Batch processing
  • Need complete response before acting
  • Building API endpoints
  • Very short responses (< 50 tokens)

When to Enable Specific Features

Tools: Enable when you need external data or actions
Structured outputs: Enable when response format matters
Web search: Enable when current information needed
Streaming: Enable for user-facing, real-time responses
Model fallbacks: Enable when reliability critical
Provider routing: Enable when you have preferences or constraints

Cost Optimization Patterns

Use free models for:
  • Testing and prototyping
  • Low-complexity tasks
  • High-volume, low-value operations
Use routing to optimize:
typescript
{
  provider: {
    order: ['openai', 'anthropic'],
    sort: 'price',  // Optimize for cost
    allow_fallbacks: true
  }
}
Set max_tokens to prevent runaway responses
Use caching via the user and session_id parameters
Enable prompt caching when supported

Performance Optimization

Reduce latency:
  • Use :nitro variants for speed
  • Use streaming for perceived speed
  • Set a user ID for caching benefits
  • Choose faster models (mini, flash) when quality allows
Increase throughput:
  • Use provider routing with sort: 'throughput'
  • Parallelize independent requests
  • Use streaming to reduce wait time
Optimize for specific metrics:
typescript
{
  provider: {
    sort: 'latency'  // or 'price' or 'throughput'
  }
}

降低延迟
  • 使用:nitro变体提升速度
  • 使用流式响应提升感知速度
  • 设置user ID以获得缓存收益
  • 质量允许时选择更快的模型(mini、flash)
提升吞吐量
  • 使用供应商路由并设置sort: 'throughput'
  • 并行处理独立请求
  • 使用流式响应减少等待时间
针对特定指标优化
typescript
{
  provider: {
    sort: 'latency'  // 或 'price' 或 'throughput'
  }
}
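Parallelizing independent requests is the cheapest throughput win: total latency becomes roughly that of the slowest request rather than the sum of all of them. A sketch, where `complete` is a hypothetical wrapper around the chat completions endpoint:

```typescript
// Hypothetical wrapper around the chat completions endpoint.
async function complete(apiKey: string, prompt: string): Promise<string> {
  const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'anthropic/claude-3.5-sonnet',
      messages: [{ role: 'user', content: prompt }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

// Independent prompts run concurrently; latency ≈ the slowest single request.
async function completeAll(apiKey: string, prompts: string[]): Promise<string[]> {
  return Promise.all(prompts.map((p) => complete(apiKey, p)));
}
```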

Progressive Disclosure

进阶参考

For detailed reference information, consult:
如需详细参考信息,请查阅:

Parameters Reference

参数参考

File:
references/PARAMETERS.md
  • Complete parameter reference (50+ parameters)
  • Types, ranges, defaults
  • Parameter support by model
  • Usage examples
文件
references/PARAMETERS.md
  • 完整的参数参考(50+参数)
  • 类型、范围、默认值
  • 各模型支持的参数
  • 使用示例

Error Codes Reference

错误码参考

File:
references/ERROR_CODES.md
  • All HTTP status codes
  • Error response structure
  • Error metadata types
  • Native finish reasons
  • Retry strategies
文件
references/ERROR_CODES.md
  • 所有HTTP状态码
  • 错误响应结构
  • 错误元数据类型
  • 原生停止原因
  • 重试策略

Model Selection Guide

模型选择指南

File:
references/MODEL_SELECTION.md
  • Model families and capabilities
  • Model variants explained
  • Selection criteria by use case
  • Model capability matrix
  • Provider routing preferences
文件
references/MODEL_SELECTION.md
  • 模型家族与能力
  • 模型变体说明
  • 按使用场景选择的标准
  • 模型能力矩阵
  • 供应商路由偏好

Routing Strategies

路由策略

File:
references/ROUTING_STRATEGIES.md
  • Model fallbacks configuration
  • Provider selection patterns
  • Auto router setup
  • Routing by use case (cost, latency, quality)
文件
references/ROUTING_STRATEGIES.md
  • 模型降级配置
  • 供应商选择模式
  • 自动路由设置
  • 按场景路由(成本、延迟、质量)

Advanced Patterns

高级模式

File:
references/ADVANCED_PATTERNS.md
  • Tool calling with agentic loops
  • Structured outputs implementation
  • Web search integration
  • Multimodal handling
  • Streaming patterns
  • Framework integrations
文件
references/ADVANCED_PATTERNS.md
  • 带Agent循环的工具调用
  • 结构化输出实现
  • 网页搜索集成
  • 多模态处理
  • 流式响应模式
  • 框架集成

Working Examples

实用示例

File:
references/EXAMPLES.md
  • TypeScript patterns for common tasks
  • Python examples
  • cURL examples
  • Advanced patterns
  • Framework integration examples
文件
references/EXAMPLES.md
  • 常见任务的TypeScript模式
  • Python示例
  • cURL示例
  • 高级模式
  • 框架集成示例

Ready-to-Use Templates

即用型模板

Directory:
templates/
  • basic-request.ts
    - Minimal working request
  • streaming-request.ts
    - SSE streaming with cancellation
  • tool-calling.ts
    - Complete agentic loop with tools
  • structured-output.ts
    - JSON Schema enforcement
  • error-handling.ts
    - Robust retry logic

目录
templates/
  • basic-request.ts
    - 最简可用请求
  • streaming-request.ts
    - 带取消功能的SSE流式响应
  • tool-calling.ts
    - 完整的Agent循环与工具调用
  • structured-output.ts
    - JSON Schema强制
  • error-handling.ts
    - 鲁棒的重试逻辑
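The retry idea behind error-handling.ts can be sketched as follows. This is an assumed shape, not the template's actual code: exponential backoff with a cap, retrying only transient statuses (429 rate limits and 5xx upstream errors).

```typescript
// Exponential backoff with a ceiling: 500ms, 1s, 2s, 4s, ... capped at 8s.
function backoffMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Only transient failures are worth retrying; 4xx client errors (except 429)
// will fail the same way every time.
function isRetryable(status: number): boolean {
  return status === 429 || status >= 500;
}

async function withRetry(
  fn: () => Promise<Response>,
  maxAttempts = 4
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fn();
    if (res.ok || !isRetryable(res.status) || attempt + 1 >= maxAttempts) {
      return res;
    }
    await new Promise((resolve) => setTimeout(resolve, backoffMs(attempt)));
  }
}
```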

Quick Reference

快速参考

Minimal Request

最简请求

typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Your prompt' }]
}
typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Your prompt' }]
}

With Streaming

流式请求

typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  stream: true
}
typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  stream: true
}

With Tools

带工具调用的请求

typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  tools: [{ type: 'function', function: { name, description, parameters } }],
  tool_choice: 'auto'
}
typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  tools: [{ type: 'function', function: { name, description, parameters } }],
  tool_choice: 'auto'
}

With Structured Output

带结构化输出的请求

typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'system', content: 'Output JSON only...' }],
  response_format: { type: 'json_object' }
}
typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'system', content: '仅输出JSON...' }],
  response_format: { type: 'json_object' }
}

With Web Search

带网页搜索的请求

typescript
{
  model: 'anthropic/claude-3.5-sonnet:online',
  messages: [{ role: 'user', content: '...' }]
}
typescript
{
  model: 'anthropic/claude-3.5-sonnet:online',
  messages: [{ role: 'user', content: '...' }]
}

With Model Fallbacks

带模型降级的请求

typescript
{
  models: ['anthropic/claude-3.5-sonnet', 'openai/gpt-4o'],
  messages: [{ role: 'user', content: '...' }]
}

Remember: OpenRouter is OpenAI-compatible. Use the OpenAI SDK with baseURL: 'https://openrouter.ai/api/v1' for a familiar experience.
typescript
{
  models: ['anthropic/claude-3.5-sonnet', 'openai/gpt-4o'],
  messages: [{ role: 'user', content: '...' }]
}

注意:OpenRouter兼容OpenAI。你可以使用OpenAI SDK并设置 baseURL: 'https://openrouter.ai/api/v1',获得熟悉的使用体验。
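Concretely, the minimal request from the Quick Reference looks like this through the official OpenAI SDK pointed at OpenRouter (assumes the `openai` npm package is installed and `OPENROUTER_API_KEY` is set):

```typescript
import OpenAI from 'openai';

// Same chat completions API, different baseURL.
const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
});

async function main() {
  const completion = await client.chat.completions.create({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: 'Your prompt' }],
  });
  console.log(completion.choices[0].message.content);
}

main();
```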