llm-gateway-routing


LLM Gateway & Routing

Configure multi-model access, fallbacks, cost optimization, and A/B testing.

Why Use a Gateway?


Without gateway:
  • Vendor lock-in (one provider)
  • No fallbacks (provider down = app down)
  • Hard to A/B test models
  • Scattered API keys and configs
With gateway:
  • Single API for 400+ models
  • Automatic fallbacks
  • Easy model switching
  • Unified cost tracking

Quick Decision


Need                          Solution
Fastest setup, multi-model    OpenRouter
Full control, self-hosted     LiteLLM
Observability + routing       Helicone
Enterprise, guardrails        Portkey

OpenRouter (Recommended)


Why OpenRouter


  • 400+ models: OpenAI, Anthropic, Google, Meta, Mistral, and more
  • Single API: One key for all providers
  • Automatic fallbacks: Built-in reliability
  • A/B testing: Easy model comparison
  • Cost tracking: Unified billing dashboard
  • Free credits: $1 free to start
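The automatic fallbacks mentioned above also work at the request level: OpenRouter's API documents a `models` array that the router tries in order when an earlier entry fails. A minimal sketch of such a request body (the `models` field is from OpenRouter's docs — verify against the current API reference before relying on it):

```typescript
// Fallback list inside the request itself: OpenRouter tries each
// entry of `models` in order until one succeeds.
const body = {
  models: [
    "anthropic/claude-3-5-sonnet",  // primary
    "openai/gpt-4o",                // fallback 1
    "google/gemini-pro-1.5",        // fallback 2
  ],
  messages: [{ role: "user", content: "Hello!" }],
};

// POST this as JSON to https://openrouter.ai/api/v1/chat/completions
// with the usual Authorization header (see Basic Usage below).
const payload = JSON.stringify(body);
```

The Fallback Chains section below shows the client-side equivalent, which gives you more control over logging and retry behavior.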

Setup



1. Sign up at openrouter.ai


2. Get API key from dashboard


3. Add to .env:


bash
OPENROUTER_API_KEY=sk-or-v1-...

Basic Usage


typescript
// Using fetch
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3-5-sonnet',
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});

With Vercel AI SDK (Recommended)


typescript
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const openrouter = createOpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const { text } = await generateText({
  model: openrouter("anthropic/claude-3-5-sonnet"),
  prompt: "Explain quantum computing",
});

Model IDs


typescript
// Format: provider/model-name
const models = {
  // Anthropic
  claude35Sonnet: "anthropic/claude-3-5-sonnet",
  claudeHaiku: "anthropic/claude-3-5-haiku",

  // OpenAI
  gpt4o: "openai/gpt-4o",
  gpt4oMini: "openai/gpt-4o-mini",

  // Google
  geminiPro: "google/gemini-pro-1.5",
  geminiFlash: "google/gemini-flash-1.5",

  // Meta
  llama3: "meta-llama/llama-3.1-70b-instruct",

  // Auto (OpenRouter picks best)
  auto: "openrouter/auto",
};

Fallback Chains


typescript
// Define fallback order
const modelChain = [
  "anthropic/claude-3-5-sonnet",   // Primary
  "openai/gpt-4o",                  // Fallback 1
  "google/gemini-pro-1.5",          // Fallback 2
];

// `openrouter.chat` stands in for whatever gateway client you use;
// `Message` is your application's chat message type.
async function callWithFallback(messages: Message[]) {
  for (const model of modelChain) {
    try {
      return await openrouter.chat({ model, messages });
    } catch (error) {
      console.warn(`${model} failed, trying next...`);
    }
  }
  throw new Error("All models failed");
}

Cost Routing


typescript
// Route based on query complexity
function selectModel(query: string): string {
  const complexity = analyzeComplexity(query);

  if (complexity === "simple") {
    // Simple queries → cheap model
    return "openai/gpt-4o-mini";  // ~$0.15/1M tokens
  } else if (complexity === "medium") {
    // Medium → balanced
    return "google/gemini-flash-1.5";  // ~$0.075/1M tokens
  } else {
    // Complex → best quality
    return "anthropic/claude-3-5-sonnet";  // ~$3/1M tokens
  }
}

function analyzeComplexity(query: string): "simple" | "medium" | "complex" {
  // Simple heuristics
  if (query.length < 50) return "simple";
  if (query.includes("explain") || query.includes("analyze")) return "complex";
  return "medium";
}

A/B Testing


typescript
// Deterministic assignment: hash the full user ID so the 50/50 split
// isn't decided by the first character alone.
function getModel(userId: string): string {
  let hash = 0;
  for (const ch of userId) {
    hash = (hash * 31 + ch.charCodeAt(0)) % 100;
  }

  if (hash < 50) {
    return "anthropic/claude-3-5-sonnet";  // ~50%
  } else {
    return "openai/gpt-4o";  // ~50%
  }
}

// Track which model was used (latency and cost measured by the caller)
const model = getModel(userId);
const response = await openrouter.chat({ model, messages });
await analytics.track("llm_call", { model, userId, latency, cost });

LiteLLM (Self-Hosted)


Why LiteLLM


  • Self-hosted: Full control over data
  • 100+ providers: Same coverage as OpenRouter
  • Load balancing: Distribute across providers
  • Cost tracking: Built-in spend management
  • Caching: Redis or in-memory
  • Rate limiting: Per-user limits

Setup



Install

bash
pip install litellm[proxy]

Run proxy

bash
litellm --config config.yaml

Use as OpenAI-compatible endpoint

bash
export OPENAI_API_BASE=http://localhost:4000

Configuration


yaml
# config.yaml
model_list:
  # Claude models
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-latest
      api_key: sk-ant-...

  # OpenAI models
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-...

  # Load balanced (multiple providers): entries sharing a model_name
  # have requests distributed across them
  - model_name: balanced
    litellm_params:
      model: anthropic/claude-3-5-sonnet-latest
  - model_name: balanced
    litellm_params:
      model: openai/gpt-4o

# General settings
general_settings:
  master_key: sk-master-...
  database_url: postgresql://...

# Routing
router_settings:
  routing_strategy: simple-shuffle  # or latency-based-routing
  num_retries: 3
  timeout: 30

# Rate limiting
litellm_settings:
  max_budget: 100  # $100/month
  budget_duration: monthly

Fallbacks in LiteLLM


yaml
model_list:
  - model_name: primary
    litellm_params:
      model: anthropic/claude-3-5-sonnet-latest
    fallbacks:
      - model_name: fallback-1
        litellm_params:
          model: openai/gpt-4o
      - model_name: fallback-2
        litellm_params:
          model: google/gemini-pro

Usage


typescript
// Use like OpenAI SDK
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4000",
  apiKey: "sk-master-...",
});

const response = await client.chat.completions.create({
  model: "claude-sonnet",  // Maps to configured model
  messages: [{ role: "user", content: "Hello!" }],
});

Routing Strategies


1. Cost-Based Routing


typescript
const costTiers = {
  cheap: ["openai/gpt-4o-mini", "google/gemini-flash-1.5"],
  balanced: ["anthropic/claude-3-5-haiku", "openai/gpt-4o"],
  premium: ["anthropic/claude-3-5-sonnet", "openai/o1-preview"],
};

function routeByCost(budget: "cheap" | "balanced" | "premium"): string {
  const models = costTiers[budget];
  return models[Math.floor(Math.random() * models.length)];
}

2. Latency-Based Routing


typescript
// Track latency per model
const latencyStats: Record<string, number[]> = {};

function routeByLatency(defaultModel = "openai/gpt-4o-mini"): string {
  const avgLatencies = Object.entries(latencyStats)
    .map(([model, times]) => ({
      model,
      avg: times.reduce((a, b) => a + b, 0) / times.length,
    }))
    .sort((a, b) => a.avg - b.avg);

  // Until we have samples, fall back to a default
  if (avgLatencies.length === 0) return defaultModel;
  return avgLatencies[0].model;
}

// Update after each call
function recordLatency(model: string, latencyMs: number) {
  if (!latencyStats[model]) latencyStats[model] = [];
  latencyStats[model].push(latencyMs);
  // Keep last 100 samples
  if (latencyStats[model].length > 100) {
    latencyStats[model].shift();
  }
}

3. Task-Based Routing


typescript
const taskModels = {
  coding: "anthropic/claude-3-5-sonnet",  // Best for code
  reasoning: "openai/o1-preview",          // Best for logic
  creative: "anthropic/claude-3-5-sonnet", // Best for writing
  simple: "openai/gpt-4o-mini",            // Cheap and fast
  multimodal: "google/gemini-pro-1.5",     // Vision + text
};

function routeByTask(task: keyof typeof taskModels): string {
  return taskModels[task];
}

4. Hybrid Routing


typescript
interface RoutingConfig {
  task: string;
  maxCost: number;     // budget ceiling per call
  maxLatency: number;  // ms
}

// Assumes a `models` registry of { id, cost, avgLatency } entries and a
// `getTaskScore(modelId, task)` quality score defined elsewhere.
function hybridRoute(config: RoutingConfig): string {
  // Filter by cost
  const affordable = models.filter(m => m.cost <= config.maxCost);

  // Filter by latency
  const fast = affordable.filter(m => m.avgLatency <= config.maxLatency);

  // Select best for task
  const taskScores = fast.map(m => ({
    model: m.id,
    score: getTaskScore(m.id, config.task),
  }));

  if (taskScores.length === 0) {
    throw new Error("No model satisfies the cost/latency constraints");
  }
  return taskScores.sort((a, b) => b.score - a.score)[0].model;
}

Best Practices


1. Always Have Fallbacks


typescript
// Bad: single point of failure
const response = await openai.chat({ model: "gpt-4o", messages });

// Good: fallback chain
async function chatWithFallback(messages: Message[]) {
  const models = ["gpt-4o", "claude-3-5-sonnet", "gemini-pro"];
  for (const model of models) {
    try {
      return await gateway.chat({ model, messages });
    } catch (e) {
      continue;  // try the next model
    }
  }
  throw new Error("All models failed");
}

2. Pin Model Versions


typescript
// Bad: Model can change
const model = "gpt-4";

// Good: Pinned version
const model = "openai/gpt-4-0125-preview";

3. Track Costs


typescript
// Log every call
async function trackedCall(model: string, messages: Message[]) {
  const start = Date.now();
  const response = await gateway.chat({ model, messages });
  const latency = Date.now() - start;

  await analytics.track("llm_call", {
    model,
    inputTokens: response.usage.prompt_tokens,
    outputTokens: response.usage.completion_tokens,
    cost: calculateCost(model, response.usage),
    latency,
  });

  return response;
}
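The `calculateCost` helper above isn't defined in this guide. A minimal sketch with a hypothetical price table — the per-token numbers below are illustrative only, since provider pricing changes often:

```typescript
// Hypothetical prices in $ per 1M tokens; check current provider
// pricing before relying on these numbers.
const PRICES: Record<string, { input: number; output: number }> = {
  "openai/gpt-4o-mini": { input: 0.15, output: 0.6 },
  "anthropic/claude-3-5-sonnet": { input: 3, output: 15 },
};

interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
}

function calculateCost(model: string, usage: Usage): number {
  const price = PRICES[model];
  if (!price) return 0; // unknown model: report zero rather than guess
  return (
    (usage.prompt_tokens / 1_000_000) * price.input +
    (usage.completion_tokens / 1_000_000) * price.output
  );
}
```

If you use a gateway like OpenRouter or LiteLLM, prefer the cost figures from its usage/billing API over a hand-maintained table.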

4. Set Token Limits


typescript
// Prevent runaway costs
const response = await gateway.chat({
  model,
  messages,
  max_tokens: 500,  // Limit output length
});

5. Use Caching


yaml
# LiteLLM caching
litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
    ttl: 3600  # 1 hour

References


  • references/openrouter-guide.md
    - OpenRouter deep dive
  • references/litellm-guide.md
    - LiteLLM self-hosting
  • references/routing-strategies.md
    - Advanced routing patterns
  • references/alternatives.md
    - Helicone, Portkey, etc.

Templates


  • templates/openrouter-config.ts
    - TypeScript OpenRouter setup
  • templates/litellm-config.yaml
    - LiteLLM proxy config
  • templates/fallback-chain.ts
    - Fallback implementation