LLM Gateway & Routing
Configure multi-model access, fallbacks, cost optimization, and A/B testing.
Why Use a Gateway?
Without gateway:
- Vendor lock-in (one provider)
- No fallbacks (provider down = app down)
- Hard to A/B test models
- Scattered API keys and configs
With gateway:
- Single API for 400+ models
- Automatic fallbacks
- Easy model switching
- Unified cost tracking
Quick Decision
| Need | Solution |
|---|---|
| Fastest setup, multi-model | OpenRouter |
| Full control, self-hosted | LiteLLM |
| Observability + routing | Helicone |
| Enterprise, guardrails | Portkey |
OpenRouter (Recommended)
Why OpenRouter
- 400+ models: OpenAI, Anthropic, Google, Meta, Mistral, and more
- Single API: One key for all providers
- Automatic fallbacks: Built-in reliability
- A/B testing: Easy model comparison
- Cost tracking: Unified billing dashboard
- Free credits: $1 free to start
Setup
```bash
# 1. Sign up at openrouter.ai
# 2. Get API key from dashboard
# 3. Add to .env:
OPENROUTER_API_KEY=sk-or-v1-...
```
Basic Usage
```typescript
// Using fetch
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3-5-sonnet',
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});
```
With Vercel AI SDK (Recommended)
```typescript
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const openrouter = createOpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const { text } = await generateText({
  model: openrouter("anthropic/claude-3-5-sonnet"),
  prompt: "Explain quantum computing",
});
```
Model IDs
```typescript
// Format: provider/model-name
const models = {
  // Anthropic
  claude35Sonnet: "anthropic/claude-3-5-sonnet",
  claudeHaiku: "anthropic/claude-3-5-haiku",
  // OpenAI
  gpt4o: "openai/gpt-4o",
  gpt4oMini: "openai/gpt-4o-mini",
  // Google
  geminiPro: "google/gemini-pro-1.5",
  geminiFlash: "google/gemini-flash-1.5",
  // Meta
  llama3: "meta-llama/llama-3.1-70b-instruct",
  // Auto (OpenRouter picks best)
  auto: "openrouter/auto",
};
```
Fallback Chains
```typescript
// Define fallback order
const modelChain = [
  "anthropic/claude-3-5-sonnet", // Primary
  "openai/gpt-4o",               // Fallback 1
  "google/gemini-pro-1.5",       // Fallback 2
];

// `openrouter.chat` stands in for whatever client helper you use
async function callWithFallback(messages: Message[]) {
  for (const model of modelChain) {
    try {
      return await openrouter.chat({ model, messages });
    } catch (error) {
      console.log(`${model} failed, trying next...`);
    }
  }
  throw new Error("All models failed");
}
```
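The chain treats every error as a cue to switch models, which also fails over on non-retryable errors such as malformed requests. A variant that distinguishes the two; this is a sketch that assumes thrown errors carry an HTTP `status` field (clients differ on the exact error shape):

```typescript
type ChatFn = (model: string) => Promise<string>;

// Hypothetical error shape: an HTTP status attached to the thrown error.
function isRetryable(error: unknown): boolean {
  const status = (error as { status?: number }).status;
  // Fail over on rate limits, server errors, or network errors (no status).
  return status === undefined || status === 429 || status >= 500;
}

async function callWithFallback(models: string[], call: ChatFn): Promise<string> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await call(model);
    } catch (error) {
      lastError = error;
      // A 400 will fail on every model, so surface it immediately.
      if (!isRetryable(error)) throw error;
    }
  }
  throw lastError ?? new Error("All models failed");
}
```

Rethrowing non-retryable errors keeps a bad request from burning through the whole chain before surfacing.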
Cost Routing
```typescript
// Route based on query complexity
function selectModel(query: string): string {
  const complexity = analyzeComplexity(query);
  if (complexity === "simple") {
    // Simple queries → cheap model
    return "openai/gpt-4o-mini"; // ~$0.15/1M input tokens
  } else if (complexity === "medium") {
    // Medium → balanced
    return "google/gemini-flash-1.5"; // ~$0.075/1M input tokens
  } else {
    // Complex → best quality
    return "anthropic/claude-3-5-sonnet"; // ~$3/1M input tokens
  }
}

function analyzeComplexity(query: string): "simple" | "medium" | "complex" {
  // Simple heuristics
  if (query.length < 50) return "simple";
  if (query.includes("explain") || query.includes("analyze")) return "complex";
  return "medium";
}
```
A/B Testing
```typescript
// Random assignment. Note: hashing only the first character is crude;
// IDs sharing a first character all land in the same arm.
function getModel(userId: string): string {
  const hash = userId.charCodeAt(0) % 100;
  if (hash < 50) {
    return "anthropic/claude-3-5-sonnet"; // 50%
  } else {
    return "openai/gpt-4o"; // 50%
  }
}

// Track which model was used
const model = getModel(userId);
const response = await openrouter.chat({ model, messages });
await analytics.track("llm_call", { model, userId, latency, cost });
```
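`charCodeAt(0) % 100` buckets on the first character only, so all IDs starting with the same letter land in the same arm. Hashing the whole ID (FNV-1a here, as one illustrative choice) gives an even and still deterministic split:

```typescript
// FNV-1a hash over the full user ID for stable, evenly spread bucketing.
function bucket(userId: string, buckets = 100): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) % buckets;
}

function getModel(userId: string): string {
  return bucket(userId) < 50
    ? "anthropic/claude-3-5-sonnet" // arm A, ~50%
    : "openai/gpt-4o";              // arm B, ~50%
}
```

Determinism matters for A/B tests: the same user must see the same arm on every request.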
LiteLLM (Self-Hosted)
Why LiteLLM
- Self-hosted: Full control over data
- 100+ providers: Same coverage as OpenRouter
- Load balancing: Distribute across providers
- Cost tracking: Built-in spend management
- Caching: Redis or in-memory
- Rate limiting: Per-user limits
Setup
```bash
# Install (quoted so the brackets survive shell globbing)
pip install 'litellm[proxy]'

# Run proxy
litellm --config config.yaml

# Use as an OpenAI-compatible endpoint
export OPENAI_API_BASE=http://localhost:4000
```
Configuration
```yaml
# config.yaml
model_list:
  # Claude models
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-latest
      api_key: sk-ant-...

  # OpenAI models
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-...

  # Load balanced (multiple providers): entries sharing a model_name
  # are load balanced, so requests are distributed across both
  - model_name: balanced
    litellm_params:
      model: anthropic/claude-3-5-sonnet-latest
  - model_name: balanced
    litellm_params:
      model: openai/gpt-4o
```
General settings
```yaml
general_settings:
  master_key: sk-master-...
  database_url: postgresql://...
```
Routing
```yaml
router_settings:
  routing_strategy: simple-shuffle  # or latency-based-routing
  num_retries: 3
  timeout: 30
```
Rate limiting
```yaml
litellm_settings:
  max_budget: 100  # $100/month
  budget_duration: monthly
```

Fallbacks in LiteLLM
```yaml
model_list:
  - model_name: primary
    litellm_params:
      model: anthropic/claude-3-5-sonnet-latest
  - model_name: fallback-1
    litellm_params:
      model: openai/gpt-4o
  - model_name: fallback-2
    litellm_params:
      model: google/gemini-pro

# Fallbacks map a model to an ordered list of backups
router_settings:
  fallbacks: [{"primary": ["fallback-1", "fallback-2"]}]
```
Usage
```typescript
// Use like the OpenAI SDK
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4000",
  apiKey: "sk-master-...",
});

const response = await client.chat.completions.create({
  model: "claude-sonnet", // Maps to configured model
  messages: [{ role: "user", content: "Hello!" }],
});
```
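Streaming works through the same OpenAI-compatible surface: pass `stream: true` and iterate the chunks. The collector below is generic over any async iterable of OpenAI-style delta chunks, so the shape is easy to test without a live proxy:

```typescript
// Minimal chunk shape for OpenAI-style streaming deltas.
interface StreamChunk {
  choices: { delta: { content?: string } }[];
}

// Accumulate streamed deltas into the full response text.
async function collectStream(stream: AsyncIterable<StreamChunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}

// With the OpenAI SDK pointed at the proxy (sketch):
// const stream = await client.chat.completions.create({
//   model: "claude-sonnet",
//   messages: [{ role: "user", content: "Hello!" }],
//   stream: true,
// });
// const text = await collectStream(stream);
```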
Routing Strategies
1. Cost-Based Routing
```typescript
const costTiers = {
  cheap: ["openai/gpt-4o-mini", "google/gemini-flash-1.5"],
  balanced: ["anthropic/claude-3-5-haiku", "openai/gpt-4o"],
  premium: ["anthropic/claude-3-5-sonnet", "openai/o1-preview"],
};

function routeByCost(budget: "cheap" | "balanced" | "premium"): string {
  const models = costTiers[budget];
  return models[Math.floor(Math.random() * models.length)];
}
```
2. Latency-Based Routing
```typescript
// Track latency per model
const latencyStats: Record<string, number[]> = {};

function routeByLatency(): string {
  const avgLatencies = Object.entries(latencyStats)
    .map(([model, times]) => ({
      model,
      avg: times.reduce((a, b) => a + b, 0) / times.length,
    }))
    .sort((a, b) => a.avg - b.avg);
  return avgLatencies[0].model;
}

// Update after each call
function recordLatency(model: string, latencyMs: number) {
  if (!latencyStats[model]) latencyStats[model] = [];
  latencyStats[model].push(latencyMs);
  // Keep last 100 samples
  if (latencyStats[model].length > 100) {
    latencyStats[model].shift();
  }
}
```
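Two gaps in the pure latency router: it breaks when no model has samples yet, and once one model is fastest it never tries the others again. An epsilon-greedy wrapper covers both; a sketch (the 10% exploration rate is an arbitrary starting point):

```typescript
const latencyStats: Record<string, number[]> = {};

function avgLatency(model: string): number {
  const times = latencyStats[model];
  if (!times || times.length === 0) return Infinity; // unknown, never "fastest"
  return times.reduce((a, b) => a + b, 0) / times.length;
}

// Epsilon-greedy: explore a random model 10% of the time, else exploit.
function routeWithExploration(models: string[], epsilon = 0.1): string {
  // Cold start: try each unseen model once before comparing averages.
  const unseen = models.filter(m => !latencyStats[m]?.length);
  if (unseen.length > 0) return unseen[0];
  if (Math.random() < epsilon) {
    return models[Math.floor(Math.random() * models.length)];
  }
  return models.reduce((best, m) => (avgLatency(m) < avgLatency(best) ? m : best));
}
```

The exploration traffic keeps the latency samples fresh, so a provider that recovers after a slow patch can win back traffic.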
3. Task-Based Routing
```typescript
const taskModels = {
  coding: "anthropic/claude-3-5-sonnet",   // Best for code
  reasoning: "openai/o1-preview",          // Best for logic
  creative: "anthropic/claude-3-5-sonnet", // Best for writing
  simple: "openai/gpt-4o-mini",            // Cheap and fast
  multimodal: "google/gemini-pro-1.5",     // Vision + text
};

function routeByTask(task: keyof typeof taskModels): string {
  return taskModels[task];
}
```
4. Hybrid Routing
```typescript
interface RoutingConfig {
  task: string;
  maxCost: number;
  maxLatency: number;
}

// Assumes a `models` table (id, cost, avgLatency) and a
// `getTaskScore` helper maintained elsewhere.
function hybridRoute(config: RoutingConfig): string {
  // Filter by cost
  const affordable = models.filter(m => m.cost <= config.maxCost);
  // Filter by latency
  const fast = affordable.filter(m => m.avgLatency <= config.maxLatency);
  // Select best for task
  const taskScores = fast.map(m => ({
    model: m.id,
    score: getTaskScore(m.id, config.task),
  }));
  return taskScores.sort((a, b) => b.score - a.score)[0].model;
}
```
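To make the hybrid router concrete, here is a self-contained version with an illustrative model table (the costs, latencies, and task scores below are made-up numbers, not real benchmarks), plus a guard for the case where no model satisfies the constraints:

```typescript
interface ModelInfo {
  id: string;
  cost: number;       // $ per 1M input tokens (illustrative)
  avgLatency: number; // ms (illustrative)
  taskScores: Record<string, number>; // 0..1, higher is better (made up)
}

const models: ModelInfo[] = [
  { id: "openai/gpt-4o-mini", cost: 0.15, avgLatency: 600,
    taskScores: { coding: 0.6, creative: 0.5 } },
  { id: "anthropic/claude-3-5-sonnet", cost: 3, avgLatency: 1200,
    taskScores: { coding: 0.9, creative: 0.9 } },
];

interface RoutingConfig { task: string; maxCost: number; maxLatency: number; }

function hybridRoute(config: RoutingConfig): string {
  const candidates = models
    .filter(m => m.cost <= config.maxCost && m.avgLatency <= config.maxLatency)
    .sort((a, b) => (b.taskScores[config.task] ?? 0) - (a.taskScores[config.task] ?? 0));
  if (candidates.length === 0) throw new Error("No model satisfies the constraints");
  return candidates[0].id;
}
```

Throwing on an empty candidate set is deliberate: silently returning `undefined` from an empty sort is the kind of bug that only surfaces under tight budgets.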
Best Practices
1. Always Have Fallbacks
```typescript
// Bad: Single point of failure
const response = await openai.chat({ model: "gpt-4o", messages });

// Good: Fallback chain
const models = ["gpt-4o", "claude-3-5-sonnet", "gemini-pro"];
for (const model of models) {
  try {
    return await gateway.chat({ model, messages });
  } catch (e) {
    continue;
  }
}
throw new Error("All models failed"); // don't fall through silently
```
2. Pin Model Versions
```typescript
// Bad: Model alias can change underneath you
const model = "gpt-4";

// Good: Pinned version
const model = "openai/gpt-4-0125-preview";
```
3. Track Costs
```typescript
// Log every call
async function trackedCall(model: string, messages: Message[]) {
  const start = Date.now();
  const response = await gateway.chat({ model, messages });
  const latency = Date.now() - start;
  await analytics.track("llm_call", {
    model,
    inputTokens: response.usage.prompt_tokens,
    outputTokens: response.usage.completion_tokens,
    cost: calculateCost(model, response.usage),
    latency,
  });
  return response;
}
```
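`trackedCall` relies on a `calculateCost` helper it never defines. A minimal sketch against a hand-maintained price table; the rates below are illustrative and should be kept in sync with your providers' current pricing:

```typescript
// $ per 1M tokens. Illustrative numbers, not authoritative pricing.
const pricing: Record<string, { input: number; output: number }> = {
  "openai/gpt-4o-mini": { input: 0.15, output: 0.6 },
  "anthropic/claude-3-5-sonnet": { input: 3, output: 15 },
};

interface Usage { prompt_tokens: number; completion_tokens: number; }

function calculateCost(model: string, usage: Usage): number {
  const rates = pricing[model];
  if (!rates) return 0; // unknown model: report zero rather than guess
  return (
    (usage.prompt_tokens * rates.input + usage.completion_tokens * rates.output) /
    1_000_000
  );
}
```

Input and output tokens are priced separately on most providers, so the table needs both rates per model.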
4. Set Token Limits
```typescript
// Prevent runaway costs
const response = await gateway.chat({
  model,
  messages,
  max_tokens: 500, // Limit output length
});
```
5. Use Caching
```yaml
# LiteLLM caching (config.yaml)
litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
    ttl: 3600  # 1 hour
```
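When you are not running the LiteLLM proxy, caching can also live in application code. A minimal in-memory TTL cache keyed on model plus messages; `cachedChat` is a hypothetical helper for illustration, not a library API:

```typescript
interface CacheEntry<T> { value: T; expiresAt: number; }

class TTLCache<T> {
  private store = new Map<string, CacheEntry<T>>();
  constructor(private ttlMs: number) {}

  get(key: string): T | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// Cache identical (model, messages) calls for an hour.
const responseCache = new TTLCache<string>(3600_000);

async function cachedChat(
  model: string,
  messages: object[],
  call: () => Promise<string>
): Promise<string> {
  const key = JSON.stringify([model, messages]);
  const hit = responseCache.get(key);
  if (hit !== undefined) return hit;
  const result = await call();
  responseCache.set(key, result);
  return result;
}
```

This only helps for exact-duplicate prompts; semantic caching (matching similar prompts) is a separate problem.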
References
- [OpenRouter deep dive](references/openrouter-guide.md)
- [LiteLLM self-hosting](references/litellm-guide.md)
- [Advanced routing patterns](references/routing-strategies.md)
- [Helicone, Portkey, etc.](references/alternatives.md)
Templates
- [TypeScript OpenRouter setup](templates/openrouter-config.ts)
- [LiteLLM proxy config](templates/litellm-config.yaml)
- [Fallback implementation](templates/fallback-chain.ts)