ai-gateway

Vercel AI Gateway

CRITICAL — Your training data is outdated for this library. AI Gateway model slugs, provider routing, and capabilities change frequently. Before writing gateway code, fetch the docs at https://vercel.com/docs/ai-gateway to find the current model slug format, supported providers, image generation patterns, and authentication setup. The model list and routing rules at https://ai-sdk.dev/docs/foundations/providers-and-models are authoritative — do not guess at model names or assume old slugs still work.
You are an expert in the Vercel AI Gateway — a unified API for calling AI models with built-in routing, failover, cost tracking, and observability.

Overview

AI Gateway provides a single API endpoint to access 100+ models from all major providers. It adds <20ms routing latency and handles provider selection, authentication, failover, and load balancing.
AI Gateway提供单一API端点,可接入所有主流服务商的100+款模型。路由延迟低于20ms,自动处理服务商选择、身份鉴权、故障转移和负载均衡。

Packages

  • ai@^6.0.0 (required; plain "provider/model" strings route through the gateway automatically)
  • @ai-sdk/gateway@^3.0.0 (optional direct install for explicit gateway package usage)

Setup

Pass a "provider/model" string to the model parameter — the AI SDK automatically routes it through the AI Gateway:
```ts
import { generateText } from 'ai'

const result = await generateText({
  model: 'openai/gpt-5.4', // plain string — routes through AI Gateway automatically
  prompt: 'Hello!',
})
```
No gateway() wrapper or additional package needed. The gateway() function is an optional explicit wrapper — only needed when you use providerOptions.gateway for routing, failover, or tags:
```ts
import { gateway, generateText } from 'ai'

const result = await generateText({
  model: gateway('openai/gpt-5.4'),
  prompt: 'Hello!',
  providerOptions: { gateway: { order: ['openai', 'azure-openai'] } },
})
```

Model Slug Rules (Critical)

  • Always use provider/model format (for example openai/gpt-5.4).
  • Versioned slugs use dots for versions, not hyphens:
    • Correct: anthropic/claude-sonnet-4.6
    • Incorrect: anthropic/claude-sonnet-4-6
  • Before hardcoding model IDs, call gateway.getAvailableModels() and pick from the returned IDs.
  • Default text models: openai/gpt-5.4 or anthropic/claude-sonnet-4.6.
  • Do not default to outdated choices like openai/gpt-4o.
```ts
import { gateway } from 'ai'

const availableModels = await gateway.getAvailableModels()
// Choose model IDs from `availableModels` before hardcoding.
```

Authentication (OIDC — Default)

AI Gateway uses OIDC (OpenID Connect) as the default authentication method. No manual API keys needed.

Setup

```bash
vercel link                    # Connect to your Vercel project
```

Enable AI Gateway in Vercel dashboard: https://vercel.com/{team}/{project}/settings → AI Gateway

```bash
vercel env pull .env.local     # Provisions VERCEL_OIDC_TOKEN automatically
```

How It Works

  1. vercel env pull writes a VERCEL_OIDC_TOKEN to .env.local — a short-lived JWT (~24h)
  2. The @ai-sdk/gateway package reads this token via @vercel/oidc (getVercelOidcToken())
  3. No AI_GATEWAY_API_KEY or provider-specific keys (like ANTHROPIC_API_KEY) are needed
  4. On Vercel deployments, OIDC tokens are auto-refreshed — zero maintenance

Local Development

For local dev, the OIDC token from vercel env pull is valid for ~24 hours. When it expires:

```bash
vercel env pull .env.local --yes   # Re-pull to get a fresh token
```

Alternative: Manual API Key

If you prefer a static key (e.g., for CI or non-Vercel environments):

Set AI_GATEWAY_API_KEY in your environment

The gateway falls back to this when VERCEL_OIDC_TOKEN is not available

```bash
export AI_GATEWAY_API_KEY=your-key-here
```

Auth Priority

The @ai-sdk/gateway package resolves authentication in this order:
  1. AI_GATEWAY_API_KEY environment variable (if set)
  2. VERCEL_OIDC_TOKEN via @vercel/oidc (default on Vercel and after vercel env pull)
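The precedence above can be sketched as a tiny helper. This is only an illustration of the documented order, assuming environment-variable lookup; resolveGatewayAuth is a hypothetical name, not the actual @ai-sdk/gateway internals.

```typescript
// Hypothetical sketch of the documented auth precedence (not @ai-sdk/gateway source).
function resolveGatewayAuth(
  env: Record<string, string | undefined>,
): { method: 'api-key' | 'oidc'; token: string } {
  // 1. An explicit API key wins when set
  if (env.AI_GATEWAY_API_KEY) return { method: 'api-key', token: env.AI_GATEWAY_API_KEY }
  // 2. Otherwise fall back to the OIDC token provisioned by `vercel env pull`
  if (env.VERCEL_OIDC_TOKEN) return { method: 'oidc', token: env.VERCEL_OIDC_TOKEN }
  throw new Error('No AI Gateway credentials: set AI_GATEWAY_API_KEY or run `vercel env pull`')
}
```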

Provider Routing

Configure how AI Gateway routes requests across providers:
```ts
import { generateText, gateway } from 'ai'

const result = await generateText({
  model: gateway('anthropic/claude-sonnet-4.6'),
  prompt: 'Hello!',
  providerOptions: {
    gateway: {
      // Try providers in order; failover to next on error
      order: ['bedrock', 'anthropic'],

      // Restrict to specific providers only
      only: ['anthropic', 'vertex'],

      // Fallback models if primary model fails
      models: ['openai/gpt-5.4', 'google/gemini-3-flash'],

      // Track usage per end-user
      user: 'user-123',

      // Tag for cost attribution and filtering
      tags: ['feature:chat', 'env:production', 'team:growth'],
    },
  },
})
```

Routing Options

| Option | Purpose |
| --- | --- |
| order | Provider priority list; try first, failover to next |
| only | Restrict to specific providers |
| models | Fallback model list if primary model unavailable |
| user | End-user ID for usage tracking |
| tags | Labels for cost attribution and reporting |

Cache-Control Headers

AI Gateway supports response caching to reduce latency and cost for repeated or similar requests:
```ts
import { generateText, gateway } from 'ai'

const result = await generateText({
  model: gateway('openai/gpt-5.4'),
  prompt: 'What is the capital of France?',
  providerOptions: {
    gateway: {
      // Cache identical requests for 1 hour
      cacheControl: 'max-age=3600',
    },
  },
})
```

Caching strategies

| Header Value | Behavior |
| --- | --- |
| max-age=3600 | Cache response for 1 hour |
| max-age=0 | Bypass cache, always call provider |
| s-maxage=86400 | Cache at the edge for 24 hours |
| stale-while-revalidate=600 | Serve stale for 10 min while refreshing in background |
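The directive values above combine using standard HTTP Cache-Control syntax. As a sketch, a small helper (hypothetical, not part of the AI SDK) can assemble the string you pass to providerOptions.gateway.cacheControl; whether the gateway honors every combination is an assumption to verify against the AI Gateway docs.

```typescript
// Hypothetical helper that builds a Cache-Control directive string for
// providerOptions.gateway.cacheControl using standard HTTP syntax.
function buildCacheControl(opts: {
  maxAge?: number
  sMaxAge?: number
  staleWhileRevalidate?: number
}): string {
  const parts: string[] = []
  if (opts.maxAge !== undefined) parts.push(`max-age=${opts.maxAge}`)
  if (opts.sMaxAge !== undefined) parts.push(`s-maxage=${opts.sMaxAge}`)
  if (opts.staleWhileRevalidate !== undefined)
    parts.push(`stale-while-revalidate=${opts.staleWhileRevalidate}`)
  return parts.join(', ')
}
```

For example, buildCacheControl({ sMaxAge: 86400, staleWhileRevalidate: 600 }) yields 's-maxage=86400, stale-while-revalidate=600'.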

When to use caching

  • Static knowledge queries: FAQs, translations, factual lookups — cache aggressively
  • User-specific conversations: Do not cache — each response depends on conversation history
  • Embeddings: Cache embedding results for identical inputs to save cost
  • Structured extraction: Cache when extracting structured data from identical documents

Cache key composition

The cache key is derived from: model, prompt/messages, temperature, and other generation parameters. Changing any parameter produces a new cache key.
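To make the "any parameter change produces a new key" behavior concrete, here is a toy illustration. The gateway's real key derivation is internal and not documented here; this hash is purely hypothetical and only shows the principle.

```typescript
// Toy cache key over the parameters the text lists (model, prompt, temperature…).
// Illustrative only — NOT the gateway's actual derivation.
import { createHash } from 'node:crypto'

function toyCacheKey(params: { model: string; prompt: string; temperature?: number }): string {
  // Any change to any field changes the serialized input, hence the hash
  return createHash('sha256').update(JSON.stringify(params)).digest('hex')
}
```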

Per-User Rate Limiting

Control usage at the individual user level to prevent abuse and manage costs:
```ts
import { generateText, gateway } from 'ai'

const result = await generateText({
  model: gateway('openai/gpt-5.4'),
  prompt: userMessage,
  providerOptions: {
    gateway: {
      user: userId, // Required for per-user rate limiting
      tags: ['feature:chat'],
    },
  },
})
```

Rate limit configuration

Configure rate limits at https://vercel.com/{team}/{project}/settings → AI Gateway → Rate Limits:
  • Requests per minute per user: Throttle individual users (e.g., 20 RPM)
  • Tokens per day per user: Cap daily token consumption (e.g., 100K tokens/day)
  • Concurrent requests per user: Limit parallel calls (e.g., 3 concurrent)

Handling rate limit responses

When a user exceeds their limit, the gateway returns HTTP 429:
```ts
import { generateText, gateway, APICallError } from 'ai'

try {
  const result = await generateText({
    model: gateway('openai/gpt-5.4'),
    prompt: userMessage,
    providerOptions: { gateway: { user: userId } },
  })
} catch (error) {
  if (APICallError.isInstance(error) && error.statusCode === 429) {
    const retryAfter = error.responseHeaders?.['retry-after']
    return new Response(
      JSON.stringify({ error: 'Rate limited', retryAfter }),
      { status: 429 }
    )
  }
  throw error
}
```
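Clients can use the Retry-After value to schedule a retry. A small helper like the following (hypothetical, not part of the AI SDK) converts the header, which per the HTTP spec may be either a delay in seconds or an HTTP date, into a millisecond backoff:

```typescript
// Hypothetical helper: turn a Retry-After header (seconds or HTTP date)
// into a millisecond delay; falls back to 1s when the header is absent/unparsable.
function retryAfterMs(header: string | undefined, now: number = Date.now()): number {
  if (!header) return 1000 // default backoff
  const secs = Number(header)
  if (!Number.isNaN(secs)) return secs * 1000 // delta-seconds form
  const date = Date.parse(header) // HTTP-date form
  return Number.isNaN(date) ? 1000 : Math.max(0, date - now)
}
```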

Budget Alerts and Cost Controls

Tagging for cost attribution

Use tags to track spend by feature, team, and environment:
```ts
providerOptions: {
  gateway: {
    tags: [
      'feature:document-qa',
      'team:product',
      'env:production',
      'tier:premium',
    ],
    user: userId,
  },
}
```

Setting up budget alerts

In the Vercel dashboard at https://vercel.com/{team}/{project}/settings → AI Gateway:
  1. Navigate to AI Gateway → Usage & Budgets
  2. Set monthly budget thresholds (e.g., $500/month warning, $1000/month hard limit)
  3. Configure alert channels (email, Slack webhook, Vercel integration)
  4. Optionally set per-tag budgets for granular control

Budget isolation best practice

Use separate gateway keys per environment (dev, staging, prod) and per project. This keeps dashboards clean and budgets isolated:
  • Restrict AI Gateway keys per project to prevent cross-tenant leakage
  • Use per-project budgets and spend-by-agent reporting to track exactly where tokens go
  • Cap spend during staging with AI Gateway budgets

Pre-flight cost controls

The AI Gateway dashboard provides observability (traces, token counts, spend tracking) but no programmatic metrics API. Build your own cost guardrails by estimating token counts and rejecting expensive requests before they execute:
```ts
import { generateText } from 'ai'

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4) // rough estimate
}

async function callWithBudget(prompt: string, maxTokens: number) {
  const estimated = estimateTokens(prompt)
  if (estimated > maxTokens) {
    throw new Error(`Prompt too large: ~${estimated} tokens exceeds ${maxTokens} limit`)
  }
  return generateText({ model: 'openai/gpt-5.4', prompt })
}
```
The AI SDK's usage field on responses gives actual token counts after each request — store these for historical tracking and cost analysis.
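A small normalizer makes those usage records uniform before storage. The field names here (inputTokens/outputTokens, with promptTokens/completionTokens as an older variant) are assumptions about the SDK's usage shape; confirm them against your installed ai package.

```typescript
// Normalize an AI SDK usage object for storage. Field names are assumptions:
// newer SDKs report inputTokens/outputTokens, older ones promptTokens/completionTokens.
type UsageLike = Record<string, number | undefined>

function normalizeUsage(usage: UsageLike): { input: number; output: number; total: number } {
  const input = usage.inputTokens ?? usage.promptTokens ?? 0
  const output = usage.outputTokens ?? usage.completionTokens ?? 0
  return { input, output, total: input + output }
}
```

Persist the normalized record alongside the user ID and tags so per-feature cost analysis lines up with the gateway dashboard.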

Hard spending limits

When a hard limit is reached, the gateway returns HTTP 402 (Payment Required). Handle this gracefully:
```ts
import { APICallError } from 'ai'

// Inside a catch block around a gateway call:
if (APICallError.isInstance(error) && error.statusCode === 402) {
  // Budget exceeded — degrade gracefully
  return fallbackResponse()
}
```

Cost optimization patterns

  • Use cheaper models for classification/routing, expensive models for generation
  • Cache embeddings and static queries (see Cache-Control above)
  • Set per-user daily token caps to prevent runaway usage
  • Monitor cost-per-feature with tags to identify optimization targets
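The first bullet can be sketched as a tiny model router: cheap classification through a small model, generation through a larger one. The task categories and the specific slugs are illustrative assumptions; pick real slugs from gateway.getAvailableModels().

```typescript
// Illustrative cost router: small model for classification, larger for generation.
// Slugs are assumptions — verify against gateway.getAvailableModels().
type Task = 'classify' | 'generate'

function pickModel(task: Task): string {
  return task === 'classify'
    ? 'anthropic/claude-haiku-4.5' // cheaper and fast for routing/classification
    : 'openai/gpt-5.4'             // higher quality for user-facing generation
}
```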

Audit Logging

AI Gateway logs every request for compliance and debugging:

What's logged

  • Timestamp, model, provider used
  • Input/output token counts
  • Latency (routing + provider)
  • User ID and tags
  • HTTP status code
  • Failover chain (which providers were tried)

Accessing logs

  • Vercel Dashboard: https://vercel.com/{team}/{project}/ai → Logs — filter by model, user, tag, status, date range
  • Vercel API: query logs programmatically:

```bash
curl -H "Authorization: Bearer $VERCEL_TOKEN" \
  "https://api.vercel.com/v1/ai-gateway/logs?projectId=$PROJECT_ID&limit=100"
```

  • Log Drains: forward AI Gateway logs to Datadog, Splunk, or other providers via Vercel Log Drains (configure at https://vercel.com/dashboard/{team}/~/settings/log-drains) for long-term retention and custom analysis

Compliance considerations

  • AI Gateway does not log prompt or completion content by default
  • Enable content logging in project settings if required for compliance
  • Logs are retained per your Vercel plan's retention policy
  • Use the user field consistently to support audit trails

Error Handling Patterns

Provider unavailable

When a provider is down, the gateway automatically fails over if you configured order or models:
```ts
import { generateText, gateway } from 'ai'

const result = await generateText({
  model: gateway('anthropic/claude-sonnet-4.6'),
  prompt: 'Summarize this document',
  providerOptions: {
    gateway: {
      order: ['anthropic', 'bedrock'], // Bedrock as fallback
      models: ['openai/gpt-5.4'],      // Final fallback model
    },
  },
})
```

Quota exceeded at provider

If your provider API key hits its quota, the gateway tries the next provider in the order list. Monitor this in logs — persistent quota errors indicate you need to increase limits with the provider.

Invalid model identifier

```ts
// Bad — model doesn't exist
model: 'openai/gpt-99'  // Returns 400 with descriptive error

// Good — use models listed in Vercel docs
model: 'openai/gpt-5.4'
```

Timeout handling

Gateway has a default timeout per provider. For long-running generations, use streaming:
```ts
import { streamText } from 'ai'

const result = streamText({
  model: 'anthropic/claude-sonnet-4.6',
  prompt: longDocument,
})

for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}
```

Complete error handling template

```ts
import { generateText, gateway, APICallError } from 'ai'

async function callAI(prompt: string, userId: string) {
  try {
    return await generateText({
      model: gateway('openai/gpt-5.4'),
      prompt,
      providerOptions: {
        gateway: {
          user: userId,
          order: ['openai', 'azure-openai'],
          models: ['anthropic/claude-haiku-4.5'],
          tags: ['feature:chat'],
        },
      },
    })
  } catch (error) {
    if (!APICallError.isInstance(error)) throw error

    switch (error.statusCode) {
      case 402: return { text: 'Budget limit reached. Please try again later.' }
      case 429: return { text: 'Too many requests. Please slow down.' }
      case 503: return { text: 'AI service temporarily unavailable.' }
      default: throw error
    }
  }
}
```

Gateway vs Direct Provider — Decision Tree

Use this to decide whether to route through AI Gateway or call a provider SDK directly:
```
Need failover across providers?
  └─ Yes → Use Gateway
  └─ No
      Need cost tracking / budget alerts?
        └─ Yes → Use Gateway
        └─ No
            Need per-user rate limiting?
              └─ Yes → Use Gateway
              └─ No
                  Need audit logging?
                    └─ Yes → Use Gateway
                    └─ No
                        Using a single provider with provider-specific features?
                          └─ Yes → Use direct provider SDK
                          └─ No → Use Gateway (simplifies code)
```

When to use direct provider SDK

  • You need provider-specific features not exposed through the gateway (e.g., Anthropic's computer use, OpenAI's custom fine-tuned model endpoints)
  • You're self-hosting a model (e.g., vLLM, Ollama) that isn't registered with the gateway
  • You need request-level control over HTTP transport (custom proxies, mTLS)

When to always use Gateway

  • Production applications — failover and observability are essential
  • Multi-tenant SaaS — per-user tracking and rate limiting
  • Teams with cost accountability — tag-based budgeting

Claude Code Compatibility

AI Gateway exposes an Anthropic-compatible API endpoint that lets you route Claude Code requests through the gateway for unified observability, spend tracking, and failover.

Configuration

Set these environment variables to route Claude Code through AI Gateway:
```bash
export ANTHROPIC_BASE_URL="https://ai-gateway.vercel.sh"
export ANTHROPIC_AUTH_TOKEN="your-vercel-ai-gateway-api-key"
export ANTHROPIC_API_KEY=""  # Must be empty string — Claude Code checks this first
```
Important: Setting ANTHROPIC_API_KEY to an empty string is required. Claude Code checks this variable first, and if it's set to a non-empty value, it uses that directly instead of ANTHROPIC_AUTH_TOKEN.
Claude Code Max Subscription

AI Gateway supports Claude Code Max subscriptions. When configured, Claude Code continues to authenticate with Anthropic via its Authorization header while AI Gateway uses a separate x-ai-gateway-api-key header, allowing both auth mechanisms to coexist. This gives you unified observability at no additional token cost.

Using Non-Anthropic Models

Override the default Anthropic models by setting:
```bash
export ANTHROPIC_DEFAULT_SONNET_MODEL="openai/gpt-5.4"
export ANTHROPIC_DEFAULT_OPUS_MODEL="anthropic/claude-opus-4.6"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="anthropic/claude-haiku-4.5"
```

Latest Model Availability

GPT-5.4 (added March 5, 2026) — agentic and reasoning leaps from GPT-5.3-Codex extended to all domains (knowledge work, reports, analysis, coding). Faster and more token-efficient than GPT-5.2.
| Model | Slug | Input | Output |
| --- | --- | --- | --- |
| GPT-5.4 | openai/gpt-5.4 | $2.50/M tokens | $15.00/M tokens |
| GPT-5.4 Pro | openai/gpt-5.4-pro | $30.00/M tokens | $180.00/M tokens |
GPT-5.4 Pro targets maximum performance on complex tasks. Use standard GPT-5.4 for most workloads.

Supported Providers

  • OpenAI (GPT-5.x including GPT-5.4 and GPT-5.4 Pro, o-series)
  • Anthropic (Claude 4.x)
  • Google (Gemini)
  • xAI (Grok)
  • Mistral
  • DeepSeek
  • Amazon Bedrock
  • Azure OpenAI
  • Cohere
  • Perplexity
  • Alibaba (Qwen)
  • Meta (Llama)
  • And many more (100+ models total)

Pricing

  • Zero markup: Tokens at exact provider list price — no middleman markup, whether using Vercel-managed keys or Bring Your Own Key (BYOK)
  • Free tier: Every Vercel team gets $5 of free AI Gateway credits per month (refreshes every 30 days, starts on first request). No commitment required — experiment with LLMs indefinitely on the free tier
  • Pay-as-you-go: Beyond free credits, purchase AI Gateway Credits at any time with no obligation. Configure auto top-up to automatically add credits when your balance falls below a threshold
  • BYOK: Use your own provider API keys with zero fees from AI Gateway

Multimodal Support

Text and image generation both route through the gateway. For embeddings, use a direct provider SDK.
```ts
import { generateText, experimental_generateImage as generateImage } from 'ai'

// Text — through gateway
const { text } = await generateText({
  model: 'openai/gpt-5.4',
  prompt: 'Hello',
})

// Image — through gateway (multimodal LLMs return images in result.files)
const result = await generateText({
  model: 'google/gemini-3.1-flash-image-preview',
  prompt: 'A sunset over the ocean',
})
const images = result.files.filter((f) => f.mediaType?.startsWith('image/'))

// Image-only models — through gateway with experimental_generateImage
const { images: generated } = await generateImage({
  model: 'google/imagen-4.0-generate-001',
  prompt: 'A sunset',
})
```
Default image model: google/gemini-3.1-flash-image-preview — fast multimodal image generation via the gateway. See the AI Gateway Image Generation docs for all supported models and integration methods.

Key Benefits

  1. Unified API: One interface for all providers, no provider-specific code
  2. Automatic failover: If a provider is down, requests route to the next
  3. Cost tracking: Per-user, per-feature attribution with tags
  4. Observability: Built-in monitoring of all model calls
  5. Low latency: <20ms routing overhead
  6. No lock-in: Switch models/providers by changing a string

When to Use AI Gateway

| Scenario | Use Gateway? |
| --- | --- |
| Production app with AI features | Yes — failover, cost tracking |
| Prototyping with single provider | Optional — direct provider works fine |
| Multi-provider setup | Yes — unified routing |
| Need provider-specific features | Use direct provider SDK + Gateway as fallback |
| Cost tracking and budgeting | Yes — user tracking and tags |
| Multi-tenant SaaS | Yes — per-user rate limiting and audit |
| Compliance requirements | Yes — audit logging and log drains |

Official Documentation
