
Cloudflare Workers AI - Complete Reference

Production-ready knowledge domain for building AI-powered applications with Cloudflare Workers AI.
Status: Production Ready ✅ Last Updated: 2025-10-21 Dependencies: cloudflare-worker-base (for Worker setup) Latest Versions: wrangler@4.43.0, @cloudflare/workers-types@4.20251014.0

Table of Contents

Quick Start (5 minutes)

1. Add AI Binding

wrangler.jsonc:
jsonc
{
  "ai": {
    "binding": "AI"
  }
}

2. Run Your First Model

typescript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      prompt: 'What is Cloudflare?',
    });

    return Response.json(response);
  },
};

3. Add Streaming (Recommended)

typescript
const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true, // Always use streaming for text generation!
});

return new Response(stream, {
  headers: { 'content-type': 'text/event-stream' },
});
Why streaming?
  • Prevents buffering large responses in memory
  • Faster time-to-first-token
  • Better user experience for long-form content
  • Avoids Worker timeout issues
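The stream is delivered as server-sent events: each `data:` line carries a JSON fragment (typically with a `response` field holding the next token) and the stream terminates with `data: [DONE]`. A minimal sketch of extracting tokens from individual SSE lines (the helper name is ours, not part of the API):

```typescript
// Extract the text token from one SSE line of a Workers AI stream.
// Returns null for non-data lines, the terminal "[DONE]" marker,
// and malformed payloads.
function parseSSELine(line: string): string | null {
  if (!line.startsWith('data: ')) return null;
  const payload = line.slice('data: '.length).trim();
  if (payload === '[DONE]') return null;
  try {
    return (JSON.parse(payload) as { response?: string }).response ?? null;
  } catch {
    return null;
  }
}
```

A client would split the decoded stream on newlines and feed each line through this helper, appending non-null results to the displayed text.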


Workers AI API Reference

env.AI.run()

Run an AI model inference.
Signature:
typescript
async env.AI.run(
  model: string,
  inputs: ModelInputs,
  options?: { gateway?: { id: string; skipCache?: boolean } }
): Promise<ModelOutput | ReadableStream>
Parameters:
  • model
    (string, required) - Model ID (e.g.,
    @cf/meta/llama-3.1-8b-instruct
    )
  • inputs
    (object, required) - Model-specific inputs
  • options
    (object, optional) - Additional options
    • gateway
      (object) - AI Gateway configuration
      • id
        (string) - Gateway ID
      • skipCache
        (boolean) - Skip AI Gateway cache
Returns:
  • Non-streaming:
    Promise<ModelOutput>
    - JSON response
  • Streaming:
    ReadableStream
    - Server-sent events stream
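Because the return type is a union, TypeScript callers often need to narrow it before deciding how to build the `Response`. A small sketch (the guard function is ours):

```typescript
// Narrow the ModelOutput | ReadableStream union returned by env.AI.run():
// streaming calls resolve to a ReadableStream, everything else to JSON.
function isStream(result: unknown): result is ReadableStream {
  return result instanceof ReadableStream;
}
```

With this guard, `stream: true` results can be wrapped in a `new Response(result, ...)` directly, while JSON results go through `Response.json(result)`.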


Text Generation Models

Input Format:
typescript
{
  messages?: Array<{ role: 'system' | 'user' | 'assistant'; content: string }>;
  prompt?: string; // Deprecated, use messages
  stream?: boolean; // Default: false
  max_tokens?: number; // Max tokens to generate
  temperature?: number; // 0.0-1.0, default varies by model
  top_p?: number; // 0.0-1.0
  top_k?: number;
}
Output Format (Non-Streaming):
typescript
{
  response: string; // Generated text
}
Example:
typescript
const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is TypeScript?' },
  ],
  stream: false,
});

console.log(response.response);


Text Embeddings Models

Input Format:
typescript
{
  text: string | string[]; // Single text or array of texts
}
Output Format:
typescript
{
  shape: number[]; // [batch_size, embedding_dimensions]
  data: number[][]; // Array of embedding vectors
}
Example:
typescript
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: ['Hello world', 'Cloudflare Workers'],
});

console.log(embeddings.shape); // [2, 768]
console.log(embeddings.data[0]); // [0.123, -0.456, ...]
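The returned vectors are plain number arrays, so downstream similarity scoring needs no extra dependencies. For example, cosine similarity, a standard choice for ranking `bge-*` embeddings:

```typescript
// Cosine similarity between two embedding vectors of equal length,
// e.g. for ranking nearest neighbours from a bge-* model.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Scores near 1 indicate semantically similar texts; near 0, unrelated ones.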


Image Generation Models

Input Format:
typescript
{
  prompt: string; // Text description
  num_steps?: number; // Default: 20
  guidance?: number; // CFG scale, default: 7.5
  strength?: number; // For img2img, default: 1.0
  image?: number[] | string; // For img2img (byte array or base64 string)
}
Output Format:
  • Binary image data (PNG/JPEG)
Example:
typescript
const imageStream = await env.AI.run('@cf/black-forest-labs/flux-1-schnell', {
  prompt: 'A beautiful sunset over mountains',
});

return new Response(imageStream, {
  headers: { 'content-type': 'image/png' },
});


Vision Models

Input Format:
typescript
{
  messages: Array<{
    role: 'user' | 'assistant';
    content: Array<{ type: 'text' | 'image_url'; text?: string; image_url?: { url: string } }>;
  }>;
}
Example:
typescript
const response = await env.AI.run('@cf/meta/llama-3.2-11b-vision-instruct', {
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What is in this image?' },
        { type: 'image_url', image_url: { url: 'data:image/png;base64,iVBOR...' } },
      ],
    },
  ],
});


Model Selection Guide

Text Generation (LLMs)

| Model | Best For | Rate Limit | Size |
|---|---|---|---|
| @cf/meta/llama-3.1-8b-instruct | General purpose, fast | 300/min | 8B |
| @cf/meta/llama-3.2-1b-instruct | Ultra-fast, simple tasks | 300/min | 1B |
| @cf/qwen/qwen1.5-14b-chat-awq | High quality, complex reasoning | 150/min | 14B |
| @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | Coding, technical content | 300/min | 32B |
| @hf/thebloke/mistral-7b-instruct-v0.1-awq | Fast, efficient | 400/min | 7B |

Text Embeddings

| Model | Dimensions | Best For | Rate Limit |
|---|---|---|---|
| @cf/baai/bge-base-en-v1.5 | 768 | General purpose RAG | 3000/min |
| @cf/baai/bge-large-en-v1.5 | 1024 | High accuracy search | 1500/min |
| @cf/baai/bge-small-en-v1.5 | 384 | Fast, low storage | 3000/min |

Image Generation

| Model | Best For | Rate Limit | Speed |
|---|---|---|---|
| @cf/black-forest-labs/flux-1-schnell | High quality, photorealistic | 720/min | Fast |
| @cf/stabilityai/stable-diffusion-xl-base-1.0 | General purpose | 720/min | Medium |
| @cf/lykon/dreamshaper-8-lcm | Artistic, stylized | 720/min | Fast |

Vision Models

| Model | Best For | Rate Limit |
|---|---|---|
| @cf/meta/llama-3.2-11b-vision-instruct | Image understanding | 720/min |
| @cf/unum/uform-gen2-qwen-500m | Fast image captioning | 720/min |

Common Patterns

Pattern 1: Chat Completion with History

typescript
app.post('/chat', async (c) => {
  const { messages } = await c.req.json<{
    messages: Array<{ role: string; content: string }>;
  }>();

  const response = await c.env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
    messages,
    stream: true,
  });

  return new Response(response, {
    headers: { 'content-type': 'text/event-stream' },
  });
});


Pattern 2: RAG (Retrieval Augmented Generation)

typescript
// Step 1: Generate embeddings
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: [userQuery],
});

const vector = embeddings.data[0];

// Step 2: Search Vectorize
const matches = await env.VECTORIZE.query(vector, { topK: 3 });

// Step 3: Build context from matches
const context = matches.matches.map((m) => m.metadata.text).join('\n\n');

// Step 4: Generate response with context
const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    {
      role: 'system',
      content: `Answer using this context:\n${context}`,
    },
    { role: 'user', content: userQuery },
  ],
  stream: true,
});

return new Response(response, {
  headers: { 'content-type': 'text/event-stream' },
});


Pattern 3: Structured Output with Zod

typescript
import { z } from 'zod';

const RecipeSchema = z.object({
  name: z.string(),
  ingredients: z.array(z.string()),
  instructions: z.array(z.string()),
  prepTime: z.number(),
});

app.post('/recipe', async (c) => {
  const { dish } = await c.req.json<{ dish: string }>();

  const response = await c.env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
    messages: [
      {
        role: 'user',
        content: `Generate a recipe for ${dish}. Return ONLY valid JSON with this exact shape: { "name": string, "ingredients": string[], "instructions": string[], "prepTime": number }`,
      },
    ],
  });

  // Parse and validate
  const recipe = RecipeSchema.parse(JSON.parse(response.response));

  return c.json(recipe);
});


Pattern 4: Image Generation + R2 Storage

typescript
app.post('/generate-image', async (c) => {
  const { prompt } = await c.req.json<{ prompt: string }>();

  // Generate image
  const imageStream = await c.env.AI.run('@cf/black-forest-labs/flux-1-schnell', {
    prompt,
  });

  const imageBytes = await new Response(imageStream).bytes();

  // Store in R2
  const key = `images/${Date.now()}.png`;
  await c.env.BUCKET.put(key, imageBytes, {
    httpMetadata: { contentType: 'image/png' },
  });

  return c.json({
    success: true,
    url: `https://your-domain.com/${key}`,
  });
});


AI Gateway Integration

AI Gateway provides caching, logging, and analytics for AI requests.
Setup:
typescript
const response = await env.AI.run(
  '@cf/meta/llama-3.1-8b-instruct',
  { prompt: 'Hello' },
  {
    gateway: {
      id: 'my-gateway', // Your gateway ID
      skipCache: false, // Use cache
    },
  }
);
Benefits:
  • Cost Tracking - Monitor neurons usage per request
  • Caching - Reduce duplicate inference costs
  • Logging - Debug and analyze AI requests
  • Rate Limiting - Additional layer of protection
  • Analytics - Request patterns and performance
Access Gateway Logs:
typescript
const gateway = env.AI.gateway('my-gateway');
const logId = env.AI.aiGatewayLogId;

// Send feedback
await gateway.patchLog(logId, {
  feedback: { rating: 1, comment: 'Great response' },
});


Rate Limits & Pricing

Rate Limits (per minute)

| Task Type | Default Limit | Notes |
|---|---|---|
| Text Generation | 300/min | Some fast models: 400-1500/min |
| Text Embeddings | 3000/min | BGE-large: 1500/min |
| Image Generation | 720/min | All image models |
| Vision Models | 720/min | Image understanding |
| Translation | 720/min | M2M100, Opus MT |
| Classification | 2000/min | Text classification |
| Speech Recognition | 720/min | Whisper models |
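These limits are per account, so it can help to gate calls before they hit a 429. A minimal fixed-window counter is sketched below; it is an illustration only, since Worker isolates do not share memory, so in production the shared state would live in a Durable Object or the Rate Limiting binding:

```typescript
// Fixed-window counter: allows at most `limit` acquisitions per
// 60-second window, resetting the count when a new window starts.
class MinuteWindow {
  private count = 0;
  private windowStart = 0;

  constructor(private limit: number) {}

  tryAcquire(now: number = Date.now()): boolean {
    if (now - this.windowStart >= 60_000) {
      this.windowStart = now; // new window: reset the counter
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}
```

A handler would check `tryAcquire()` before `env.AI.run()` and return an early 429 of its own when the local budget is exhausted.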

Pricing (Neurons-Based)

Free Tier:
  • 10,000 neurons per day
  • Resets daily at 00:00 UTC
Paid Tier:
  • $0.011 per 1,000 neurons
  • 10,000 neurons/day included
  • Unlimited usage above free allocation
Example Costs:
| Model | Input (1M tokens) | Output (1M tokens) |
|---|---|---|
| Llama 3.2 1B | $0.027 | $0.201 |
| Llama 3.1 8B | $0.088 | $0.606 |
| BGE-base embeddings | $0.005 | N/A |
| Flux image generation | ~$0.011/image | N/A |
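As a rough planning aid, the paid-tier numbers above reduce to a simple formula. This sketch uses the $0.011 per 1,000 neurons rate and the 10,000 free daily neurons quoted above; the function name is ours:

```typescript
// Estimated daily cost in USD for paid-tier Workers AI usage:
// the free daily neuron allocation is subtracted first, and the
// remainder is billed at $0.011 per 1,000 neurons.
function estimateDailyCost(neuronsUsed: number, freeNeurons = 10_000): number {
  const billable = Math.max(0, neuronsUsed - freeNeurons);
  return (billable / 1000) * 0.011;
}
```

For example, a day that consumes 110,000 neurons would bill 100,000 of them, roughly $1.10.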


Production Checklist

Before Deploying

  • Enable AI Gateway for cost tracking and logging
  • Implement streaming for all text generation endpoints
  • Add rate limit retry with exponential backoff
  • Validate input length to prevent token limit errors
  • Set appropriate timeouts (Workers: 30s CPU default, 5m max)
  • Monitor neurons usage in Cloudflare dashboard
  • Test error handling for model unavailable, rate limits
  • Add input sanitization to prevent prompt injection
  • Configure CORS if using from browser
  • Plan for scale - upgrade to Paid plan if needed
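For the input-length check above, no tokenizer ships with the Worker runtime by default. A common rough heuristic is ~4 characters per token for English text; this is an assumption, not an exact count, so leave headroom:

```typescript
// Crude pre-flight length check using the ~4 chars/token heuristic
// (an approximation for English text, not a real tokenizer count).
function withinTokenBudget(text: string, maxTokens: number): boolean {
  const estimatedTokens = Math.ceil(text.length / 4);
  return estimatedTokens <= maxTokens;
}
```

Rejecting oversized prompts up front avoids burning neurons on requests the model would truncate or refuse.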

Error Handling

typescript
async function runAIWithRetry(
  env: Env,
  model: string,
  inputs: any,
  maxRetries = 3
): Promise<any> {
  let lastError: Error;

  for (let i = 0; i < maxRetries; i++) {
    try {
      return await env.AI.run(model, inputs);
    } catch (error) {
      lastError = error as Error;
      const message = lastError.message.toLowerCase();

      // Rate limit - retry with backoff
      if (message.includes('429') || message.includes('rate limit')) {
        const delay = Math.pow(2, i) * 1000; // Exponential backoff
        await new Promise((resolve) => setTimeout(resolve, delay));
        continue;
      }

      // Other errors - throw immediately
      throw error;
    }
  }

  throw lastError!;
}

Monitoring

typescript
app.use('*', async (c, next) => {
  const start = Date.now();

  await next();

  // Log AI usage
  console.log({
    path: c.req.path,
    duration: Date.now() - start,
    logId: c.env.AI.aiGatewayLogId,
  });
});


OpenAI Compatibility

Workers AI supports OpenAI-compatible endpoints.
Using OpenAI SDK:
typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: env.CLOUDFLARE_API_KEY,
  baseURL: `https://api.cloudflare.com/client/v4/accounts/${env.CLOUDFLARE_ACCOUNT_ID}/ai/v1`,
});

// Chat completions
const completion = await openai.chat.completions.create({
  model: '@cf/meta/llama-3.1-8b-instruct',
  messages: [{ role: 'user', content: 'Hello!' }],
});

// Embeddings
const embeddings = await openai.embeddings.create({
  model: '@cf/baai/bge-base-en-v1.5',
  input: 'Hello world',
});
Endpoints:
  • /v1/chat/completions
    - Text generation
  • /v1/embeddings
    - Text embeddings


Vercel AI SDK Integration

bash
npm install workers-ai-provider ai
typescript
import { createWorkersAI } from 'workers-ai-provider';
import { generateText, streamText } from 'ai';

const workersai = createWorkersAI({ binding: env.AI });

// Generate text
const result = await generateText({
  model: workersai('@cf/meta/llama-3.1-8b-instruct'),
  prompt: 'Write a poem',
});

// Stream text
const stream = streamText({
  model: workersai('@cf/meta/llama-3.1-8b-instruct'),
  prompt: 'Tell me a story',
});


Limits Summary

| Feature | Limit |
|---|---|
| Concurrent requests | No hard limit (rate limits apply) |
| Max input tokens | Varies by model (typically 2K-128K) |
| Max output tokens | Varies by model (typically 512-2048) |
| Streaming chunk size | ~1 KB |
| Image size (output) | ~5 MB |
| Request timeout | Workers timeout applies (30s default, 5m max CPU) |
| Daily free neurons | 10,000 |
| Rate limits | See "Rate Limits & Pricing" section |


References
