
Cloudflare Workers AI - Complete Reference

Production-ready knowledge domain for building AI-powered applications with Cloudflare Workers AI.
Status: Production Ready ✅ Last Updated: 2025-10-21 Dependencies: cloudflare-worker-base (for Worker setup) Latest Versions: wrangler@4.43.0, @cloudflare/workers-types@4.20251014.0

Table of Contents

Quick Start (5 minutes)

1. Add AI Binding

wrangler.jsonc:
jsonc
{
  "ai": {
    "binding": "AI"
  }
}

2. Run Your First Model

typescript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      prompt: 'What is Cloudflare?',
    });

    return Response.json(response);
  },
};

3. Add Streaming (Recommended)

typescript
const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true, // Always use streaming for text generation!
});

return new Response(stream, {
  headers: { 'content-type': 'text/event-stream' },
});
Why streaming?
  • Prevents buffering large responses in memory
  • Faster time-to-first-token
  • Better user experience for long-form content
  • Avoids Worker timeout issues
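The stream is delivered as server-sent events: each `data:` line carries a JSON fragment (typically with a `response` field holding the next token) and the stream terminates with `data: [DONE]`. A minimal sketch of extracting tokens from individual SSE lines (the helper name is ours, not part of the API):

```typescript
// Extract the text token from one SSE line of a Workers AI stream.
// Returns null for non-data lines, the terminal "[DONE]" marker,
// and malformed payloads.
function parseSSELine(line: string): string | null {
  if (!line.startsWith('data: ')) return null;
  const payload = line.slice('data: '.length).trim();
  if (payload === '[DONE]') return null;
  try {
    return (JSON.parse(payload) as { response?: string }).response ?? null;
  } catch {
    return null;
  }
}
```

A client would split the decoded stream on newlines and feed each line through this helper, appending non-null results to the displayed text.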


Workers AI API Reference

env.AI.run()

Run an AI model inference.
Signature:
typescript
async env.AI.run(
  model: string,
  inputs: ModelInputs,
  options?: { gateway?: { id: string; skipCache?: boolean } }
): Promise<ModelOutput | ReadableStream>
Parameters:
  • model
    (string, required) - Model ID (e.g.,
    @cf/meta/llama-3.1-8b-instruct
    )
  • inputs
    (object, required) - Model-specific inputs
  • options
    (object, optional) - Additional options
    • gateway
      (object) - AI Gateway configuration
      • id
        (string) - Gateway ID
      • skipCache
        (boolean) - Skip AI Gateway cache
Returns:
  • Non-streaming:
    Promise<ModelOutput>
    - JSON response
  • Streaming:
    ReadableStream
    - Server-sent events stream
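Because the return type is a union, TypeScript callers often need to narrow it before deciding how to build the `Response`. A small sketch (the guard function is ours):

```typescript
// Narrow the ModelOutput | ReadableStream union returned by env.AI.run():
// streaming calls resolve to a ReadableStream, everything else to JSON.
function isStream(result: unknown): result is ReadableStream {
  return result instanceof ReadableStream;
}
```

With this guard, `stream: true` results can be wrapped in a `new Response(result, ...)` directly, while JSON results go through `Response.json(result)`.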


Text Generation Models

Input Format:
typescript
{
  messages?: Array<{ role: 'system' | 'user' | 'assistant'; content: string }>;
  prompt?: string; // Deprecated, use messages
  stream?: boolean; // Default: false
  max_tokens?: number; // Max tokens to generate
  temperature?: number; // 0.0-1.0, default varies by model
  top_p?: number; // 0.0-1.0
  top_k?: number;
}
Output Format (Non-Streaming):
typescript
{
  response: string; // Generated text
}
Example:
typescript
const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is TypeScript?' },
  ],
  stream: false,
});

console.log(response.response);


Text Embeddings Models

Input Format:
typescript
{
  text: string | string[]; // Single text or array of texts
}
Output Format:
typescript
{
  shape: number[]; // [batch_size, embedding_dimensions]
  data: number[][]; // Array of embedding vectors
}
Example:
typescript
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: ['Hello world', 'Cloudflare Workers'],
});

console.log(embeddings.shape); // [2, 768]
console.log(embeddings.data[0]); // [0.123, -0.456, ...]
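The returned vectors are plain number arrays, so downstream similarity scoring needs no extra dependencies. For example, cosine similarity, a standard choice for ranking `bge-*` embeddings:

```typescript
// Cosine similarity between two embedding vectors of equal length,
// e.g. for ranking nearest neighbours from a bge-* model.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Scores near 1 indicate semantically similar texts; near 0, unrelated ones.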


Image Generation Models

Input Format:
typescript
{
  prompt: string; // Text description
  num_steps?: number; // Default: 20
  guidance?: number; // CFG scale, default: 7.5
  strength?: number; // For img2img, default: 1.0
  image?: number[] | string; // For img2img (byte array or base64 string)
}
Output Format:
  • Binary image data (PNG/JPEG)
Example:
typescript
const imageStream = await env.AI.run('@cf/black-forest-labs/flux-1-schnell', {
  prompt: 'A beautiful sunset over mountains',
});

return new Response(imageStream, {
  headers: { 'content-type': 'image/png' },
});


Vision Models

Input Format:
typescript
{
  messages: Array<{
    role: 'user' | 'assistant';
    content: Array<{ type: 'text' | 'image_url'; text?: string; image_url?: { url: string } }>;
  }>;
}
Example:
typescript
const response = await env.AI.run('@cf/meta/llama-3.2-11b-vision-instruct', {
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What is in this image?' },
        { type: 'image_url', image_url: { url: 'data:image/png;base64,iVBOR...' } },
      ],
    },
  ],
});


Model Selection Guide

Text Generation (LLMs)

| Model | Best For | Rate Limit | Size |
|---|---|---|---|
| @cf/meta/llama-3.1-8b-instruct | General purpose, fast | 300/min | 8B |
| @cf/meta/llama-3.2-1b-instruct | Ultra-fast, simple tasks | 300/min | 1B |
| @cf/qwen/qwen1.5-14b-chat-awq | High quality, complex reasoning | 150/min | 14B |
| @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | Coding, technical content | 300/min | 32B |
| @hf/thebloke/mistral-7b-instruct-v0.1-awq | Fast, efficient | 400/min | 7B |

Text Embeddings

| Model | Dimensions | Best For | Rate Limit |
|---|---|---|---|
| @cf/baai/bge-base-en-v1.5 | 768 | General purpose RAG | 3000/min |
| @cf/baai/bge-large-en-v1.5 | 1024 | High accuracy search | 1500/min |
| @cf/baai/bge-small-en-v1.5 | 384 | Fast, low storage | 3000/min |

Image Generation

| Model | Best For | Rate Limit | Speed |
|---|---|---|---|
| @cf/black-forest-labs/flux-1-schnell | High quality, photorealistic | 720/min | Fast |
| @cf/stabilityai/stable-diffusion-xl-base-1.0 | General purpose | 720/min | Medium |
| @cf/lykon/dreamshaper-8-lcm | Artistic, stylized | 720/min | Fast |

Vision Models

| Model | Best For | Rate Limit |
|---|---|---|
| @cf/meta/llama-3.2-11b-vision-instruct | Image understanding | 720/min |
| @cf/unum/uform-gen2-qwen-500m | Fast image captioning | 720/min |

Common Patterns

Pattern 1: Chat Completion with History

typescript
app.post('/chat', async (c) => {
  const { messages } = await c.req.json<{
    messages: Array<{ role: string; content: string }>;
  }>();

  const response = await c.env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
    messages,
    stream: true,
  });

  return new Response(response, {
    headers: { 'content-type': 'text/event-stream' },
  });
});


Pattern 2: RAG (Retrieval Augmented Generation)

typescript
// Step 1: Generate embeddings
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: [userQuery],
});

const vector = embeddings.data[0];

// Step 2: Search Vectorize
const matches = await env.VECTORIZE.query(vector, { topK: 3 });

// Step 3: Build context from matches
const context = matches.matches.map((m) => m.metadata.text).join('\n\n');

// Step 4: Generate response with context
const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    {
      role: 'system',
      content: `Answer using this context:\n${context}`,
    },
    { role: 'user', content: userQuery },
  ],
  stream: true,
});

return new Response(response, {
  headers: { 'content-type': 'text/event-stream' },
});


Pattern 3: Structured Output with Zod

typescript
import { z } from 'zod';

const RecipeSchema = z.object({
  name: z.string(),
  ingredients: z.array(z.string()),
  instructions: z.array(z.string()),
  prepTime: z.number(),
});

app.post('/recipe', async (c) => {
  const { dish } = await c.req.json<{ dish: string }>();

  const response = await c.env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
    messages: [
      {
        role: 'user',
        content: `Generate a recipe for ${dish}. Return ONLY valid JSON with this exact shape: { "name": string, "ingredients": string[], "instructions": string[], "prepTime": number }`,
      },
    ],
  });

  // Parse and validate
  const recipe = RecipeSchema.parse(JSON.parse(response.response));

  return c.json(recipe);
});


Pattern 4: Image Generation + R2 Storage

typescript
app.post('/generate-image', async (c) => {
  const { prompt } = await c.req.json<{ prompt: string }>();

  // Generate image
  const imageStream = await c.env.AI.run('@cf/black-forest-labs/flux-1-schnell', {
    prompt,
  });

  const imageBytes = await new Response(imageStream).bytes();

  // Store in R2
  const key = `images/${Date.now()}.png`;
  await c.env.BUCKET.put(key, imageBytes, {
    httpMetadata: { contentType: 'image/png' },
  });

  return c.json({
    success: true,
    url: `https://your-domain.com/${key}`,
  });
});


AI Gateway Integration

AI Gateway provides caching, logging, and analytics for AI requests.
Setup:
typescript
const response = await env.AI.run(
  '@cf/meta/llama-3.1-8b-instruct',
  { prompt: 'Hello' },
  {
    gateway: {
      id: 'my-gateway', // Your gateway ID
      skipCache: false, // Use cache
    },
  }
);
Benefits:
  • Cost Tracking - Monitor neurons usage per request
  • Caching - Reduce duplicate inference costs
  • Logging - Debug and analyze AI requests
  • Rate Limiting - Additional layer of protection
  • Analytics - Request patterns and performance
Access Gateway Logs:
typescript
const gateway = env.AI.gateway('my-gateway');
const logId = env.AI.aiGatewayLogId;

// Send feedback
await gateway.patchLog(logId, {
  feedback: { rating: 1, comment: 'Great response' },
});


Rate Limits & Pricing

Rate Limits (per minute)

| Task Type | Default Limit | Notes |
|---|---|---|
| Text Generation | 300/min | Some fast models: 400-1500/min |
| Text Embeddings | 3000/min | BGE-large: 1500/min |
| Image Generation | 720/min | All image models |
| Vision Models | 720/min | Image understanding |
| Translation | 720/min | M2M100, Opus MT |
| Classification | 2000/min | Text classification |
| Speech Recognition | 720/min | Whisper models |
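These limits are per account, so it can help to gate calls before they hit a 429. A minimal fixed-window counter is sketched below; it is an illustration only, since Worker isolates do not share memory, so in production the shared state would live in a Durable Object or the Rate Limiting binding:

```typescript
// Fixed-window counter: allows at most `limit` acquisitions per
// 60-second window, resetting the count when a new window starts.
class MinuteWindow {
  private count = 0;
  private windowStart = 0;

  constructor(private limit: number) {}

  tryAcquire(now: number = Date.now()): boolean {
    if (now - this.windowStart >= 60_000) {
      this.windowStart = now; // new window: reset the counter
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count++;
    return true;
  }
}
```

A handler would check `tryAcquire()` before `env.AI.run()` and return an early 429 of its own when the local budget is exhausted.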

Pricing (Neurons-Based)

Free Tier:
  • 10,000 neurons per day
  • Resets daily at 00:00 UTC
Paid Tier:
  • $0.011 per 1,000 neurons
  • 10,000 neurons/day included
  • Unlimited usage above free allocation
Example Costs:
| Model | Input (1M tokens) | Output (1M tokens) |
|---|---|---|
| Llama 3.2 1B | $0.027 | $0.201 |
| Llama 3.1 8B | $0.088 | $0.606 |
| BGE-base embeddings | $0.005 | N/A |
| Flux image generation | ~$0.011/image | N/A |
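As a rough planning aid, the paid-tier numbers above reduce to a simple formula. This sketch uses the $0.011 per 1,000 neurons rate and the 10,000 free daily neurons quoted above; the function name is ours:

```typescript
// Estimated daily cost in USD for paid-tier Workers AI usage:
// the free daily neuron allocation is subtracted first, and the
// remainder is billed at $0.011 per 1,000 neurons.
function estimateDailyCost(neuronsUsed: number, freeNeurons = 10_000): number {
  const billable = Math.max(0, neuronsUsed - freeNeurons);
  return (billable / 1000) * 0.011;
}
```

For example, a day that consumes 110,000 neurons would bill 100,000 of them, roughly $1.10.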


Production Checklist

Before Deploying

  • Enable AI Gateway for cost tracking and logging
  • Implement streaming for all text generation endpoints
  • Add rate limit retry with exponential backoff
  • Validate input length to prevent token limit errors
  • Set appropriate timeouts (Workers: 30s CPU default, 5m max)
  • Monitor neurons usage in Cloudflare dashboard
  • Test error handling for model unavailable, rate limits
  • Add input sanitization to prevent prompt injection
  • Configure CORS if using from browser
  • Plan for scale - upgrade to Paid plan if needed
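For the input-length check above, no tokenizer ships with the Worker runtime by default. A common rough heuristic is ~4 characters per token for English text; this is an assumption, not an exact count, so leave headroom:

```typescript
// Crude pre-flight length check using the ~4 chars/token heuristic
// (an approximation for English text, not a real tokenizer count).
function withinTokenBudget(text: string, maxTokens: number): boolean {
  const estimatedTokens = Math.ceil(text.length / 4);
  return estimatedTokens <= maxTokens;
}
```

Rejecting oversized prompts up front avoids burning neurons on requests the model would truncate or refuse.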

Error Handling

typescript
async function runAIWithRetry(
  env: Env,
  model: string,
  inputs: any,
  maxRetries = 3
): Promise<any> {
  let lastError: Error;

  for (let i = 0; i < maxRetries; i++) {
    try {
      return await env.AI.run(model, inputs);
    } catch (error) {
      lastError = error as Error;
      const message = lastError.message.toLowerCase();

      // Rate limit - retry with backoff
      if (message.includes('429') || message.includes('rate limit')) {
        const delay = Math.pow(2, i) * 1000; // Exponential backoff
        await new Promise((resolve) => setTimeout(resolve, delay));
        continue;
      }

      // Other errors - throw immediately
      throw error;
    }
  }

  throw lastError!;
}

Monitoring

typescript
app.use('*', async (c, next) => {
  const start = Date.now();

  await next();

  // Log AI usage
  console.log({
    path: c.req.path,
    duration: Date.now() - start,
    logId: c.env.AI.aiGatewayLogId,
  });
});


OpenAI Compatibility

Workers AI supports OpenAI-compatible endpoints.
Using OpenAI SDK:
typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: env.CLOUDFLARE_API_KEY,
  baseURL: `https://api.cloudflare.com/client/v4/accounts/${env.CLOUDFLARE_ACCOUNT_ID}/ai/v1`,
});

// Chat completions
const completion = await openai.chat.completions.create({
  model: '@cf/meta/llama-3.1-8b-instruct',
  messages: [{ role: 'user', content: 'Hello!' }],
});

// Embeddings
const embeddings = await openai.embeddings.create({
  model: '@cf/baai/bge-base-en-v1.5',
  input: 'Hello world',
});
Endpoints:
  • /v1/chat/completions
    - Text generation
  • /v1/embeddings
    - Text embeddings


Vercel AI SDK Integration

bash
npm install workers-ai-provider ai
typescript
import { createWorkersAI } from 'workers-ai-provider';
import { generateText, streamText } from 'ai';

const workersai = createWorkersAI({ binding: env.AI });

// Generate text
const result = await generateText({
  model: workersai('@cf/meta/llama-3.1-8b-instruct'),
  prompt: 'Write a poem',
});

// Stream text
const stream = streamText({
  model: workersai('@cf/meta/llama-3.1-8b-instruct'),
  prompt: 'Tell me a story',
});


Limits Summary

| Feature | Limit |
|---|---|
| Concurrent requests | No hard limit (rate limits apply) |
| Max input tokens | Varies by model (typically 2K-128K) |
| Max output tokens | Varies by model (typically 512-2048) |
| Streaming chunk size | ~1 KB |
| Image size (output) | ~5 MB |
| Request timeout | Workers timeout applies (30s default, 5m max CPU) |
| Daily free neurons | 10,000 |
| Rate limits | See "Rate Limits & Pricing" section |


References
