google-gemini-file-search

Google Gemini File Search Setup

Overview

Google Gemini File Search is a fully managed RAG system. Upload documents (100+ formats: PDF, Word, Excel, code) and query them in natural language; chunking, embeddings, semantic search, and citations are handled automatically.
What This Skill Provides:
  • Complete @google/genai File Search API setup
  • 12 documented errors with prevention strategies
  • Chunking best practices for optimal retrieval
  • Cost optimization ($0.15/1M tokens indexing, 3x storage multiplier)
  • Cloudflare Workers + Next.js integration templates

Prerequisites

1. Google AI API Key

Free Tier Limits:
  • 1 GB storage (total across all file search stores)
  • 1,500 requests per day
  • 1 million tokens per minute
Paid Tier Pricing:
  • Indexing: $0.15 per 1M input tokens (one-time)
  • Storage: Free (storage caps: Tier 1: 10 GB, Tier 2: 100 GB, Tier 3: 1 TB)
  • Query-time embeddings: Free (retrieved context is billed as regular input tokens to the model)
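The storage limits above can be turned into a quick pre-upload check. This is a minimal sketch, assuming the roughly 3x storage multiplier described under Error 2 and the caps listed above (1 GB free, then 10 GB, 100 GB, and 1 TB for Tiers 1 to 3), in decimal units. `Tier`, `TIER_STORAGE_BYTES`, `estimateStoredBytes`, and `fitsInTier` are hypothetical names, not part of the SDK.

```typescript
// Hypothetical helper: names are illustrative, not part of @google/genai.
type Tier = 'free' | 'tier1' | 'tier2' | 'tier3'

// Storage caps per tier in decimal units (1 GB = 1e9 bytes), as used elsewhere in this guide.
const TIER_STORAGE_BYTES: Record<Tier, number> = {
  free: 1e9,    // 1 GB
  tier1: 10e9,  // 10 GB
  tier2: 100e9, // 100 GB
  tier3: 1e12   // 1 TB
}

// Assumed rule of thumb: stored size ≈ 3x raw input (files + embeddings + metadata).
const STORAGE_MULTIPLIER = 3

function estimateStoredBytes(inputBytes: number): number {
  return inputBytes * STORAGE_MULTIPLIER
}

// True if the estimated stored size, plus storage already in use, fits within the tier's cap.
function fitsInTier(inputBytes: number, tier: Tier, alreadyUsedBytes = 0): boolean {
  return alreadyUsedBytes + estimateStoredBytes(inputBytes) <= TIER_STORAGE_BYTES[tier]
}

// 500 MB of input ≈ 1.5 GB stored: too much for the free tier, fine for Tier 1.
console.log(fitsInTier(500e6, 'free'))  // false
console.log(fitsInTier(500e6, 'tier1')) // true
```

Passing `alreadyUsedBytes` matters because the free-tier 1 GB is shared across all file search stores.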

2. Node.js Environment

Minimum Version: Node.js 18+ (v20+ recommended)
```bash
node --version  # Should be >=18.0.0
```

3. Install @google/genai SDK

```bash
npm install @google/genai
# or
pnpm add @google/genai
# or
yarn add @google/genai
```

**Current Stable Version:** 1.30.0+ (verify with `npm view @google/genai version`)

**⚠️ Important:** File Search API requires **@google/genai v1.29.0 or later**. Earlier versions do not support File Search. The API was added in v1.29.0 (November 5, 2025). If you upload `Blob` objects, use v1.34.0 or later (see Error 9).
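Because File Search only exists in @google/genai v1.29.0 and later, a startup check can fail fast on older installs. The sketch below compares plain `major.minor.patch` strings numerically and ignores prerelease tags; how you obtain the installed version string (for example, from the package's `package.json`) is left as an assumption. `MIN_FILE_SEARCH_VERSION`, `parseVersion`, and `isAtLeast` are hypothetical names.

```typescript
// Hypothetical helper: compares "major.minor.patch" strings numerically.
// Prerelease/build suffixes such as "-beta.1" are stripped and ignored in this sketch.
const MIN_FILE_SEARCH_VERSION = '1.29.0'

function parseVersion(version: string): [number, number, number] {
  const core = version.trim().replace(/^v/, '').split(/[-+]/)[0]
  const [major = 0, minor = 0, patch = 0] = core
    .split('.')
    .map(part => Number.parseInt(part, 10) || 0)
  return [major, minor, patch]
}

function isAtLeast(installed: string, minimum: string): boolean {
  const a = parseVersion(installed)
  const b = parseVersion(minimum)
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] > b[i]
  }
  return true // versions are equal
}

console.log(isAtLeast('1.30.0', MIN_FILE_SEARCH_VERSION)) // true
console.log(isAtLeast('1.28.4', MIN_FILE_SEARCH_VERSION)) // false
```

Comparing numerically rather than as strings matters: `'1.9.0' > '1.29.0'` is true as a string comparison but wrong as a version comparison.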

4. TypeScript Configuration (Optional but Recommended)

```json
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "ESNext",
    "moduleResolution": "node",
    "esModuleInterop": true,
    "strict": true,
    "skipLibCheck": true
  }
}
```

Common Errors Prevented

This skill prevents 12 common errors encountered when implementing File Search:

Error 1: Document Immutability

Symptom:
Error: Documents cannot be modified after indexing
Cause: Documents are immutable once indexed. There is no PATCH or UPDATE operation.
Prevention: Use the delete+re-upload pattern for updates:
```typescript
// ❌ WRONG: Trying to update document (no such API)
await ai.fileSearchStores.documents.update({
  name: documentName,
  customMetadata: { version: '2.0' }
})

// ✅ CORRECT: Delete then re-upload
// documents.list() returns an async-iterable pager; collect matches before deleting
const docs = await ai.fileSearchStores.documents.list({ parent: fileStore.name })
const staleDocNames: string[] = []
for await (const doc of docs) {
  if (doc.displayName === 'manual.pdf' && doc.name) {
    staleDocNames.push(doc.name)
  }
}

for (const name of staleDocNames) {
  await ai.fileSearchStores.documents.delete({
    name,
    config: { force: true } // also deletes the document's indexed chunks
  })
}

await ai.fileSearchStores.uploadToFileSearchStore({
  fileSearchStoreName: fileStore.name,
  file: 'manual-v2.pdf', // a local file path (Node.js) or a Blob
  config: { displayName: 'manual.pdf' }
})
```

Error 2: Storage Quota Exceeded

Symptom:
Error: Quota exceeded. Expected 1GB limit, but 3.2GB used.
Cause: Storage usage includes the input files plus embeddings and metadata, so total storage ≈ 3x the input size.
Prevention: Estimate storage before uploading:
```typescript
import fs from 'fs'

// ❌ WRONG: Assuming storage = file size
const naiveEstimate = fs.statSync('data.pdf').size // 500 MB → expecting 500 MB of usage is wrong

// ✅ CORRECT: Account for the 3x multiplier
const fileSize = fs.statSync('data.pdf').size // e.g. 500 MB
const estimatedStorage = fileSize * 3 // ≈ 1.5 GB (input + embeddings + metadata)
console.log(`Estimated storage: ${(estimatedStorage / 1e9).toFixed(2)} GB`)

// Check against the free-tier quota before uploading
if (estimatedStorage > 1e9) {
  console.warn('⚠️ File may exceed free tier 1 GB limit')
}
```

Error 3: Incorrect Chunking Configuration

Symptom: Poor retrieval quality, irrelevant results, or context cut off mid-sentence.
Cause: The default chunking may not suit your content type.
Prevention: Use a recommended chunking strategy:
```typescript
// ❌ WRONG: Using defaults without testing
await ai.fileSearchStores.uploadToFileSearchStore({
  fileSearchStoreName: fileStore.name,
  file: 'docs.pdf'
  // Default chunking may be too large or too small for your content
})

// ✅ CORRECT: Configure chunking for precision
await ai.fileSearchStores.uploadToFileSearchStore({
  fileSearchStoreName: fileStore.name,
  file: 'docs.pdf', // a local file path (Node.js) or a Blob
  config: {
    chunkingConfig: {
      whiteSpaceConfig: {
        maxTokensPerChunk: 500, // Smaller chunks = more precise retrieval
        maxOverlapTokens: 50    // 10% overlap reduces context loss at chunk boundaries
      }
    }
  }
})
```
Chunking Guidelines:
  • Technical docs/code: 500 tokens/chunk, 50 overlap
  • Prose/articles: 800 tokens/chunk, 80 overlap
  • Legal/contracts: 300 tokens/chunk, 30 overlap (high precision)
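These guidelines map naturally onto a small lookup table that produces the `chunkingConfig` object used in the upload call. This is only a sketch: the `ContentKind` categories and the `chunkingFor` helper are hypothetical, and the numbers are the starting points listed above, which you should still tune against your own queries.

```typescript
// Hypothetical presets mirroring the chunking guidelines in this section.
type ContentKind = 'technical' | 'prose' | 'legal'

interface WhiteSpaceChunking {
  maxTokensPerChunk: number
  maxOverlapTokens: number
}

const CHUNKING_PRESETS: Record<ContentKind, WhiteSpaceChunking> = {
  technical: { maxTokensPerChunk: 500, maxOverlapTokens: 50 }, // technical docs and code
  prose: { maxTokensPerChunk: 800, maxOverlapTokens: 80 },     // articles
  legal: { maxTokensPerChunk: 300, maxOverlapTokens: 30 }      // contracts: high precision
}

// Returns the shape expected under config.chunkingConfig.
function chunkingFor(kind: ContentKind): { whiteSpaceConfig: WhiteSpaceChunking } {
  // Copy the preset so callers can tweak it without mutating the shared table.
  return { whiteSpaceConfig: { ...CHUNKING_PRESETS[kind] } }
}

// Usage inside an upload call: config: { chunkingConfig: chunkingFor('legal') }
console.log(chunkingFor('legal'))
```

Keeping the presets in one table also makes it easy to re-index a sample with different values and compare retrieval quality.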

Error 4: Metadata Limits Exceeded

Symptom:
Error: Maximum 20 custom metadata key-value pairs allowed
Cause: Each document can have at most 20 custom metadata entries.
Prevention: Design a compact metadata schema:
```typescript
// ❌ WRONG: Too many metadata fields
await ai.fileSearchStores.uploadToFileSearchStore({
  fileSearchStoreName: fileStore.name,
  file: 'doc.pdf',
  config: {
    customMetadata: [
      { key: 'doc_type', stringValue: 'manual' },
      { key: 'version', stringValue: '1.0' },
      { key: 'author', stringValue: 'John Doe' },
      { key: 'department', stringValue: 'Engineering' },
      { key: 'created_date', stringValue: '2025-01-01' },
      // ... 18 more fields → Error!
    ]
  }
})

// ✅ CORRECT: Combine related fields or store JSON strings
await ai.fileSearchStores.uploadToFileSearchStore({
  fileSearchStoreName: fileStore.name,
  file: 'doc.pdf',
  config: {
    // Each entry is { key, stringValue } or { key, numericValue }
    customMetadata: [
      { key: 'doc_type', stringValue: 'manual' },
      { key: 'version', stringValue: '1.0' },
      { key: 'author_dept', stringValue: 'John Doe|Engineering' }, // Combine related fields
      {
        key: 'dates', // Or use JSON for compound data
        stringValue: JSON.stringify({ created: '2025-01-01', updated: '2025-01-15' })
      }
    ]
  }
})
```
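Checking the 20-entry limit locally gives a clearer error than a failed upload. The sketch below accepts either a plain key/value object or an array of `{ key, ... }` entries, so it works whichever shape you pass; `validateCustomMetadata`, `MetadataEntry`, and `MAX_METADATA_ENTRIES` are hypothetical names, and the limit of 20 is taken from the error message above.

```typescript
// Hypothetical pre-upload check for the 20-entry custom metadata limit.
const MAX_METADATA_ENTRIES = 20

type MetadataEntry = { key: string; stringValue?: string; numericValue?: number }
type MetadataInput = Record<string, unknown> | MetadataEntry[]

// Throws if there are too many entries or (for the array form) duplicate keys.
function validateCustomMetadata(metadata: MetadataInput): void {
  const keys: string[] = Array.isArray(metadata)
    ? metadata.map(entry => entry.key)
    : Object.keys(metadata)

  if (keys.length > MAX_METADATA_ENTRIES) {
    throw new Error(
      `customMetadata has ${keys.length} entries; the limit is ${MAX_METADATA_ENTRIES}`
    )
  }

  // Duplicate keys can only occur in the array form.
  const duplicates = keys.filter((key, i) => keys.indexOf(key) !== i)
  if (duplicates.length > 0) {
    throw new Error(`Duplicate metadata keys: ${Array.from(new Set(duplicates)).join(', ')}`)
  }
}

// Usage: call before uploadToFileSearchStore
validateCustomMetadata({ doc_type: 'manual', version: '1.0' }) // passes
```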

Error 5: Indexing Cost Surprises

Symptom: An unexpected $375 bill after uploading 10 GB of documents.
Cause: Indexing is a one-time charge, but it is billed per input token ($0.15 per 1M tokens), so large uploads add up quickly.
Prevention: Estimate costs before indexing:
```typescript
// getTotalDirectorySize, uploadAllDocuments, and confirm are placeholders for your own helpers

// ❌ WRONG: No cost estimation
await uploadAllDocuments(fileStore.name, './data') // 10 GB uploaded → $375 surprise

// ✅ CORRECT: Calculate costs upfront
const totalSize = getTotalDirectorySize('./data') // in bytes, e.g. 10 GB
const estimatedTokens = totalSize / 4 // Rough estimate: 1 token ≈ 4 bytes of text
const indexingCost = (estimatedTokens / 1e6) * 0.15

console.log(`Estimated indexing cost: $${indexingCost.toFixed(2)}`)
console.log(`Estimated storage: ${(totalSize * 3) / 1e9} GB`)

// Confirm before proceeding
const proceed = await confirm(`Proceed with indexing? Cost: $${indexingCost.toFixed(2)}`)
if (proceed) {
  await uploadAllDocuments(fileStore.name, './data')
}
```
Cost Examples:
  • 1 GB text ≈ 250M tokens = $37.50 indexing
  • 100 MB PDF ≈ 25M tokens = $3.75 indexing
  • 10 MB code ≈ 2.5M tokens = $0.38 indexing
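The arithmetic behind these examples is simple enough to keep in a helper. This sketch assumes the same rough 4-bytes-per-token ratio and $0.15 per 1M tokens rate used above; for binary formats such as PDF the byte count is only a loose proxy for extracted text, so treat the result as an order-of-magnitude estimate. `estimateTokens` and `estimateIndexingCostUSD` are hypothetical names.

```typescript
// Hypothetical estimator reproducing the cost examples above.
const BYTES_PER_TOKEN = 4           // rough heuristic for plain text
const USD_PER_MILLION_TOKENS = 0.15 // one-time indexing rate

function estimateTokens(bytes: number): number {
  return bytes / BYTES_PER_TOKEN
}

function estimateIndexingCostUSD(bytes: number): number {
  return (estimateTokens(bytes) / 1e6) * USD_PER_MILLION_TOKENS
}

// Decimal units, matching the examples: 1 GB = 1e9 bytes.
const samples = [['1 GB', 1e9], ['100 MB', 100e6], ['10 MB', 10e6], ['10 GB', 10e9]] as const
for (const [label, bytes] of samples) {
  console.log(`${label}: ~$${estimateIndexingCostUSD(bytes).toFixed(2)}`)
}
```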

Error 6: Not Polling Operation Status

Symptom: Queries return no results immediately after upload, or indexing is incomplete.
Cause: File uploads are processed asynchronously. You must poll the operation until `done: true`.
Prevention: Always poll the operation status with a timeout and a fallback:
```typescript
// ❌ WRONG: Assuming upload is instant
const pending = await ai.fileSearchStores.uploadToFileSearchStore({
  fileSearchStoreName: fileStore.name,
  file: 'large.pdf'
})
// Immediately query → No results!

// ✅ CORRECT: Poll until indexing completes, with a timeout
let operation = await ai.fileSearchStores.uploadToFileSearchStore({
  fileSearchStoreName: fileStore.name,
  file: 'large.pdf',
  config: { displayName: 'large.pdf' } // used by the manual check below
})

const MAX_POLL_TIME = 60000 // 60 seconds
const POLL_INTERVAL = 1000
let elapsed = 0

while (!operation.done && elapsed < MAX_POLL_TIME) {
  await new Promise(resolve => setTimeout(resolve, POLL_INTERVAL))
  elapsed += POLL_INTERVAL

  try {
    operation = await ai.operations.get({ operation })
    console.log(`Indexing progress: ${operation.metadata?.progress ?? 'processing...'}`)
  } catch (error) {
    // Don't assume success; fall through to the manual check below
    console.warn('Polling failed, verifying manually:', error)
    break
  }
}

if (operation.error) {
  throw new Error(`Indexing failed: ${operation.error.message}`)
}

// ⚠️ Warning: operations.get() can be unreliable for large files.
// If polling timed out or failed, verify that the document exists.
if (!operation.done) {
  console.warn('Polling did not complete - verifying document manually')
  const docs = await ai.fileSearchStores.documents.list({ parent: fileStore.name })
  let found = false
  for await (const doc of docs) {
    if (doc.displayName === 'large.pdf') { found = true; break }
  }
  if (!found) {
    throw new Error('Upload failed - document not found')
  }
  console.log('✅ Document found despite polling timeout')
}

console.log('✅ Indexing complete:', operation.response?.displayName)
```
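The polling pattern above can be factored into a reusable helper so every upload gets the same timeout behaviour. This is a generic sketch, not an SDK API: `pollUntil`, `PollOptions`, and `PollResult` are hypothetical names, and the `fetchState`/`isDone` callbacks are where you would plug in the operation refresh and a check on `done`.

```typescript
// Hypothetical generic poller: refreshes state until isDone() is true or the timeout elapses.
interface PollOptions {
  intervalMs: number
  timeoutMs: number
}

type PollResult<T> = { state: T; timedOut: boolean }

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms))

async function pollUntil<T>(
  initial: T,
  fetchState: (current: T) => Promise<T>,
  isDone: (state: T) => boolean,
  { intervalMs, timeoutMs }: PollOptions
): Promise<PollResult<T>> {
  let state = initial
  const deadline = Date.now() + timeoutMs

  while (!isDone(state)) {
    if (Date.now() >= deadline) {
      // Caller decides on the fallback (e.g. listing documents to verify the upload).
      return { state, timedOut: true }
    }
    await sleep(intervalMs)
    state = await fetchState(state)
  }
  return { state, timedOut: false }
}
```

With the SDK, a call might look like `pollUntil(operation, op => ai.operations.get({ operation: op }), op => Boolean(op.done), { intervalMs: 1000, timeoutMs: 60000 })`; when `timedOut` is true, fall back to listing the store's documents as in the example above.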

Error 7: Forgetting Force Delete

Symptom:
Error: Cannot delete store with documents. Set force=true.
Cause: A store that still contains documents can only be deleted with `force: true` (this prevents accidental deletion).
Prevention: Always pass `force: true` when deleting a non-empty store:
```typescript
// ❌ WRONG: Trying to delete store with documents
await ai.fileSearchStores.delete({
  name: fileStore.name
})
// Error: Cannot delete store with documents

// ✅ CORRECT: Use force delete
await ai.fileSearchStores.delete({
  name: fileStore.name,
  config: { force: true } // Deletes the store AND all of its documents
})

// Alternative: Delete documents first, then the empty store
const docs = await ai.fileSearchStores.documents.list({ parent: fileStore.name })
const docNames: string[] = []
for await (const doc of docs) {
  if (doc.name) docNames.push(doc.name)
}
for (const name of docNames) {
  await ai.fileSearchStores.documents.delete({ name, config: { force: true } })
}
await ai.fileSearchStores.delete({ name: fileStore.name })
```

Error 8: Using Unsupported Models

Symptom:
Error: File Search is only supported for Gemini 3 Pro and Flash models
Cause: File Search requires Gemini 3 Pro or Gemini 3 Flash. Gemini 2.x and 1.5 models are not supported.
Prevention: Always use Gemini 3 models:
```typescript
// ❌ WRONG: Using Gemini 1.5 model
const response = await ai.models.generateContent({
  model: 'gemini-1.5-pro', // Not supported!
  contents: 'What is the installation procedure?',
  config: {
    tools: [{
      fileSearch: { fileSearchStoreNames: [fileStore.name] }
    }]
  }
})

// ✅ CORRECT: Use Gemini 3 models
const response = await ai.models.generateContent({
  model: 'gemini-3-flash', // ✅ Supported (fast, cost-effective)
  // OR
  // model: 'gemini-3-pro', // ✅ Supported (higher quality)
  contents: 'What is the installation procedure?',
  config: {
    tools: [{
      fileSearch: { fileSearchStoreNames: [fileStore.name] }
    }]
  }
})
```

Error 9: displayName Not Preserved for Blob Sources (Fixed v1.34.0+)

Symptom:
groundingChunks[0].title === null  // No document source shown
Cause: In @google/genai versions prior to v1.34.0, when files were uploaded as `Blob` objects (rather than file paths), the SDK dropped the `displayName` and `customMetadata` configuration fields.
Prevention: Upgrade to v1.34.0 or later:
```bash
npm install @google/genai@latest  # v1.34.0+
```
```typescript
// ✅ CORRECT: With v1.34.0+, config fields survive Blob uploads
await ai.fileSearchStores.uploadToFileSearchStore({
  fileSearchStoreName: storeName,
  file: new Blob([arrayBuffer], { type: 'application/pdf' }),
  config: {
    displayName: 'Safety Manual.pdf',                        // ✅ Now preserved
    customMetadata: [{ key: 'version', stringValue: '1.0' }] // ✅ Now preserved
  }
})

// ⚠️ WORKAROUND for v1.33.0 and earlier: use the REST resumable upload directly
// Assumes storeName, API_KEY, and fileBytes (a Uint8Array holding the PDF) are defined
const numBytes = fileBytes.byteLength
const uploadUrl = `https://generativelanguage.googleapis.com/upload/v1beta/${storeName}:uploadToFileSearchStore?key=${API_KEY}`

// Step 1: Start the session, passing displayName in the JSON body
const initResponse = await fetch(uploadUrl, {
  method: 'POST',
  headers: {
    'X-Goog-Upload-Protocol': 'resumable',
    'X-Goog-Upload-Command': 'start',
    'X-Goog-Upload-Header-Content-Length': numBytes.toString(),
    'X-Goog-Upload-Header-Content-Type': 'application/pdf',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    displayName: 'Safety Manual.pdf' // ✅ Works with resumable upload
  })
})

// Step 2: Send the file bytes to the session URL returned by Step 1
const sessionUrl = initResponse.headers.get('X-Goog-Upload-URL')
if (!sessionUrl) {
  throw new Error(`Could not start resumable upload (HTTP ${initResponse.status})`)
}
await fetch(sessionUrl, {
  method: 'PUT',
  headers: {
    'Content-Length': numBytes.toString(),
    'X-Goog-Upload-Offset': '0',
    'X-Goog-Upload-Command': 'upload, finalize',
    'Content-Type': 'application/pdf'
  },
  body: fileBytes
})
```

Error 10: Grounding Metadata Ignored with JSON Response Mode

Symptom:
response.candidates[0].groundingMetadata === undefined
// Even though the fileSearch tool is configured
Cause: When you request structured output with `responseMimeType: 'application/json'`, the API ignores the `fileSearch` tool and returns no grounding metadata, even with Gemini 3 models.
Prevention:
```typescript
import { Type } from '@google/genai'

// ❌ WRONG: Structured output overrides grounding
const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'Summarize guidelines',
  config: {
    responseMimeType: 'application/json', // Loses grounding
    tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }]
  }
})

// ✅ CORRECT: Two-step approach
// Step 1: Get a grounded text response
const textResponse = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'Summarize guidelines',
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }]
  }
})

const grounding = textResponse.candidates?.[0]?.groundingMetadata

// Step 2: Convert the grounded text to structured JSON
const jsonResponse = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: `Convert to JSON: ${textResponse.text}

Format:
{
  "summary": "...",
  "key_points": ["..."]
}`,
  config: {
    responseMimeType: 'application/json',
    responseSchema: {
      type: Type.OBJECT,
      properties: {
        summary: { type: Type.STRING },
        key_points: { type: Type.ARRAY, items: { type: Type.STRING } }
      }
    }
  }
})

// Combine results (fall back to empty values if a field is missing)
const result = {
  data: JSON.parse(jsonResponse.text ?? '{}'),
  sources: grounding?.groundingChunks ?? []
}
```
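When you keep the `sources` array, you usually want a de-duplicated list of document titles to show as citations. The sketch below assumes each grounding chunk carries an optional `retrievedContext` object with a `title` field; the `GroundingChunkLike` type and `citationTitles` helper are hypothetical, defined locally so the example runs without the SDK.

```typescript
// Minimal local type (assumption); the SDK's grounding chunk type carries more fields.
interface GroundingChunkLike {
  retrievedContext?: { title?: string | null; text?: string | null }
}

// Returns unique, non-empty document titles in first-seen order.
function citationTitles(chunks: GroundingChunkLike[] | undefined): string[] {
  const titles = new Set<string>()
  for (const chunk of chunks ?? []) {
    const title = chunk.retrievedContext?.title?.trim()
    if (title) titles.add(title)
  }
  return Array.from(titles)
}

// Usage with the combined result: citationTitles(result.sources)
```

Chunks without a title (for example, Blob uploads on older SDK versions, see Error 9) are simply skipped.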

Error 11: Google Search and File Search Tools Are Mutually Exclusive

Symptom:
Error: "Search as a tool and file search tool are not supported together"
Status: INVALID_ARGUMENT
Cause: The Gemini API does not allow the `googleSearch` and `fileSearch` tools in the same request.
Prevention:
```typescript
// ❌ WRONG: Combining search tools
const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'What are the latest industry guidelines?',
  config: {
    tools: [
      { googleSearch: {} },
      { fileSearch: { fileSearchStoreNames: [storeName] } }
    ]
  }
})

// ✅ CORRECT: Use separate specialist agents
async function searchWeb(query: string) {
  return ai.models.generateContent({
    model: 'gemini-3-flash',
    contents: query,
    config: { tools: [{ googleSearch: {} }] }
  })
}

async function searchDocuments(query: string) {
  return ai.models.generateContent({
    model: 'gemini-3-flash',
    contents: query,
    config: { tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }] }
  })
}

// Orchestrate based on query type (a simple keyword heuristic)
const query = 'What are the latest industry guidelines?'
const needsWeb = query.includes('latest') || query.includes('current')
const result = needsWeb
  ? await searchWeb(query)
  : await searchDocuments(query)
```

Error 12: Batch API Missing Response Metadata (Community-sourced)

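Symptom: Batch responses cannot be correlated with their requests through the `metadata` field.
Cause: When using the Batch API with an `InlinedRequest` that includes a `metadata` field, the corresponding `InlinedResponse` does not return the metadata.
Prevention:
```typescript
// Inline batch requests; storeName is assumed to be defined
const requests = [
  {
    metadata: { id: 'req-1' },
    contents: [{ role: 'user', parts: [{ text: 'Question 1?' }] }],
    config: { tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }] }
  },
  {
    metadata: { id: 'req-2' },
    contents: [{ role: 'user', parts: [{ text: 'Question 2?' }] }],
    config: { tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }] }
  }
]

let batchJob = await ai.batches.create({ model: 'gemini-3-flash', src: requests })

// Batch jobs run asynchronously: poll until the job reaches a terminal state
const doneStates = new Set(['JOB_STATE_SUCCEEDED', 'JOB_STATE_FAILED', 'JOB_STATE_CANCELLED', 'JOB_STATE_EXPIRED'])
while (batchJob.name && !doneStates.has(batchJob.state ?? '')) {
  await new Promise(resolve => setTimeout(resolve, 30000))
  batchJob = await ai.batches.get({ name: batchJob.name })
}

const responses = batchJob.dest?.inlinedResponses ?? []

// ❌ WRONG: Expecting metadata in each response
// responses[0].metadata → undefined (the request metadata is not echoed back)

// ✅ CORRECT: Use the array index to correlate (not ideal, but works)
responses.forEach((response, i) => {
  const requestMetadata = requests[i].metadata
  console.log(`Response for ${requestMetadata.id}:`, response.response ?? response.error)
})
```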
Community Verification: Maintainer confirmed, internal bug filed.
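Index-based correlation is easy to get subtly wrong if the response array is shorter than the request array. The sketch below pairs them defensively and fails loudly on a length mismatch; `pairByIndex` and `Paired` are hypothetical names, and the helper relies on the same assumption as the workaround above, namely that responses come back in request order.

```typescript
// Hypothetical helper: pairs each request's own metadata with the response at the same index.
interface Paired<M, R> {
  metadata: M
  response: R
}

function pairByIndex<M, R>(requests: { metadata: M }[], responses: R[]): Paired<M, R>[] {
  if (requests.length !== responses.length) {
    throw new Error(
      `Got ${responses.length} responses for ${requests.length} requests; cannot correlate by index`
    )
  }
  return requests.map((request, i) => ({ metadata: request.metadata, response: responses[i] }))
}

// Usage: build a lookup keyed by your own request ID.
const pairs = pairByIndex(
  [{ metadata: { id: 'req-1' } }, { metadata: { id: 'req-2' } }],
  ['answer 1', 'answer 2']
)
const byId = new Map(pairs.map(p => [p.metadata.id, p.response] as const))
```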

Setup Instructions

Step 1: Initialize Client

```typescript
import { GoogleGenAI } from '@google/genai'
import fs from 'fs'

// Fail fast if the API key is missing
if (!process.env.GOOGLE_API_KEY) {
  throw new Error('GOOGLE_API_KEY environment variable is required')
}

// Initialize client with API key
const ai = new GoogleGenAI({
  apiKey: process.env.GOOGLE_API_KEY
})
```

Step 2: Create File Search Store

```typescript
// Create a store (container for documents)
// Per-document metadata is set at upload time via customMetadata (see Step 3)
const fileStore = await ai.fileSearchStores.create({
  config: {
    displayName: 'my-knowledge-base' // Human-readable name
  }
})

console.log('Created store:', fileStore.name)
// Output: fileSearchStores/abc123xyz...
```
Finding Existing Stores:
```typescript
// list() returns an async-iterable pager that fetches further pages on demand
const stores = await ai.fileSearchStores.list({
  config: { pageSize: 20 } // Max 20 per page
})

// Find by display name; create the store if none exists
let targetStore = null
for await (const store of stores) {
  if (store.displayName === 'my-knowledge-base') {
    targetStore = store
    break
  }
}

if (targetStore) {
  console.log('Found existing store:', targetStore.name)
} else {
  console.log('Store not found, creating new one...')
  targetStore = await ai.fileSearchStores.create({
    config: { displayName: 'my-knowledge-base' }
  })
}
```

Step 3: Upload Documents

Single File Upload:
typescript
const operation = await ai.fileSearchStores.uploadToFileSearchStore({
  name: fileStore.name,
  file: fs.createReadStream('./docs/manual.pdf'),
  config: {
    displayName: 'Installation Manual',
    customMetadata: {
      doc_type: 'manual',
      version: '1.0',
      language: 'en'
    },
    chunkingConfig: {
      whiteSpaceConfig: {
        maxTokensPerChunk: 500,
        maxOverlapTokens: 50
      }
    }
  }
})

// Poll until indexing complete
while (!operation.done) {
  await new Promise(resolve => setTimeout(resolve, 1000))
  operation = await ai.operations.get({ name: operation.name })
}

console.log('✅ Indexed:', operation.response.displayName)
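A fixed 1-second loop with no upper bound can spin forever if indexing stalls. Below is a minimal sketch of polling with exponential backoff and an overall timeout. It assumes the operation object exposes a boolean `done`, as in the loop above. You supply a `refresh` callback that wraps the `ai.operations.get` call shown there. `pollUntilDone` and its option names are hypothetical, and the default delays are placeholder values to tune.

```typescript
interface OperationLike {
  done?: boolean
}

interface PollOptions {
  initialDelayMs?: number  // first wait between polls
  maxDelayMs?: number      // cap for the backoff
  timeoutMs?: number       // give up after roughly this much total time
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms))

async function pollUntilDone<T extends OperationLike>(
  initial: T,
  refresh: (op: T) => Promise<T>,
  { initialDelayMs = 1000, maxDelayMs = 10_000, timeoutMs = 5 * 60_000 }: PollOptions = {}
): Promise<T> {
  const deadline = Date.now() + timeoutMs
  let op = initial
  let delay = initialDelayMs
  while (!op.done) {
    // Fail fast if the next wait would overshoot the deadline
    if (Date.now() + delay > deadline) {
      throw new Error(`Operation not done after ${timeoutMs} ms`)
    }
    await sleep(delay)
    op = await refresh(op)
    delay = Math.min(delay * 2, maxDelayMs)  // exponential backoff
  }
  return op
}
```

Usage, with the same `name`-based call as the loop above: `operation = await pollUntilDone(operation, op => ai.operations.get({ name: op.name }))`.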
Batch Upload (Concurrent):
typescript
const filePaths = [
  './docs/manual.pdf',
  './docs/faq.md',
  './docs/troubleshooting.docx'
]

// Upload all files concurrently
const uploadPromises = filePaths.map(filePath =>
  ai.fileSearchStores.uploadToFileSearchStore({
    name: fileStore.name,
    file: fs.createReadStream(filePath),
    config: {
      displayName: filePath.split('/').pop(),
      customMetadata: {
        doc_type: 'support',
        source_path: filePath
      },
      chunkingConfig: {
        whiteSpaceConfig: {
          maxTokensPerChunk: 500,
          maxOverlapTokens: 50
        }
      }
    }
  })
)

const operations = await Promise.all(uploadPromises)

// Poll all operations
for (const operation of operations) {
  let op = operation
  while (!op.done) {
    await new Promise(resolve => setTimeout(resolve, 1000))
    op = await ai.operations.get({ name: op.name })
  }
  console.log('✅ Indexed:', op.response.displayName)
}
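`Promise.all` starts every upload at once. With large batches, this can trip per-minute rate limits or open many file streams simultaneously. The following is a minimal sketch of a bounded-concurrency map that keeps at most `limit` uploads in flight. `mapWithConcurrency` is a hypothetical helper, and the limit you pick is an assumption to tune against your quota.

```typescript
/**
 * Runs `fn` over `items` with at most `limit` calls in flight,
 * returning results in the same order as the input.
 */
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  if (limit < 1) throw new RangeError('limit must be >= 1')
  const results = new Array<R>(items.length)
  let next = 0
  // Each worker claims the next unprocessed index until the list is exhausted.
  // Safe without locks because JavaScript runs these callbacks on one thread.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++
      results[i] = await fn(items[i], i)
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, worker)
  await Promise.all(workers)
  return results
}
```

Usage: `const operations = await mapWithConcurrency(filePaths, 3, filePath => ai.fileSearchStores.uploadToFileSearchStore({ ... }))`, with the same request body as above. The polling loop then works unchanged.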

Step 4: Query with File Search

Basic Query:
typescript
const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'What are the safety precautions for installation?',
  config: {
    tools: [{
      fileSearch: {
        fileSearchStoreNames: [fileStore.name]
      }
    }]
  }
})

console.log('Answer:', response.text)

// Access citations
const grounding = response.candidates?.[0]?.groundingMetadata
if (grounding?.groundingChunks) {
  console.log('\nSources:')
  grounding.groundingChunks.forEach((chunk, i) => {
    console.log(`${i + 1}. ${chunk.retrievedContext?.title || 'Unknown'}`)
    console.log(`   URI: ${chunk.retrievedContext?.uri || 'N/A'}`)
  })
}
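Several grounding chunks often come from the same document, so the raw list can repeat sources. The sketch below collapses chunks into unique sources keyed by title and URI, keeping first-seen order. The `GroundingChunkLike` type is a local assumption that covers only the `retrievedContext.title` and `retrievedContext.uri` fields read above. It is not the SDK's full type, and `uniqueSources` is a hypothetical name.

```typescript
// Minimal local type for the fields used above (assumption, not the SDK type).
interface GroundingChunkLike {
  retrievedContext?: { title?: string; uri?: string }
}

interface Source {
  title: string
  uri: string
  chunkCount: number  // how many grounding chunks came from this source
}

/** Groups grounding chunks by (title, uri), keeping first-seen order. */
function uniqueSources(chunks: readonly GroundingChunkLike[] = []): Source[] {
  const byKey = new Map<string, Source>()
  for (const chunk of chunks) {
    const title = chunk.retrievedContext?.title ?? 'Unknown'
    const uri = chunk.retrievedContext?.uri ?? 'N/A'
    const key = `${title}\u0000${uri}`  // NUL separator avoids accidental key collisions
    const existing = byKey.get(key)
    if (existing) existing.chunkCount++
    else byKey.set(key, { title, uri, chunkCount: 1 })
  }
  return Array.from(byKey.values())
}
```

Usage: ``uniqueSources(grounding?.groundingChunks).forEach((s, i) => console.log(`${i + 1}. ${s.title} (${s.chunkCount} chunks)`))``.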
Query with Metadata Filtering:
typescript
const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'How do I reset the device?',
  config: {
    tools: [{
      fileSearch: {
        fileSearchStoreNames: [fileStore.name],
        // Filter to only search troubleshooting docs in English, version 1.0
        metadataFilter: 'doc_type="troubleshooting" AND language="en" AND version="1.0"'
      }
    }]
  }
})

console.log('Answer:', response.text)
Metadata Filter Syntax:
  • AND: `key1="value1" AND key2="value2"`
  • OR: `key1="value1" OR key1="value2"`
  • Parentheses: `(key1="a" OR key1="b") AND key2="c"`
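Hand-concatenating filter strings becomes error-prone once values come from user input. Here is a sketch of a tiny builder for the AND / OR / parentheses syntax listed above. The helper names (`eq`, `and`, `or`, `anyOf`) are hypothetical. The escaping rules for quotes inside values are not covered here, so the sketch rejects values containing `"` or `\` rather than guessing.

```typescript
type Filter = string

/** key="value" clause; refuses characters whose escaping rules we don't rely on. */
function eq(key: string, value: string): Filter {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(key)) throw new Error(`Invalid key: ${key}`)
  if (/["\\]/.test(value)) throw new Error(`Unsupported character in value: ${value}`)
  return `${key}="${value}"`
}

// Parenthesize any clause that itself contains AND/OR so precedence is explicit.
function wrap(part: Filter): Filter {
  return / (AND|OR) /.test(part) ? `(${part})` : part
}

function and(...parts: Filter[]): Filter {
  return parts.map(wrap).join(' AND ')
}

function or(...parts: Filter[]): Filter {
  return parts.map(wrap).join(' OR ')
}

/** OR over several allowed values for one key. */
function anyOf(key: string, values: string[]): Filter {
  return or(...values.map(v => eq(key, v)))
}
```

For example, `and(eq('product', 'widget-pro'), anyOf('doc_type', ['troubleshooting', 'faq']), eq('language', 'en'))` produces the string `product="widget-pro" AND (doc_type="troubleshooting" OR doc_type="faq") AND language="en"`.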

Step 5: List and Manage Documents

typescript
// List documents in the store (first page only; follow nextPageToken for more)
const docs = await ai.fileSearchStores.documents.list({
  parent: fileStore.name,
  pageSize: 20
})

console.log(`Documents on this page: ${docs.documents?.length || 0}`)

docs.documents?.forEach(doc => {
  console.log(`- ${doc.displayName} (${doc.name})`)
  console.log(`  Metadata:`, doc.customMetadata)
})

// Get specific document details
const docDetails = await ai.fileSearchStores.documents.get({
  name: docs.documents[0].name
})

console.log('Document details:', docDetails)

// Delete document
await ai.fileSearchStores.documents.delete({
  name: docs.documents[0].name,
  force: true
})

Step 6: Cleanup

typescript
// Delete entire store (force deletes all documents)
await ai.fileSearchStores.delete({
  name: fileStore.name,
  force: true
})

console.log('✅ Store deleted')

Recommended Chunking Strategies

Chunking configuration significantly impacts retrieval quality. Adjust based on content type:

Technical Documentation

typescript
chunkingConfig: {
  whiteSpaceConfig: {
    maxTokensPerChunk: 500,   // Smaller chunks for precise code/API lookup
    maxOverlapTokens: 50      // 10% overlap
  }
}
Best for: API docs, SDK references, code examples, configuration guides

Prose and Articles

typescript
chunkingConfig: {
  whiteSpaceConfig: {
    maxTokensPerChunk: 800,   // Larger chunks preserve narrative flow
    maxOverlapTokens: 80      // 10% overlap
  }
}
Best for: Blog posts, news articles, product descriptions, marketing materials

Legal and Contracts

typescript
chunkingConfig: {
  whiteSpaceConfig: {
    maxTokensPerChunk: 300,   // Very small chunks for high precision
    maxOverlapTokens: 30      // 10% overlap
  }
}
Best for: Legal documents, contracts, regulations, compliance docs

FAQ and Support

typescript
chunkingConfig: {
  whiteSpaceConfig: {
    maxTokensPerChunk: 400,   // Medium chunks (1-2 Q&A pairs)
    maxOverlapTokens: 40      // 10% overlap
  }
}
Best for: FAQs, troubleshooting guides, how-to articles
General Rule: Maintain 10% overlap (overlap = chunk size / 10) to prevent context loss at chunk boundaries.
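The four presets and the 10% rule can live in one place so that every upload uses the same numbers. Below is a minimal sketch. The preset sizes are the ones recommended above, while `ContentType`, `chunkingConfigFor`, and `estimateChunkCount` are hypothetical names. The chunk-count estimate assumes a simple sliding window in which each new chunk advances by `size - overlap` tokens. That is an approximation for planning, not a documented formula for how whitespace chunking behaves.

```typescript
type ContentType = 'technical' | 'prose' | 'legal' | 'faq'

// Chunk sizes recommended above, in tokens.
const CHUNK_SIZES: Record<ContentType, number> = {
  technical: 500,
  prose: 800,
  legal: 300,
  faq: 400,
}

function chunkingConfigFor(type: ContentType) {
  const maxTokensPerChunk = CHUNK_SIZES[type]
  return {
    whiteSpaceConfig: {
      maxTokensPerChunk,
      // 10% overlap rule: overlap = chunk size / 10
      maxOverlapTokens: Math.floor(maxTokensPerChunk / 10),
    },
  }
}

/** Rough chunk count for a document, under the sliding-window assumption. */
function estimateChunkCount(totalTokens: number, size: number, overlap: number): number {
  if (totalTokens <= size) return totalTokens > 0 ? 1 : 0
  const stride = size - overlap  // new tokens covered by each additional chunk
  return 1 + Math.ceil((totalTokens - size) / stride)
}
```

For example, a 10,000-token technical document with 500-token chunks and 50-token overlap yields `estimateChunkCount(10_000, 500, 50) === 23`. Upload code can pass `chunkingConfig: chunkingConfigFor('technical')` instead of repeating the literal object.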

Metadata Best Practices

Design a metadata schema that supports both filtering and organization:

Example: Customer Support Knowledge Base

typescript
// Schema sketch: union types list the allowed values for each key
type SupportDocMetadata = {
  doc_type: 'faq' | 'manual' | 'troubleshooting' | 'guide'
  product: 'widget-pro' | 'widget-lite'
  version: '1.0' | '2.0'
  language: 'en' | 'es' | 'fr'
  category: 'installation' | 'configuration' | 'maintenance'
  priority: 'critical' | 'normal' | 'low'
  last_updated: string  // ISO date, e.g. '2025-01-15'
  author: string        // e.g. 'support-team'
}
Query Example:
typescript
metadataFilter: 'product="widget-pro" AND (doc_type="troubleshooting" OR doc_type="faq") AND language="en"'

Example: Legal Document Repository

typescript
// Schema sketch: union types list the allowed values for each key
type LegalDocMetadata = {
  doc_type: 'contract' | 'regulation' | 'case-law' | 'policy'
  jurisdiction: 'US' | 'EU' | 'UK'
  practice_area: 'employment' | 'corporate' | 'ip' | 'tax'
  effective_date: string  // ISO date, e.g. '2025-01-01'
  status: 'active' | 'archived'
  confidentiality: 'public' | 'internal' | 'privileged'
}

Example: Code Documentation

typescript
// Schema sketch: union types list the allowed values for each key
type CodeDocMetadata = {
  doc_type: 'api-reference' | 'tutorial' | 'example' | 'changelog'
  language: 'javascript' | 'python' | 'java' | 'go'
  framework: 'react' | 'nextjs' | 'express' | 'fastapi'
  version: string  // semver, e.g. '1.2.0'
  difficulty: 'beginner' | 'intermediate' | 'advanced'
}
Tips:
  • Use consistent key naming (`snake_case` or `camelCase`)
  • Limit metadata to the most important filterable fields (20 max)
  • Use enums/constants for values (easier filtering)
  • Include version and date fields for time-based filtering
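Two of the tips above, consistent naming and the 20-field cap, can be checked before upload. The sketch below also converts a plain metadata object into a list of `{ key, stringValue }` / `{ key, numericValue }` entries. This assumes your SDK version expects the REST API's `CustomMetadata` list shape rather than the plain objects shown in this guide. Check the type your installed `@google/genai` accepts before relying on it. `toCustomMetadataList` and `MAX_METADATA_KEYS` are hypothetical names, and the regex enforces `snake_case` (swap it if you standardized on `camelCase`).

```typescript
type MetadataValue = string | number

// Entry shape assumed from the REST CustomMetadata type (verify against your SDK).
interface CustomMetadataEntry {
  key: string
  stringValue?: string
  numericValue?: number
}

const MAX_METADATA_KEYS = 20  // cap from the tips above
const SNAKE_CASE = /^[a-z][a-z0-9]*(_[a-z0-9]+)*$/

/** Validates key count and snake_case naming, then converts to a key/value list. */
function toCustomMetadataList(meta: Record<string, MetadataValue>): CustomMetadataEntry[] {
  const keys = Object.keys(meta)
  if (keys.length > MAX_METADATA_KEYS) {
    throw new Error(`Too many metadata keys: ${keys.length} > ${MAX_METADATA_KEYS}`)
  }
  const badKeys = keys.filter(k => !SNAKE_CASE.test(k))
  if (badKeys.length > 0) {
    throw new Error(`Keys not in snake_case: ${badKeys.join(', ')}`)
  }
  return keys.map(key => {
    const value = meta[key]
    // Numbers become numericValue so range filters stay possible
    return typeof value === 'number'
      ? { key, numericValue: value }
      : { key, stringValue: value }
  })
}
```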

Cost Optimization

1. Deduplicate Before Upload

typescript
// Track uploaded file hashes to avoid duplicates
const uploadedHashes = new Set<string>()

async function uploadWithDeduplication(filePath: string) {
  const fileHash = await getFileHash(filePath)

  if (uploadedHashes.has(fileHash)) {
    console.log(`Skipping duplicate: ${filePath}`)
    return
  }

  await ai.fileSearchStores.uploadToFileSearchStore({
    name: fileStore.name,
    file: fs.createReadStream(filePath)
  })

  uploadedHashes.add(fileHash)
}
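The snippet above calls `getFileHash` without defining it. One way to write this hypothetical helper is a streaming SHA-256 digest using Node's built-in `crypto` module, so large files are never loaded into memory at once. The `uploadedHashes` set above lives only in memory. Deduplication across runs would need the hashes persisted somewhere, for example in a database.

```typescript
import { createHash } from 'node:crypto'
import { createReadStream } from 'node:fs'

/** Streams the file through SHA-256 and resolves to the hex digest. */
function getFileHash(filePath: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash('sha256')
    createReadStream(filePath)
      .on('error', reject)                      // e.g. file not found
      .on('data', chunk => hash.update(chunk))  // feed each chunk as it is read
      .on('end', () => resolve(hash.digest('hex')))
  })
}
```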

2. Compress Large Files

typescript
// Convert images to text before indexing (OCR)
// Compress PDFs (remove images, use text-only)
// Use markdown instead of Word docs (smaller size)

3. Use Metadata Filtering to Reduce Query Scope

typescript
// ❌ BROAD: Searches all 10GB of documents
const unfilteredResponse = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'Reset procedure?',
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [fileStore.name] } }]
  }
})

// ✅ NARROW: Only troubleshooting docs are candidates (smaller subset)
const filteredResponse = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'Reset procedure?',
  config: {
    tools: [{
      fileSearch: {
        fileSearchStoreNames: [fileStore.name],
        metadataFilter: 'doc_type="troubleshooting"'  // Reduces search scope
      }
    }]
  }
})

4. Choose Flash Over Pro for Cost Savings

typescript
// Gemini 3 Flash is several times cheaper per token than Pro (check current pricing)
// Use Flash unless you need Pro's advanced reasoning

// Development/testing: Use Flash
model: 'gemini-3-flash'

// Production (high-stakes answers): Use Pro
model: 'gemini-3-pro'

5. Monitor Storage Usage

typescript
// List stores and estimate storage
const stores = await ai.fileSearchStores.list()

for (const store of stores.fileSearchStores || []) {
  const docs = await ai.fileSearchStores.documents.list({
    parent: store.name
  })

  console.log(`Store: ${store.displayName}`)
  console.log(`Documents: ${docs.documents?.length || 0}`)
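  // Very rough placeholder: assumes ~10 MB stored per document.
  // Stored size is roughly 3x the raw input, so prefer real file sizes when available.
  console.log(`Estimated storage: ~${(docs.documents?.length || 0) * 10} MB`)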
}
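Multiplying document count by a fixed size says little about actual usage. If you know your raw input size and token count, this guide's pricing gives a better back-of-the-envelope figure. The guide quotes indexing at $0.15 per 1M tokens (one-time) and storage at about 3x the input size. Below is a sketch under those assumptions. The constants mirror this guide's numbers and should be rechecked against current pricing. `estimateIndexing` and `fitsInTier` are hypothetical names.

```typescript
const INDEXING_USD_PER_MILLION_TOKENS = 0.15  // one-time indexing price quoted in this guide
const STORAGE_MULTIPLIER = 3                   // stored size is about 3x input, per this guide

interface Estimate {
  indexingCostUsd: number
  storageBytes: number
}

function estimateIndexing(inputTokens: number, inputBytes: number): Estimate {
  return {
    indexingCostUsd: (inputTokens / 1_000_000) * INDEXING_USD_PER_MILLION_TOKENS,
    storageBytes: inputBytes * STORAGE_MULTIPLIER,
  }
}

/** Checks an estimate against a storage cap, e.g. the 1 GB free-tier limit. */
function fitsInTier(storageBytes: number, tierLimitBytes: number): boolean {
  return storageBytes <= tierLimitBytes
}
```

As a worked example, 200 MB of documents totaling 40M tokens costs 40 × $0.15 = $6 to index once. It would occupy roughly 600 MB of storage, which is still under a 1 GB free-tier cap.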

Testing & Verification

Verify Store Creation

typescript
const store = await ai.fileSearchStores.get({
  name: fileStore.name
})

console.assert(store.displayName === 'my-knowledge-base', 'Store name mismatch')
console.log('✅ Store created successfully')

Verify Document Indexing

typescript
const docs = await ai.fileSearchStores.documents.list({
  parent: fileStore.name
})

console.assert((docs.documents?.length ?? 0) > 0, 'No documents indexed')
console.log(`✅ ${docs.documents?.length} documents indexed`)

Verify Query Functionality

typescript
const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'What is this knowledge base about?',
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [fileStore.name] } }]
  }
})

console.assert((response.text ?? '').length > 0, 'Empty response')
console.log('✅ Query successful:', (response.text ?? '').substring(0, 100) + '...')

Verify Citations

typescript
const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'Provide a specific answer with citations.',
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [fileStore.name] } }]
  }
})

const grounding = response.candidates?.[0]?.groundingMetadata
console.assert(
  (grounding?.groundingChunks?.length ?? 0) > 0,
  'No grounding/citations returned'
)
console.log(`✅ ${grounding?.groundingChunks?.length} citations returned`)
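In Node.js, `console.assert` only prints `Assertion failed` to stderr and does not throw. The checks above therefore will not fail a script or CI job on their own. The following is a minimal sketch of a runner that collects named checks and sets a non-zero exit code when any fail. `runChecks` and `Check` are hypothetical names, and each `run` body would wrap one of the verification calls above.

```typescript
interface Check {
  name: string
  run: () => Promise<boolean> | boolean  // true means the check passed
}

interface CheckResult {
  name: string
  ok: boolean
  error?: string
}

/** Runs checks sequentially, logs each result, and flags the process on failure. */
async function runChecks(checks: Check[]): Promise<CheckResult[]> {
  const results: CheckResult[] = []
  for (const check of checks) {
    try {
      results.push({ name: check.name, ok: await check.run() })
    } catch (err) {
      // A thrown error (e.g. a failed API call) is recorded as a failure, not a crash
      results.push({ name: check.name, ok: false, error: String(err) })
    }
  }
  for (const r of results) {
    console.log(`${r.ok ? '✅' : '❌'} ${r.name}${r.error ? `: ${r.error}` : ''}`)
  }
  if (results.some(r => !r.ok)) process.exitCode = 1  // non-zero exit without aborting
  return results
}
```

Usage: `await runChecks([{ name: 'store exists', run: async () => (await ai.fileSearchStores.get({ name: fileStore.name })).displayName === 'my-knowledge-base' }])`.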

Integration Examples

Streaming Support

File Search supports streaming responses with `generateContentStream()`:
typescript
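// ✅ Streaming works with File Search (v1.34.0+)
const stream = await ai.models.generateContentStream({
  model: 'gemini-3-flash',
  contents: 'Summarize the document',
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }]
  }
})

// The stream is an async iterator of chunks, not a response object,
// so capture grounding metadata from the chunks as they arrive
let grounding
for await (const chunk of stream) {
  process.stdout.write(chunk.text ?? '')
  grounding = chunk.candidates?.[0]?.groundingMetadata ?? grounding
}

// After the loop, `grounding` holds the last metadata seen (typically on the final chunk)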
Note: Early SDK versions (pre-v1.34.0) may have had streaming issues. Use v1.34.0+ for reliable streaming support.

Working Templates

This skill includes 3 working templates in the `templates/` directory:

Template 1: basic-node-rag

Minimal Node.js/TypeScript example demonstrating:
  • Create file search store
  • Upload multiple documents
  • Query with natural language
  • Display citations
Use when: Learning File Search, prototyping, simple CLI tools
Run:
bash
cd templates/basic-node-rag
npm install
npm run dev

Template 2: cloudflare-worker-rag

Cloudflare Workers integration showing:
  • Edge API for document upload
  • Edge API for semantic search
  • Integration with R2 for document storage
  • Hybrid architecture (Gemini File Search + Cloudflare edge)
Use when: Building global edge applications, integrating with Cloudflare stack
Deploy:
bash
cd templates/cloudflare-worker-rag
npm install
npx wrangler deploy

Template 3: nextjs-docs-search

Full-stack Next.js application featuring:
  • Document upload UI with drag-and-drop
  • Real-time search interface
  • Citation rendering with source links
  • Metadata filtering UI
Use when: Building production documentation sites, knowledge bases
Run:
bash
cd templates/nextjs-docs-search
npm install
npm run dev

References

Official Documentation:
Tutorials:
Bundled Resources in This Skill:
  • `references/api-reference.md` - Complete API documentation
  • `references/chunking-best-practices.md` - Detailed chunking strategies
  • `references/pricing-calculator.md` - Cost estimation guide
  • `references/migration-from-openai.md` - Migration guide from OpenAI Files API
  • `scripts/create-store.ts` - CLI tool to create stores
  • `scripts/upload-batch.ts` - Batch upload script
  • `scripts/query-store.ts` - Interactive query tool
  • `scripts/cleanup.ts` - Cleanup script
Working Templates:
  • `templates/basic-node-rag/` - Minimal Node.js example
  • `templates/cloudflare-worker-rag/` - Edge deployment example
  • `templates/nextjs-docs-search/` - Full-stack Next.js app

Skill Version: 1.1.0
Last Verified: 2026-01-21
Package Version: @google/genai ^1.38.0 (minimum 1.29.0 required)
Token Savings: ~67%
Errors Prevented: 12
Changes: Added 4 new errors from community research (displayName Blob issue, grounding with JSON mode, tool conflicts, batch API metadata), enhanced the polling timeout pattern with fallback verification, and added a streaming support note