google-gemini-file-search

Google Gemini File Search Setup

Overview

Google Gemini File Search is a fully managed RAG system. Upload documents (100+ formats: PDF, Word, Excel, code) and query them in natural language; chunking, embeddings, semantic search, and citations are handled automatically.

What This Skill Provides:

Complete @google/genai File Search API setup
12 documented errors with prevention strategies
Chunking best practices for optimal retrieval
Cost optimization ($0.15/1M tokens indexing, 3x storage multiplier)
Cloudflare Workers + Next.js integration templates

Prerequisites

1. Google AI API Key

Create an API key at https://aistudio.google.com/apikey

Free Tier Limits:

1 GB storage (total across all file search stores)
1,500 requests per day
1 million tokens per minute

Paid Tier Pricing:

Indexing: $0.15 per 1M input tokens (one-time)
Storage: Free (Tier 1: 10 GB, Tier 2: 100 GB, Tier 3: 1 TB)
Query-time embeddings: Free (retrieved context counts as input tokens)

2. Node.js Environment

Minimum Version: Node.js 18+ (v20+ recommended)

bash

node --version  # Should be >=18.0.0

3. Install @google/genai SDK

bash

npm install @google/genai
# or
pnpm add @google/genai
# or
yarn add @google/genai

Current Stable Version: 1.30.0+ (verify with `npm view @google/genai version`)

⚠️ Important: File Search API requires @google/genai v1.29.0 or later. Earlier versions do not support File Search. The API was added in v1.29.0 (November 5, 2025).

4. TypeScript Configuration (Optional but Recommended)

json

{
  "compilerOptions": {
    "target": "ES2020",
    "module": "ESNext",
    "moduleResolution": "node",
    "esModuleInterop": true,
    "strict": true,
    "skipLibCheck": true
  }
}

Common Errors Prevented

This skill prevents 12 common errors encountered when implementing File Search:

Error 1: Document Immutability

Symptom:

Error: Documents cannot be modified after indexing

Cause: Documents are immutable once indexed. There is no PATCH or UPDATE operation.

Prevention: Use the delete+re-upload pattern for updates:

typescript

// ❌ WRONG: Trying to update document (no such API)
await ai.fileSearchStores.documents.update({
  name: documentName,
  customMetadata: { version: '2.0' }
})

// ✅ CORRECT: Delete then re-upload
const docs = await ai.fileSearchStores.documents.list({
  parent: fileStore.name
})

const oldDoc = docs.documents.find(d => d.displayName === 'manual.pdf')
if (oldDoc) {
  await ai.fileSearchStores.documents.delete({
    name: oldDoc.name,
    force: true
  })
}

await ai.fileSearchStores.uploadToFileSearchStore({
  name: fileStore.name,
  file: fs.createReadStream('manual-v2.pdf'),
  config: { displayName: 'manual.pdf' }
})

Source: https://ai.google.dev/api/file-search/documents

Error 2: Storage Quota Exceeded

Symptom:

Error: Quota exceeded. Expected 1GB limit, but 3.2GB used.

Cause: Storage calculation includes input files + embeddings + metadata. Total storage ≈ 3x input size.

Prevention: Calculate storage before upload:

typescript

// ❌ WRONG: Assuming storage = file size
const fileSize = fs.statSync('data.pdf').size // 500 MB
// Expect 500 MB usage → WRONG

// ✅ CORRECT: Account for 3x multiplier
const fileSize = fs.statSync('data.pdf').size // 500 MB
const estimatedStorage = fileSize * 3 // 1.5 GB (embeddings + metadata)
console.log(`Estimated storage: ${estimatedStorage / 1e9} GB`)

// Check if within quota before upload
if (estimatedStorage > 1e9) {
  console.warn('⚠️ File may exceed free tier 1 GB limit')
}

Source: https://blog.google/technology/developers/file-search-gemini-api/
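
When several files are uploaded together, the same 3x estimate can be applied to their combined size before anything is sent. The sketch below is illustrative only: `estimateStoreUsage` and its parameters are hypothetical names, and the multiplier is the approximation quoted above, not a value reported by the API.

```typescript
// Hypothetical helper: estimates store usage for a batch of files.
// The 3x multiplier is the rough figure from the text above.
const STORAGE_MULTIPLIER = 3

interface UsageEstimate {
  inputBytes: number      // sum of the raw file sizes
  estimatedBytes: number  // raw size times the multiplier
  fitsQuota: boolean      // true if already-used + estimated stays within quota
}

function estimateStoreUsage(
  fileSizesBytes: number[],
  quotaBytes: number,
  alreadyUsedBytes = 0
): UsageEstimate {
  const inputBytes = fileSizesBytes.reduce((sum, size) => sum + size, 0)
  const estimatedBytes = inputBytes * STORAGE_MULTIPLIER
  return {
    inputBytes,
    estimatedBytes,
    fitsQuota: alreadyUsedBytes + estimatedBytes <= quotaBytes
  }
}

// Example: 200 MB + 300 MB of input checked against the 1 GB free tier
const batchEstimate = estimateStoreUsage([200e6, 300e6], 1e9)
console.log(`Estimated storage: ${batchEstimate.estimatedBytes / 1e9} GB`)
console.log(`Fits free tier: ${batchEstimate.fitsQuota}`)
```

File sizes can come from `fs.statSync(path).size`, as in the snippet above.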

Error 3: Incorrect Chunking Configuration

Symptom: Poor retrieval quality, irrelevant results, or context cutoff mid-sentence.

Cause: Default chunking may not be optimal for your content type.

Prevention: Use recommended chunking strategy:

typescript

// ❌ WRONG: Using defaults without testing
await ai.fileSearchStores.uploadToFileSearchStore({
  name: fileStore.name,
  file: fs.createReadStream('docs.pdf')
  // Default chunking may be too large or too small
})

// ✅ CORRECT: Configure chunking for precision
await ai.fileSearchStores.uploadToFileSearchStore({
  name: fileStore.name,
  file: fs.createReadStream('docs.pdf'),
  config: {
    chunkingConfig: {
      whiteSpaceConfig: {
        maxTokensPerChunk: 500,  // Smaller chunks = more precise retrieval
        maxOverlapTokens: 50     // 10% overlap prevents context loss
      }
    }
  }
})

Chunking Guidelines:

Technical docs/code: 500 tokens/chunk, 50 overlap
Prose/articles: 800 tokens/chunk, 80 overlap
Legal/contracts: 300 tokens/chunk, 30 overlap (high precision)

Source: https://www.philschmid.de/gemini-file-search-javascript
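
The guidelines above can be kept in one place as a small preset table, so every upload for a given content type uses the same values. This is a sketch under assumptions: `ContentKind`, `CHUNKING_PRESETS`, and `chunkingConfigFor` are hypothetical names, and the numbers are simply the ones listed above, which should still be validated against your own retrieval tests.

```typescript
// Hypothetical preset table built from the chunking guidelines above.
type ContentKind = 'technical' | 'prose' | 'legal'

interface WhiteSpaceConfig {
  maxTokensPerChunk: number
  maxOverlapTokens: number
}

const CHUNKING_PRESETS: Record<ContentKind, WhiteSpaceConfig> = {
  technical: { maxTokensPerChunk: 500, maxOverlapTokens: 50 },
  prose: { maxTokensPerChunk: 800, maxOverlapTokens: 80 },
  legal: { maxTokensPerChunk: 300, maxOverlapTokens: 30 }
}

// Returns a fresh object shaped like the `chunkingConfig` used in the upload example.
function chunkingConfigFor(kind: ContentKind): { whiteSpaceConfig: WhiteSpaceConfig } {
  return { whiteSpaceConfig: { ...CHUNKING_PRESETS[kind] } }
}

console.log(chunkingConfigFor('legal'))
```

The returned object is meant to be passed as `config.chunkingConfig` in the upload call shown above.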

Error 4: Metadata Limits Exceeded

Symptom:

Error: Maximum 20 custom metadata key-value pairs allowed

Cause: Each document can have at most 20 metadata fields.

Prevention: Design compact metadata schema:

typescript

// ❌ WRONG: Too many metadata fields
await ai.fileSearchStores.uploadToFileSearchStore({
  name: fileStore.name,
  file: fs.createReadStream('doc.pdf'),
  config: {
    customMetadata: {
      doc_type: 'manual',
      version: '1.0',
      author: 'John Doe',
      department: 'Engineering',
      created_date: '2025-01-01',
      // ... 18 more fields → Error!
    }
  }
})

// ✅ CORRECT: Use hierarchical keys or JSON strings
await ai.fileSearchStores.uploadToFileSearchStore({
  name: fileStore.name,
  file: fs.createReadStream('doc.pdf'),
  config: {
    customMetadata: {
      doc_type: 'manual',
      version: '1.0',
      author_dept: 'John Doe|Engineering',  // Combine related fields
      dates: JSON.stringify({                // Or use JSON for complex data
        created: '2025-01-01',
        updated: '2025-01-15'
      })
    }
  }
})

Source: https://ai.google.dev/api/file-search/documents
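
A local check can catch an oversized metadata object before the upload is attempted. The following is a minimal sketch, assuming flat string-valued metadata as in the examples above; `isWithinMetadataLimit` and `packFields` are hypothetical helpers, not SDK functions.

```typescript
// Limit taken from the error message above.
const MAX_METADATA_FIELDS = 20

// True when the metadata object has at most 20 keys.
function isWithinMetadataLimit(metadata: Record<string, string>): boolean {
  return Object.keys(metadata).length <= MAX_METADATA_FIELDS
}

// Moves the listed keys into a single JSON-encoded field to reduce the key count.
function packFields(
  metadata: Record<string, string>,
  keysToPack: string[],
  packedKey: string
): Record<string, string> {
  const packed: Record<string, string> = {}
  const rest: Record<string, string> = {}
  for (const [key, value] of Object.entries(metadata)) {
    if (keysToPack.includes(key)) {
      packed[key] = value
    } else {
      rest[key] = value
    }
  }
  return { ...rest, [packedKey]: JSON.stringify(packed) }
}

// Example: fold two date fields into one `dates` field
const compactMetadata = packFields(
  { doc_type: 'manual', version: '1.0', created: '2025-01-01', updated: '2025-01-15' },
  ['created', 'updated'],
  'dates'
)
console.log(compactMetadata, isWithinMetadataLimit(compactMetadata))
```

Packed fields can no longer be filtered on individually, so this suits descriptive data rather than keys you intend to query by.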

Error 5: Indexing Cost Surprises

Symptom: Unexpected bill for $375 after uploading 10 GB of documents.

Cause: Indexing costs are one-time but calculated per input token ($0.15/1M tokens).

Prevention: Estimate costs before indexing:

typescript

// ❌ WRONG: No cost estimation
await uploadAllDocuments(fileStore.name, './data') // 10 GB uploaded → $375 surprise

// ✅ CORRECT: Calculate costs upfront
const totalSize = getTotalDirectorySize('./data') // 10 GB
const estimatedTokens = (totalSize / 4) // Rough estimate: 1 token ≈ 4 bytes
const indexingCost = (estimatedTokens / 1e6) * 0.15

console.log(`Estimated indexing cost: $${indexingCost.toFixed(2)}`)
console.log(`Estimated storage: ${(totalSize * 3) / 1e9} GB`)

// Confirm before proceeding
const proceed = await confirm(`Proceed with indexing? Cost: $${indexingCost.toFixed(2)}`)
if (proceed) {
  await uploadAllDocuments(fileStore.name, './data')
}

Cost Examples:

1 GB text ≈ 250M tokens = $37.50 indexing
100 MB PDF ≈ 25M tokens = $3.75 indexing
10 MB code ≈ 2.5M tokens = $0.38 indexing

Source: https://ai.google.dev/pricing
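
The cost examples above follow from two figures: roughly 4 bytes per token and $0.15 per 1M tokens. The sketch below reproduces that arithmetic; `estimateIndexingCostUsd` is a hypothetical helper, and the bytes-per-token ratio is only the rough heuristic used above, so real token counts (especially for PDFs) may differ.

```typescript
// Both constants come from the text above; the 4-bytes-per-token ratio is a rough heuristic.
const BYTES_PER_TOKEN = 4
const USD_PER_MILLION_TOKENS = 0.15

// Estimated one-time indexing cost in USD for a given input size in bytes.
function estimateIndexingCostUsd(totalBytes: number): number {
  const estimatedTokens = totalBytes / BYTES_PER_TOKEN
  return (estimatedTokens / 1e6) * USD_PER_MILLION_TOKENS
}

console.log(`1 GB:   $${estimateIndexingCostUsd(1e9).toFixed(2)}`)
console.log(`100 MB: $${estimateIndexingCostUsd(100e6).toFixed(2)}`)
console.log(`10 GB:  $${estimateIndexingCostUsd(10e9).toFixed(2)}`)
```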

Error 6: Not Polling Operation Status

Symptom: Query returns no results immediately after upload, or incomplete indexing.

Cause: File uploads are processed asynchronously. You must poll the operation until `done: true`.

Prevention: Always poll operation status with timeout and fallback:

typescript

// ❌ WRONG: Assuming upload is instant
const operation = await ai.fileSearchStores.uploadToFileSearchStore({
  name: fileStore.name,
  file: fs.createReadStream('large.pdf')
})
// Immediately query → No results!

// ✅ CORRECT: Poll until indexing complete with timeout
// Declared with `let` because the operation is reassigned on every poll
let operation = await ai.fileSearchStores.uploadToFileSearchStore({
  name: fileStore.name,
  file: fs.createReadStream('large.pdf')
})

// Poll with timeout and fallback
const MAX_POLL_TIME = 60000 // 60 seconds
const POLL_INTERVAL = 1000
let elapsed = 0

while (!operation.done && elapsed < MAX_POLL_TIME) {
  await new Promise(resolve => setTimeout(resolve, POLL_INTERVAL))
  elapsed += POLL_INTERVAL

  try {
    operation = await ai.operations.get({ name: operation.name })
    console.log(`Indexing progress: ${operation.metadata?.progress || 'processing...'}`)
  } catch (error) {
    console.warn('Polling failed, assuming complete:', error)
    break
  }
}

if (operation.error) {
  throw new Error(`Indexing failed: ${operation.error.message}`)
}

// ⚠️ Warning: operations.get() can be unreliable for large files
// If timeout reached, verify document exists manually
if (elapsed >= MAX_POLL_TIME) {
  console.warn('Polling timeout - verifying document manually')
  const docs = await ai.fileSearchStores.documents.list({ parent: fileStore.name })
  const uploaded = docs.documents?.find(d => d.displayName === 'large.pdf')
  if (uploaded) {
    console.log('✅ Document found despite polling timeout')
  } else {
    throw new Error('Upload failed - document not found')
  }
}

console.log('✅ Indexing complete:', operation.response?.displayName)

Source: https://ai.google.dev/api/file-search/file-search-stores#uploadtofilesearchstore, GitHub Issue #1211
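
The polling loop above can be factored into a reusable function that knows nothing about the SDK. This is a sketch, not the SDK's own mechanism: `pollUntilDone`, `LongRunningOperation`, and `PollOptions` are hypothetical names, and the caller supplies the `refresh` callback that actually re-fetches the operation.

```typescript
// Minimal shape of an operation, matching the fields used in the loop above.
interface LongRunningOperation {
  done?: boolean
  error?: { message?: string }
}

interface PollOptions {
  intervalMs?: number                     // delay between polls (default 1000)
  timeoutMs?: number                      // total polling budget (default 60000)
  sleep?: (ms: number) => Promise<void>   // injectable for tests
}

// Polls until the operation reports done, fails, or the time budget is spent.
async function pollUntilDone<T extends LongRunningOperation>(
  initial: T,
  refresh: (current: T) => Promise<T>,
  options: PollOptions = {}
): Promise<{ operation: T; timedOut: boolean }> {
  const intervalMs = options.intervalMs ?? 1000
  const timeoutMs = options.timeoutMs ?? 60000
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms)))

  let operation = initial
  let elapsed = 0
  while (!operation.done && elapsed < timeoutMs) {
    await sleep(intervalMs)
    elapsed += intervalMs
    operation = await refresh(operation)
  }

  if (operation.error) {
    throw new Error(`Indexing failed: ${operation.error.message ?? 'unknown error'}`)
  }
  // timedOut tells the caller to fall back to verifying the document manually
  return { operation, timedOut: !operation.done }
}
```

With the calls used in this guide, `refresh` would wrap `ai.operations.get(...)`; when `timedOut` is true, fall back to the manual `documents.list` check shown above.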

Error 7: Forgetting Force Delete

Symptom:

Error: Cannot delete store with documents. Set force=true.

Cause: Stores with documents require `force: true` to delete (prevents accidental deletion).

Prevention: Always use `force: true` when deleting non-empty stores:

typescript

// ❌ WRONG: Trying to delete store with documents
await ai.fileSearchStores.delete({
  name: fileStore.name
})
// Error: Cannot delete store with documents

// ✅ CORRECT: Use force delete
await ai.fileSearchStores.delete({
  name: fileStore.name,
  force: true  // Deletes store AND all documents
})

// Alternative: Delete documents first
const docs = await ai.fileSearchStores.documents.list({ parent: fileStore.name })
for (const doc of docs.documents || []) {
  await ai.fileSearchStores.documents.delete({
    name: doc.name,
    force: true
  })
}
await ai.fileSearchStores.delete({ name: fileStore.name })

Source: https://ai.google.dev/api/file-search/file-search-stores#delete

Error 8: Using Unsupported Models

Symptom:

Error: File Search is only supported for Gemini 3 Pro and Flash models

Cause: File Search requires Gemini 3 Pro or Gemini 3 Flash. Gemini 2.x and 1.5 models are not supported.

Prevention: Always use Gemini 3 models:

typescript

// ❌ WRONG: Using Gemini 1.5 model
const response = await ai.models.generateContent({
  model: 'gemini-1.5-pro',  // Not supported!
  contents: 'What is the installation procedure?',
  config: {
    tools: [{
      fileSearch: { fileSearchStoreNames: [fileStore.name] }
    }]
  }
})

// ✅ CORRECT: Use Gemini 3 models
const response = await ai.models.generateContent({
  model: 'gemini-3-flash',  // ✅ Supported (fast, cost-effective)
  // OR
  // model: 'gemini-3-pro',   // ✅ Supported (higher quality)
  contents: 'What is the installation procedure?',
  config: {
    tools: [{
      fileSearch: { fileSearchStoreNames: [fileStore.name] }
    }]
  }
})

Source: https://ai.google.dev/gemini-api/docs/file-search

Error 9: displayName Not Preserved for Blob Sources (Fixed v1.34.0+)

Symptom:

groundingChunks[0].title === null  // No document source shown

Cause: In @google/genai versions prior to v1.34.0, when uploading files as `Blob` objects (not file paths), the SDK dropped the `displayName` and `customMetadata` configuration fields.

Prevention:

typescript

// ✅ CORRECT: Upgrade to v1.34.0+ for automatic fix
// Run in a shell first: npm install @google/genai@latest  (v1.34.0+)

await ai.fileSearchStores.uploadToFileSearchStore({
  name: storeName,
  file: new Blob([arrayBuffer], { type: 'application/pdf' }),
  config: {
    displayName: 'Safety Manual.pdf',  // ✅ Now preserved
    customMetadata: { version: '1.0' }  // ✅ Now preserved
  }
})

// ⚠️ WORKAROUND for v1.33.0 and earlier: Use resumable upload
const uploadUrl = `https://generativelanguage.googleapis.com/upload/v1beta/${storeName}:uploadToFileSearchStore?key=${API_KEY}`

// Step 1: Initiate with displayName in body
const initResponse = await fetch(uploadUrl, {
  method: 'POST',
  headers: {
    'X-Goog-Upload-Protocol': 'resumable',
    'X-Goog-Upload-Command': 'start',
    'X-Goog-Upload-Header-Content-Length': numBytes.toString(),
    'X-Goog-Upload-Header-Content-Type': 'application/pdf',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    displayName: 'Safety Manual.pdf'  // ✅ Works with resumable upload
  })
})

// Step 2: Upload file bytes
const uploadUrl2 = initResponse.headers.get('X-Goog-Upload-URL')
await fetch(uploadUrl2, {
  method: 'PUT',
  headers: {
    'Content-Length': numBytes.toString(),
    'X-Goog-Upload-Offset': '0',
    'X-Goog-Upload-Command': 'upload, finalize',
    'Content-Type': 'application/pdf'
  },
  body: fileBytes
})

Source: GitHub Issue #1078

Error 10: Grounding Metadata Ignored with JSON Response Mode

Symptom:

response.candidates[0].groundingMetadata === undefined
// Even though fileSearch tool is configured

Cause: When using `responseMimeType: 'application/json'` for structured output, the API ignores the `fileSearch` tool and returns no grounding metadata, even with Gemini 3 models.

Prevention:

typescript

// ❌ WRONG: Structured output overrides grounding
const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'Summarize guidelines',
  config: {
    responseMimeType: 'application/json',  // Loses grounding
    tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }]
  }
})

// ✅ CORRECT: Two-step approach
// Step 1: Get grounded text response
const textResponse = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'Summarize guidelines',
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }]
  }
})

const grounding = textResponse.candidates[0].groundingMetadata

// Step 2: Convert to structured format in prompt
const jsonResponse = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: `Convert to JSON: ${textResponse.text}

Format:
{
  "summary": "...",
  "key_points": ["..."]
}`,
  config: {
    responseMimeType: 'application/json',
    responseSchema: {
      type: 'object',
      properties: {
        summary: { type: 'string' },
        key_points: { type: 'array', items: { type: 'string' } }
      }
    }
  }
})

// Combine results
const result = {
  data: JSON.parse(jsonResponse.text),
  sources: grounding.groundingChunks
}

Source: GitHub Issue #829
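
The final combining step above assumes the second response is valid JSON and that grounding metadata is present. A more defensive version might look like the sketch below; `combineGroundedResult` and `GroundedResult` are hypothetical names, and the grounding chunks are treated as opaque values because their exact shape is not specified here.

```typescript
interface GroundedResult<T> {
  data: T | null        // parsed JSON, or null if parsing failed
  sources: unknown[]    // grounding chunks from the first (text) response
  parseError?: string   // set when the JSON step produced unusable text
}

// Combines the JSON text from step 2 with the grounding chunks from step 1.
function combineGroundedResult<T>(
  jsonText: string | undefined,
  groundingChunks: unknown[] | undefined
): GroundedResult<T> {
  const sources = groundingChunks ?? []
  if (!jsonText) {
    return { data: null, sources, parseError: 'empty response text' }
  }
  try {
    return { data: JSON.parse(jsonText) as T, sources }
  } catch (error) {
    return { data: null, sources, parseError: (error as Error).message }
  }
}

// Example with literal inputs standing in for the two API responses
const combined = combineGroundedResult<{ summary: string }>(
  '{"summary": "Wear gloves."}',
  [{ note: 'placeholder chunk' }]
)
console.log(combined.data?.summary, combined.sources.length)
```

In the flow above this would be called as `combineGroundedResult(jsonResponse.text, grounding?.groundingChunks)`.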

Error 11: Google Search and File Search Tools Are Mutually Exclusive

Symptom:

Error: "Search as a tool and file search tool are not supported together"
Status: INVALID_ARGUMENT

Cause: The Gemini API does not allow using `googleSearch` and `fileSearch` tools in the same request.

Prevention:

typescript

// ❌ WRONG: Combining search tools
const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'What are the latest industry guidelines?',
  config: {
    tools: [
      { googleSearch: {} },
      { fileSearch: { fileSearchStoreNames: [storeName] } }
    ]
  }
})

// ✅ CORRECT: Use separate specialist agents
async function searchWeb(query: string) {
  return ai.models.generateContent({
    model: 'gemini-3-flash',
    contents: query,
    config: { tools: [{ googleSearch: {} }] }
  })
}

async function searchDocuments(query: string) {
  return ai.models.generateContent({
    model: 'gemini-3-flash',
    contents: query,
    config: { tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }] }
  })
}

// Orchestrate based on query type
const needsWeb = query.includes('latest') || query.includes('current')
const response = needsWeb
  ? await searchWeb(query)
  : await searchDocuments(query)

Source: GitHub Issue #435, Google Codelabs
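
The keyword check in the orchestration step uses substring matching, so a query containing "concurrent" would also match "current". A whole-word, case-insensitive variant is sketched below; `chooseSearchTool` is a hypothetical helper, and the hint list is just the two keywords from the example above.

```typescript
type SearchTool = 'googleSearch' | 'fileSearch'

// Picks a tool by looking for whole-word, case-insensitive hint matches in the query.
function chooseSearchTool(
  query: string,
  webHints: string[] = ['latest', 'current']
): SearchTool {
  const words = query.toLowerCase().split(/\W+/)
  const wantsWeb = webHints.some(hint => words.includes(hint.toLowerCase()))
  return wantsWeb ? 'googleSearch' : 'fileSearch'
}

console.log(chooseSearchTool('What are the Latest industry guidelines?'))
console.log(chooseSearchTool('How do concurrent uploads work?'))
```

Keyword routing is a blunt heuristic; it is shown only to keep the two tools in separate requests, as required above.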

Error 12: Batch API Missing Response Metadata (Community-sourced)

Symptom: Cannot correlate batch responses with requests when using metadata field.

Cause: When using Batch API with `InlinedRequest` that includes a `metadata` field, the corresponding `InlinedResponse` does not return the metadata.

Prevention:

typescript

// ❌ WRONG: Expecting metadata in response
const batchRequest = {
  metadata: { key: 'my-request-id' },
  contents: [{ parts: [{ text: 'Question?' }], role: 'user' }],
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }]
  }
}

const batchResponse = await ai.batch.create({ requests: [batchRequest] })
console.log(batchResponse.responses[0].metadata)  // ❌ undefined

// ✅ CORRECT: Use array index to correlate
const requests = [
  { metadata: { id: 'req-1' }, contents: [...] },
  { metadata: { id: 'req-2' }, contents: [...] }
]

const responses = await ai.batch.create({ requests })

// Map by index (not ideal but works)
responses.responses.forEach((response, i) => {
  const requestMetadata = requests[i].metadata
  console.log(`Response for ${requestMetadata.id}:`, response)
})

Community Verification: Maintainer confirmed, internal bug filed.

Source: GitHub Issue #1191
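
The index-based workaround can be wrapped so that a length mismatch is caught instead of silently pairing the wrong items. The sketch below assumes, as the workaround above does, that responses come back in request order; `correlateByIndex` is a hypothetical helper.

```typescript
// Pairs each request with the response at the same position.
// Throws if the arrays differ in length, since pairing would then be unreliable.
function correlateByIndex<Req, Res>(
  requests: Req[],
  responses: Res[]
): Array<{ request: Req; response: Res }> {
  if (requests.length !== responses.length) {
    throw new Error(
      `Cannot correlate ${requests.length} requests with ${responses.length} responses`
    )
  }
  return requests.map((request, index) => ({ request, response: responses[index] }))
}

// Example with placeholder data standing in for batch requests and responses
const pairs = correlateByIndex(
  [{ metadata: { id: 'req-1' } }, { metadata: { id: 'req-2' } }],
  ['first answer', 'second answer']
)
for (const pair of pairs) {
  console.log(`Response for ${pair.request.metadata.id}:`, pair.response)
}
```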

Setup Instructions

Step 1: Initialize Client

typescript

import { GoogleGenAI } from '@google/genai'
import fs from 'fs'

// Verify the API key is set before constructing the client
if (!process.env.GOOGLE_API_KEY) {
  throw new Error('GOOGLE_API_KEY environment variable is required')
}

// Initialize client with API key
const ai = new GoogleGenAI({
  apiKey: process.env.GOOGLE_API_KEY
})

Step 2: Create File Search Store

typescript

// Create a store (container for documents)
const fileStore = await ai.fileSearchStores.create({
  config: {
    displayName: 'my-knowledge-base',  // Human-readable name
    // Optional: Add store-level metadata
    customMetadata: {
      project: 'customer-support',
      environment: 'production'
    }
  }
})

console.log('Created store:', fileStore.name)
// Output: fileSearchStores/abc123xyz...

Finding Existing Stores:

typescript

// List all stores (paginated)
const stores = await ai.fileSearchStores.list({
  pageSize: 20  // Max 20 per page
})

// Find by display name
let targetStore = null
let pageToken = null

do {
  const page = await ai.fileSearchStores.list({ pageToken })
  targetStore = page.fileSearchStores.find(
    s => s.displayName === 'my-knowledge-base'
  )
  pageToken = page.nextPageToken
} while (!targetStore && pageToken)

if (targetStore) {
  console.log('Found existing store:', targetStore.name)
} else {
  console.log('Store not found, creating new one...')
}
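
The lookup loop above can be isolated from the SDK by passing in a function that fetches one page at a time. This is a sketch under assumptions: `findStoreByDisplayName`, `StoreSummary`, and `StorePage` are hypothetical names, and the page shape (`fileSearchStores`, `nextPageToken`) is taken from the snippet above rather than confirmed against the SDK types.

```typescript
interface StoreSummary {
  name?: string
  displayName?: string
}

// Page shape assumed from the snippet above.
interface StorePage {
  fileSearchStores?: StoreSummary[]
  nextPageToken?: string
}

// Walks pages until a store with the given display name is found or pages run out.
async function findStoreByDisplayName(
  listPage: (pageToken?: string) => Promise<StorePage>,
  displayName: string
): Promise<StoreSummary | undefined> {
  let pageToken: string | undefined
  do {
    const page = await listPage(pageToken)
    const match = (page.fileSearchStores ?? []).find(s => s.displayName === displayName)
    if (match) {
      return match
    }
    pageToken = page.nextPageToken || undefined
  } while (pageToken)
  return undefined
}
```

With the call used in this guide, `listPage` would be something like `token => ai.fileSearchStores.list({ pageToken: token })`. Tolerating a missing `fileSearchStores` array avoids a crash on an empty page.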

Step 3: Upload Documents

Single File Upload:

typescript

// Declared with `let` because the polling loop below reassigns it
let operation = await ai.fileSearchStores.uploadToFileSearchStore({
  name: fileStore.name,
  file: fs.createReadStream('./docs/manual.pdf'),
  config: {
    displayName: 'Installation Manual',
    customMetadata: {
      doc_type: 'manual',
      version: '1.0',
      language: 'en'
    },
    chunkingConfig: {
      whiteSpaceConfig: {
        maxTokensPerChunk: 500,
        maxOverlapTokens: 50
      }
    }
  }
})

// Poll until indexing complete
while (!operation.done) {
  await new Promise(resolve => setTimeout(resolve, 1000))
  operation = await ai.operations.get({ name: operation.name })
}

console.log('✅ Indexed:', operation.response.displayName)
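
The polling loop above waits indefinitely if indexing never finishes, and it does not check for failure. A minimal sketch of a bounded wait follows. `PollableOperation`, `waitForOperation`, and the `refresh` callback are hypothetical names introduced here, and the optional `error` field is an assumption about what a failed operation carries; the default interval and timeout are arbitrary.

```typescript
// Hypothetical minimal view of a long-running operation; the optional `error`
// field is an assumption about what a failed operation carries.
interface PollableOperation {
  name?: string
  done?: boolean
  error?: unknown
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms))

// Poll `refresh` until the operation reports done, fails, or the deadline passes.
async function waitForOperation<T extends PollableOperation>(
  initial: T,
  refresh: (op: T) => Promise<T>,
  { intervalMs = 1000, timeoutMs = 120_000 } = {}
): Promise<T> {
  const deadline = Date.now() + timeoutMs
  let op = initial
  while (!op.done) {
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms waiting for ${op.name ?? 'operation'}`)
    }
    await sleep(intervalMs)
    op = await refresh(op)
  }
  if (op.error) {
    throw new Error(`Operation failed: ${JSON.stringify(op.error)}`)
  }
  return op
}
```

With the client from Step 1, the `refresh` callback would wrap the same `ai.operations.get(...)` call used in the loop above.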

Batch Upload (Concurrent):

typescript

const filePaths = [
  './docs/manual.pdf',
  './docs/faq.md',
  './docs/troubleshooting.docx'
]

// Upload all files concurrently
const uploadPromises = filePaths.map(filePath =>
  ai.fileSearchStores.uploadToFileSearchStore({
    name: fileStore.name,
    file: fs.createReadStream(filePath),
    config: {
      displayName: filePath.split('/').pop(),
      customMetadata: {
        doc_type: 'support',
        source_path: filePath
      },
      chunkingConfig: {
        whiteSpaceConfig: {
          maxTokensPerChunk: 500,
          maxOverlapTokens: 50
        }
      }
    }
  })
)

const operations = await Promise.all(uploadPromises)

// Poll all operations
for (const operation of operations) {
  let op = operation
  while (!op.done) {
    await new Promise(resolve => setTimeout(resolve, 1000))
    op = await ai.operations.get({ name: op.name })
  }
  console.log('✅ Indexed:', op.response.displayName)
}
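
`Promise.all` starts every upload at once, which may run into rate limits on larger batches. A minimal sketch of a bounded-concurrency map follows; `mapWithConcurrency` is a hypothetical helper introduced here, not part of the SDK, and a sensible limit depends on your quota.

```typescript
// Run `worker` over `items` with at most `limit` tasks in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer')
  }
  const results = new Array<R>(items.length)
  let next = 0

  // Each runner repeatedly claims the next unprocessed index until none remain.
  const runner = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++
      results[index] = await worker(items[index], index)
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner())
  await Promise.all(runners)
  return results
}
```

The worker passed in would be the same upload call shown above, for example with a limit of 3 so that only three files are in flight at any time.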

Step 4: Query with File Search

Basic Query:

typescript

const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'What are the safety precautions for installation?',
  config: {
    tools: [{
      fileSearch: {
        fileSearchStoreNames: [fileStore.name]
      }
    }]
  }
})

console.log('Answer:', response.text)

// Access citations (candidates may be absent, so use optional chaining)
const grounding = response.candidates?.[0]?.groundingMetadata
if (grounding?.groundingChunks) {
  console.log('\nSources:')
  grounding.groundingChunks.forEach((chunk, i) => {
    console.log(`${i + 1}. ${chunk.retrievedContext?.title || 'Unknown'}`)
    console.log(`   URI: ${chunk.retrievedContext?.uri || 'N/A'}`)
  })
}
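
Several retrieved chunks often come from the same document, so the loop above can print one source many times. A minimal sketch that collapses chunks into one entry per source follows. `GroundingChunkLike`, `Citation`, and `summarizeCitations` are hypothetical names; the chunk shape mirrors only the fields read in the snippet above and is not taken from the SDK's type definitions.

```typescript
// Hypothetical minimal shape mirroring the fields read in the snippet above.
interface GroundingChunkLike {
  retrievedContext?: { title?: string; uri?: string }
}

interface Citation {
  title: string
  uri?: string
  count: number // how many retrieved chunks came from this source
}

// Collapse grounding chunks into one entry per source, preserving first-seen order.
function summarizeCitations(chunks: readonly GroundingChunkLike[] = []): Citation[] {
  const bySource = new Map<string, Citation>()
  for (const chunk of chunks) {
    const title = chunk.retrievedContext?.title ?? 'Unknown'
    const uri = chunk.retrievedContext?.uri
    const key = `${title}\u0000${uri ?? ''}`
    const existing = bySource.get(key)
    if (existing) {
      existing.count += 1
    } else {
      bySource.set(key, { title, uri, count: 1 })
    }
  }
  return Array.from(bySource.values())
}
```

Calling `summarizeCitations(grounding?.groundingChunks)` would then give a short source list suitable for display under the answer.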

Query with Metadata Filtering:

typescript

const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'How do I reset the device?',
  config: {
    tools: [{
      fileSearch: {
        fileSearchStoreNames: [fileStore.name],
        // Filter to only search troubleshooting docs in English, version 1.0
        metadataFilter: 'doc_type="troubleshooting" AND language="en" AND version="1.0"'
      }
    }]
  }
})

console.log('Answer:', response.text)

Metadata Filter Syntax:

AND: `key1="value1" AND key2="value2"`
OR: `key1="value1" OR key1="value2"`
Parentheses: `(key1="a" OR key1="b") AND key2="c"`
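
Hand-written filter strings are easy to get wrong once several keys are involved. A minimal sketch of a builder that uses only the AND, OR, and parentheses forms listed above follows. `FilterValue` and `buildMetadataFilter` are hypothetical names introduced here; because escaping rules are not covered above, the sketch rejects values containing a double quote rather than guessing.

```typescript
type FilterValue = string | readonly string[]

// Build a filter string using only the AND / OR / parentheses forms.
// A string value becomes key="value"; an array becomes a parenthesized OR group.
function buildMetadataFilter(conditions: Record<string, FilterValue>): string {
  const clauses = Object.entries(conditions).map(([key, value]) => {
    const values = typeof value === 'string' ? [value] : Array.from(value)
    if (values.length === 0) throw new Error(`No values given for "${key}"`)
    for (const v of values) {
      // Escaping rules are unknown here, so reject quotes rather than guess.
      if (v.includes('"')) throw new Error(`Value for "${key}" must not contain a double quote`)
    }
    const terms = values.map(v => `${key}="${v}"`)
    return terms.length === 1 ? terms[0] : `(${terms.join(' OR ')})`
  })
  return clauses.join(' AND ')
}

const filter = buildMetadataFilter({
  product: 'widget-pro',
  doc_type: ['troubleshooting', 'faq'],
  language: 'en'
})
// filter === 'product="widget-pro" AND (doc_type="troubleshooting" OR doc_type="faq") AND language="en"'
```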

Step 5: List and Manage Documents

typescript

// List all documents in store
const docs = await ai.fileSearchStores.documents.list({
  parent: fileStore.name,
  pageSize: 20
})

console.log(`Total documents: ${docs.documents?.length || 0}`)

docs.documents?.forEach(doc => {
  console.log(`- ${doc.displayName} (${doc.name})`)
  console.log(`  Metadata:`, doc.customMetadata)
})

// Get specific document details (guard against an empty store)
const firstDoc = docs.documents?.[0]
if (firstDoc) {
  const docDetails = await ai.fileSearchStores.documents.get({
    name: firstDoc.name
  })

  console.log('Document details:', docDetails)

  // Delete document
  await ai.fileSearchStores.documents.delete({
    name: firstDoc.name,
    force: true
  })
}

Step 6: Cleanup

typescript

// Delete entire store (force deletes all documents)
await ai.fileSearchStores.delete({
  name: fileStore.name,
  force: true
})

console.log('✅ Store deleted')

Recommended Chunking Strategies

Metadata Best Practices

Design metadata schema for filtering and organization:

Example: Customer Support Knowledge Base

typescript

// Schema sketch: `|` lists the allowed values; set one concrete value per key
customMetadata: {
  doc_type: 'faq' | 'manual' | 'troubleshooting' | 'guide',
  product: 'widget-pro' | 'widget-lite',
  version: '1.0' | '2.0',
  language: 'en' | 'es' | 'fr',
  category: 'installation' | 'configuration' | 'maintenance',
  priority: 'critical' | 'normal' | 'low',
  last_updated: '2025-01-15',
  author: 'support-team'
}

Query Example:

typescript

metadataFilter: 'product="widget-pro" AND (doc_type="troubleshooting" OR doc_type="faq") AND language="en"'

Example: Legal Document Repository

typescript

// Schema sketch: `|` lists the allowed values; set one concrete value per key
customMetadata: {
  doc_type: 'contract' | 'regulation' | 'case-law' | 'policy',
  jurisdiction: 'US' | 'EU' | 'UK',
  practice_area: 'employment' | 'corporate' | 'ip' | 'tax',
  effective_date: '2025-01-01',
  status: 'active' | 'archived',
  confidentiality: 'public' | 'internal' | 'privileged'
}

Example: Code Documentation

typescript

// Schema sketch: `|` lists the allowed values; set one concrete value per key
customMetadata: {
  doc_type: 'api-reference' | 'tutorial' | 'example' | 'changelog',
  language: 'javascript' | 'python' | 'java' | 'go',
  framework: 'react' | 'nextjs' | 'express' | 'fastapi',
  version: '1.2.0',
  difficulty: 'beginner' | 'intermediate' | 'advanced'
}

Tips:

Use consistent key naming (`snake_case` or `camelCase`)
Limit to most important filterable fields (20 max)
Use enums/constants for values (easier filtering)
Include version and date fields for time-based filtering
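
The tips above can be checked mechanically before upload. A minimal sketch follows, assuming string-valued metadata and a `snake_case` convention; `checkMetadata`, `MAX_METADATA_KEYS`, and the regular expression are hypothetical names and choices introduced here, with the limit of 20 taken from the tip above.

```typescript
const MAX_METADATA_KEYS = 20 // limit quoted in the tips
const SNAKE_CASE = /^[a-z][a-z0-9]*(_[a-z0-9]+)*$/

// Return a list of problems; an empty list means the metadata follows the tips.
function checkMetadata(metadata: Record<string, string>): string[] {
  const problems: string[] = []
  const keys = Object.keys(metadata)
  if (keys.length > MAX_METADATA_KEYS) {
    problems.push(`Too many keys: ${keys.length} (max ${MAX_METADATA_KEYS})`)
  }
  for (const key of keys) {
    if (!SNAKE_CASE.test(key)) problems.push(`Key "${key}" is not snake_case`)
    if (metadata[key].trim() === '') problems.push(`Key "${key}" has an empty value`)
  }
  return problems
}
```

Running this in the upload path and refusing to upload when the list is non-empty keeps inconsistent keys out of the store, which matters because filters match keys exactly.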

Cost Optimization

1. Deduplicate Before Upload

typescript

// Track uploaded file hashes to avoid duplicates
const uploadedHashes = new Set<string>()

async function uploadWithDeduplication(filePath: string) {
  const fileHash = await getFileHash(filePath)

  if (uploadedHashes.has(fileHash)) {
    console.log(`Skipping duplicate: ${filePath}`)
    return
  }

  await ai.fileSearchStores.uploadToFileSearchStore({
    name: fileStore.name,
    file: fs.createReadStream(filePath)
  })

  uploadedHashes.add(fileHash)
}
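
The snippet above calls `getFileHash` without defining it. One possible implementation is sketched below, assuming a Node.js runtime and SHA-256 as the digest; the choice of algorithm is an assumption, since any stable content hash would serve for deduplication.

```typescript
import { createHash } from 'node:crypto'
import { createReadStream } from 'node:fs'

// Stream the file through SHA-256 so large documents are not loaded into memory at once.
function getFileHash(filePath: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash('sha256')
    createReadStream(filePath)
      .on('error', reject)
      .on('data', chunk => hash.update(chunk))
      .on('end', () => resolve(hash.digest('hex')))
  })
}
```

An in-memory `Set` only deduplicates within a single run. To skip files across runs, the hashes would need to be persisted, for example in a local JSON file.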

2. Compress Large Files

typescript

// Convert images to text before indexing (OCR)
// Compress PDFs (remove images, use text-only)
// Use markdown instead of Word docs (smaller size)

3. Use Metadata Filtering to Reduce Query Scope

typescript

// ❌ EXPENSIVE: Search all 10GB of documents
const unfilteredResponse = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'Reset procedure?',
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [fileStore.name] } }]
  }
})

// ✅ CHEAPER: Filter to only troubleshooting docs (subset)
// (distinct variable names so both calls can live in the same scope)
const filteredResponse = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'Reset procedure?',
  config: {
    tools: [{
      fileSearch: {
        fileSearchStoreNames: [fileStore.name],
        metadataFilter: 'doc_type="troubleshooting"'  // Reduces search scope
      }
    }]
  }
})

4. Choose Flash Over Pro for Cost Savings

typescript

// Gemini 3 Flash is 10x cheaper than Pro for queries
// Use Flash unless you need Pro's advanced reasoning

// Development/testing: Use Flash
model: 'gemini-3-flash'

// Production (high-stakes answers): Use Pro
model: 'gemini-3-pro'

5. Monitor Storage Usage

typescript

// List stores and estimate storage
const stores = await ai.fileSearchStores.list()

for (const store of stores.fileSearchStores || []) {
  const docs = await ai.fileSearchStores.documents.list({
    parent: store.name
  })

  console.log(`Store: ${store.displayName}`)
  console.log(`Documents: ${docs.documents?.length || 0}`)
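  // Rough placeholder: assumes ~10 MB stored per document. Stored size is about
  // 3x the input size, so substitute real file sizes for a usable figure.
  console.log(`Estimated storage: ~${(docs.documents?.length || 0) * 10} MB`)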
}
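
A per-document constant gives only a rough figure. A minimal sketch of an estimate based on actual input sizes follows, using the 3x storage multiplier and the $0.15 per 1M tokens indexing price quoted in this guide. `estimateUsage` and `UsageEstimate` are hypothetical names, and the token count is assumed to be supplied by the caller.

```typescript
const STORAGE_MULTIPLIER = 3          // stored size is about 3x input size, per this guide
const INDEXING_USD_PER_MILLION = 0.15 // one-time indexing price quoted in this guide

interface UsageEstimate {
  storageBytes: number
  indexingUsd: number
}

// Estimate stored bytes and one-time indexing cost from input file sizes and a token count.
function estimateUsage(fileSizesBytes: readonly number[], totalTokens: number): UsageEstimate {
  const inputBytes = fileSizesBytes.reduce((sum, size) => sum + size, 0)
  return {
    storageBytes: inputBytes * STORAGE_MULTIPLIER,
    indexingUsd: (totalTokens / 1_000_000) * INDEXING_USD_PER_MILLION
  }
}
```

For example, two files of 1 MB and 2 MB would count as roughly 9 MB against the storage quota under this assumption.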

Testing & Verification

Verify Store Creation

typescript

const store = await ai.fileSearchStores.get({
  name: fileStore.name
})

console.assert(store.displayName === 'my-knowledge-base', 'Store name mismatch')
console.log('✅ Store created successfully')

Verify Document Indexing

typescript

const docs = await ai.fileSearchStores.documents.list({
  parent: fileStore.name
})

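// Default to 0 so the comparison is well-typed when `documents` is undefined
const indexedCount = docs.documents?.length ?? 0
console.assert(indexedCount > 0, 'No documents indexed')
console.log(`✅ ${indexedCount} documents indexed`)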

Verify Query Functionality

typescript

const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'What is this knowledge base about?',
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [fileStore.name] } }]
  }
})

// `response.text` may be undefined, so fall back to an empty string
const answer = response.text ?? ''
console.assert(answer.length > 0, 'Empty response')
console.log('✅ Query successful:', answer.substring(0, 100) + '...')

Verify Citations

typescript

const response = await ai.models.generateContent({
  model: 'gemini-3-flash',
  contents: 'Provide a specific answer with citations.',
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [fileStore.name] } }]
  }
})

const grounding = response.candidates?.[0]?.groundingMetadata
const citationCount = grounding?.groundingChunks?.length ?? 0
console.assert(citationCount > 0, 'No grounding/citations returned')
console.log(`✅ ${citationCount} citations returned`)

Integration Examples

Streaming Support

File Search supports streaming responses with `generateContentStream()`:

typescript

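// ✅ Streaming works with File Search (v1.34.0+)
const stream = await ai.models.generateContentStream({
  model: 'gemini-3-flash',
  contents: 'Summarize the document',
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [storeName] } }]
  }
})

// The stream is an async iterable of chunks and has no `candidates` of its own,
// so keep the most recent grounding metadata seen on any chunk
let grounding
for await (const chunk of stream) {
  process.stdout.write(chunk.text ?? '')
  grounding = chunk.candidates?.[0]?.groundingMetadata ?? grounding
}

// Access grounding after stream completes
console.log('\nCitations:', grounding?.groundingChunks?.length ?? 0)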

Note: Early SDK versions (pre-v1.34.0) may have had streaming issues. Use v1.34.0+ for reliable streaming support.

Source: GitHub Issue #1221

Working Templates

This skill includes 3 working templates in the `templates/` directory:

Template 1: basic-node-rag

Minimal Node.js/TypeScript example demonstrating:

Create file search store
Upload multiple documents
Query with natural language
Display citations

Use when: Learning File Search, prototyping, simple CLI tools

Run:

bash

cd templates/basic-node-rag
npm install
npm run dev

Template 2: cloudflare-worker-rag

Cloudflare Workers integration showing:

Edge API for document upload
Edge API for semantic search
Integration with R2 for document storage
Hybrid architecture (Gemini File Search + Cloudflare edge)

Use when: Building global edge applications, integrating with Cloudflare stack

Deploy:

bash

cd templates/cloudflare-worker-rag
npm install
npx wrangler deploy

Template 3: nextjs-docs-search

Full-stack Next.js application featuring:

Document upload UI with drag-and-drop
Real-time search interface
Citation rendering with source links
Metadata filtering UI

Use when: Building production documentation sites, knowledge bases

Run:

bash

cd templates/nextjs-docs-search
npm install
npm run dev

References

Official Documentation:

File Search Overview: https://ai.google.dev/gemini-api/docs/file-search
API Reference (Stores): https://ai.google.dev/api/file-search/file-search-stores
API Reference (Documents): https://ai.google.dev/api/file-search/documents
Blog Announcement: https://blog.google/technology/developers/file-search-gemini-api/
Pricing: https://ai.google.dev/pricing

Tutorials:

JavaScript/TypeScript Guide: https://www.philschmid.de/gemini-file-search-javascript
SDK Repository: https://github.com/googleapis/js-genai

Bundled Resources in This Skill:

`references/api-reference.md` - Complete API documentation
`references/chunking-best-practices.md` - Detailed chunking strategies
`references/pricing-calculator.md` - Cost estimation guide
`references/migration-from-openai.md` - Migration guide from OpenAI Files API
`scripts/create-store.ts` - CLI tool to create stores
`scripts/upload-batch.ts` - Batch upload script
`scripts/query-store.ts` - Interactive query tool
`scripts/cleanup.ts` - Cleanup script

Working Templates:

`templates/basic-node-rag/` - Minimal Node.js example
`templates/cloudflare-worker-rag/` - Edge deployment example
`templates/nextjs-docs-search/` - Full-stack Next.js app

Skill Version: 1.1.0
Last Verified: 2026-01-21
Package Version: @google/genai ^1.38.0 (minimum 1.29.0 required)
Token Savings: ~67%
Errors Prevented: 12
Changes: Added 4 new errors from community research (displayName Blob issue, grounding with JSON mode, tool conflicts, batch API metadata), enhanced polling timeout pattern with fallback verification, added streaming support note
