# Gemini API Development Skill

Source: Official Gemini API documentation scraped 2026-02-27
Coverage: All 81 documentation files

## Quick Start
Python:

```python
from google import genai  # CRITICAL: NOT google.generativeai

client = genai.Client()  # Uses GEMINI_API_KEY env var automatically
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Hello world"
)
print(response.text)
```

JavaScript:

```javascript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({}); // Uses GEMINI_API_KEY env var
const response = await ai.models.generateContent({
  model: "gemini-3-flash-preview",
  contents: "Hello world"
});
console.log(response.text);
```

## CRITICAL GOTCHAS (Read First)
- **SDK import:** `from google import genai` — NOT `google.generativeai` (legacy)
- **Temperature:** Default is 1.0 for Gemini 3 — do NOT lower it; causes loops/degraded performance
- **Thinking params:** Gemini 3 uses `thinking_level` ("low"/"medium"/"high"); Gemini 2.5 uses `thinking_budget` (integer tokens)
- **Thought signatures:** Gemini 3 REQUIRES thought signatures echoed back during function calling or you get a 400 error. SDKs handle this automatically in chat mode.
- **API default:** SDK defaults to `v1beta`. Use `http_options={'api_version': 'v1alpha'}` for experimental features.
- **REST auth header:** `x-goog-api-key: $GEMINI_API_KEY` (not Authorization Bearer)
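The REST conventions above can be exercised with only the standard library; this sketch builds the request without sending it (uncomment the last lines with a real key):

```python
import json
import os
import urllib.request

API_KEY = os.environ.get("GEMINI_API_KEY", "YOUR_KEY")
url = ("https://generativelanguage.googleapis.com/v1beta/"
       "models/gemini-3-flash-preview:generateContent")
req = urllib.request.Request(
    url,
    data=json.dumps({"contents": [{"parts": [{"text": "Hello world"}]}]}).encode(),
    headers={"x-goog-api-key": API_KEY, "Content-Type": "application/json"},
)
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["candidates"][0]["content"]["parts"][0]["text"])
```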
## Models Reference

### Gemini 3 Series (Current)
| Model | String | Notes |
|---|---|---|
| Gemini 3.1 Pro Preview | `gemini-3.1-pro-preview` | Latest |
| Gemini 3 Flash Preview | `gemini-3-flash-preview` | Default workhorse; shutdown: no date |
| Gemini 3 Pro Image Preview | `gemini-3-pro-image-preview` | "Nano Banana Pro" — native image gen |
| Gemini 3.1 Flash Image Preview | `gemini-3.1-flash-image-preview` | "Nano Banana 2" — fast image gen |

⚠️ Gemini 3 Pro Preview (`gemini-3-pro-preview`) shuts down March 9, 2026 → migrate to `gemini-3.1-pro-preview`

### Gemini 2.5 Series (Stable)
| Model | String | Shutdown |
|---|---|---|
| Gemini 2.5 Pro | `gemini-2.5-pro` | June 17, 2026 |
| Gemini 2.5 Flash | `gemini-2.5-flash` | June 17, 2026 |
| Gemini 2.5 Flash Lite | `gemini-2.5-flash-lite` | July 22, 2026 |
| Gemini 2.5 Flash Image | `gemini-2.5-flash-image` | Oct 2, 2026 |
### Gemini 2.0 Series (Deprecating)

- `gemini-2.0-flash`, `gemini-2.0-flash-lite` → shutdown June 1, 2026
### Specialized Models

- TTS: `gemini-2.5-flash-preview-tts`
- Live API: `gemini-2.5-flash-native-audio-preview-12-2025`
- Computer Use: `gemini-2.5-computer-use-preview-10-2025`, `gemini-3-flash-preview`
- Deep Research: `deep-research-pro-preview-12-2025` (via Interactions API only)
- Embeddings: `gemini-embedding-001`
- Video (Veo): `veo-3.1-generate-preview`
- Images (Imagen): `imagen-4.0-generate-001`
- Music (Lyria): `models/lyria-realtime-exp`
- Robotics: `gemini-robotics-er-1.5-preview`
- LearnLM: experimental tutor model

### Latest Aliases

- `gemini-pro-latest` → `gemini-3-pro-preview`
- `gemini-flash-latest` → `gemini-3-flash-preview`
## Libraries & Installation

```bash
pip install google-genai        # Python
npm install @google/genai       # JavaScript
go get google.golang.org/genai  # Go
```

Java: `com.google.genai:google-genai:1.0.0`
C#: `dotnet add package Google.GenAI`

**OpenAI compatibility** (3-line change):

```python
from openai import OpenAI

client = OpenAI(
    api_key="GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)
response = client.chat.completions.create(model="gemini-3-flash-preview", messages=[...])
```

## Core Generation

### System Instructions
```python
from google.genai import types

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="User message",
    config=types.GenerateContentConfig(
        system_instruction="You are a helpful assistant.",
        temperature=1.0,
        max_output_tokens=1024,
    )
)
```

### Multi-turn Chat
```python
chat = client.chats.create(model="gemini-3-flash-preview")
response = chat.send_message("Hello")
print(response.text)
response2 = chat.send_message("Tell me more")
print(response2.text)
```

### Streaming
```python
for chunk in client.models.generate_content_stream(
    model="gemini-3-flash-preview",
    contents="Write a long story"
):
    print(chunk.text, end="")
```

### Token Counting
```python
# Before sending:
count = client.models.count_tokens(model="gemini-3-flash-preview", contents=prompt)
print(count.total_tokens)

# After generating:
print(response.usage_metadata)
```

Fields: `prompt_token_count`, `candidates_token_count`, `thoughts_token_count`, `total_token_count`

1 token ≈ 4 characters; 100 tokens ≈ 60-80 English words.
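For rough budgeting without a round-trip, the ≈4-characters rule can be wrapped in a tiny estimator; it is a heuristic only, and `count_tokens` remains authoritative:

```python
def rough_token_estimate(text: str) -> int:
    # Heuristic from the rule of thumb above: ~4 characters per token.
    return max(1, len(text) // 4)

print(rough_token_estimate("Hello world"))  # 2
```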
---

## Thinking (Reasoning)

### Gemini 3 — thinking_level

```python
config=types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_level="low")  # "low", "medium", "high"
)
```

### Gemini 2.5 — thinking_budget
```python
config=types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_budget=1024)  # token budget; 0=disabled
)
```

Thinking is enabled by default on 2.5 and 3 models — causes higher latency/tokens. Disable if optimizing for speed.
## Multimodal Input

### Images (Inline — under 20MB)
```python
with open('image.jpg', 'rb') as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type='image/jpeg'),
        "Caption this image."
    ]
)
```

### Images (URL fetch)
```python
import requests

image_bytes = requests.get("https://example.com/image.jpg").content
image = types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg")
```

### PDF Documents (Inline — under 50MB)
```python
import pathlib

filepath = pathlib.Path('file.pdf')
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[
        types.Part.from_bytes(data=filepath.read_bytes(), mime_type='application/pdf'),
        "Summarize this document"
    ]
)
```

### Audio (via Files API)
```python
myfile = client.files.upload(file="sample.mp3")
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=["Describe this audio", myfile]
)
```

Audio capabilities: transcription, translation, speaker diarization, emotion detection, timestamps.
For real-time audio → use Live API.

### Video (via Files API)
```python
myfile = client.files.upload(file="video.mp4")
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[myfile, "Summarize this video"]
)
```

### YouTube URLs
```python
# Include YouTube URL directly in contents
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=["https://www.youtube.com/watch?v=XXXXX", "Summarize this video"]
)
```

---

## File Input Methods Comparison
| Method | Max Size | Best For | Persistence |
|---|---|---|---|
| Inline data | 100MB (50MB PDF) | Small files, one-off | None |
| Files API | 2GB/file, 20GB/project | Large files, reuse | 48 hours |
| GCS URI | 2GB/file, unlimited storage | GCS files | 30 days (registration) |
| External URLs | 100MB | Public URLs, AWS/Azure/GCS | None (fetched per request) |
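As a convenience, the size limits above can be encoded in a small helper; `choose_input_method` is a hypothetical name, not an SDK function:

```python
INLINE_LIMIT = 100 * 1024**2      # 100MB inline
INLINE_PDF_LIMIT = 50 * 1024**2   # 50MB inline for PDFs
FILES_API_LIMIT = 2 * 1024**3     # 2GB per file (Files API and GCS URIs)

def choose_input_method(size_bytes: int, is_pdf: bool = False) -> str:
    # Pick an input method using the per-file limits from the table above.
    inline_cap = INLINE_PDF_LIMIT if is_pdf else INLINE_LIMIT
    if size_bytes <= inline_cap:
        return "inline"      # one-off, no persistence
    if size_bytes <= FILES_API_LIMIT:
        return "files_api"   # reusable for 48 hours
    raise ValueError("per-file limit is 2GB")
```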
## Files API

```python
# Upload
myfile = client.files.upload(file="path/to/file.pdf")
print(myfile.uri)  # use in requests

# List
for file in client.files.list():
    print(file.name)

# Delete
client.files.delete(name=myfile.name)
```

**Files API limits:** 2GB per file, 20GB per project, 48-hour TTL.

---

## Structured Output
```python
import json
from pydantic import BaseModel

class Recipe(BaseModel):
    name: str
    ingredients: list[str]
    steps: list[str]

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Give me a chocolate cake recipe",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Recipe,
    )
)
recipe = json.loads(response.text)
```
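With `response_mime_type="application/json"` the reply should already be bare JSON, but a defensive parser that also tolerates markdown-fenced output costs little (a hypothetical utility, not SDK behavior):

```python
import json

def parse_structured(text: str) -> dict:
    # Strip markdown fences defensively before parsing.
    cleaned = text.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.strip("`")
        cleaned = cleaned[cleaned.find("{"):cleaned.rfind("}") + 1]
    return json.loads(cleaned)

print(parse_structured('```json\n{"name": "cake"}\n```')["name"])  # cake
```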
## Tools: Built-in
### Google Search (Grounding)
```python
grounding_tool = types.Tool(google_search=types.GoogleSearch())
config = types.GenerateContentConfig(tools=[grounding_tool])
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Who won Euro 2024?",
    config=config
)
# Check response.candidates[0].grounding_metadata for citations
```

Response includes `groundingMetadata` with `webSearchQueries`, `groundingChunks`, `groundingSupports`.
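A sketch of turning that metadata into a citation list; it assumes the REST-style dict shape (`groundingChunks[].web.uri`/`title`), so verify field names against the grounding docs:

```python
def extract_citations(grounding_metadata: dict) -> list[str]:
    # Collect "title: uri" strings from each grounding chunk.
    citations = []
    for chunk in grounding_metadata.get("groundingChunks", []):
        web = chunk.get("web", {})
        if web.get("uri"):
            citations.append(f'{web.get("title", "untitled")}: {web["uri"]}')
    return citations

sample = {
    "webSearchQueries": ["euro 2024 winner"],
    "groundingChunks": [
        {"web": {"uri": "https://example.com/euro2024", "title": "Euro 2024"}},
    ],
}
print(extract_citations(sample))  # ['Euro 2024: https://example.com/euro2024']
```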
### URL Context
```python
config = types.GenerateContentConfig(tools=[{"url_context": {}}])
# Include URLs in the prompt text
```

### Google Maps (NOT available with Gemini 3)

```python
# Only for Gemini 2.5 models
config = types.GenerateContentConfig(
    tools=[types.Tool(google_maps=types.GoogleMaps())],
    tool_config=types.ToolConfig(retrieval_config=types.RetrievalConfig(
        lat_lng=types.LatLng(latitude=34.05, longitude=-118.25)
    ))
)
```

### Code Execution
```python
config = types.GenerateContentConfig(
    tools=[types.Tool(code_execution=types.CodeExecution())]
)
```

## Function Calling

```python
def get_weather(location: str) -> dict:
    return {"temp": 72, "condition": "sunny"}

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="What's the weather in NYC?",
    config=types.GenerateContentConfig(
        tools=[get_weather],
        tool_config=types.ToolConfig(
            function_calling_config=types.FunctionCallingConfig(mode="AUTO")
        )
    )
)

# Check for function calls
for part in response.candidates[0].content.parts:
    if part.function_call:
        result = get_weather(**part.function_call.args)
        # Send result back...
```
**⚠️ Gemini 3 Thought Signatures in Function Calling:**

When Gemini 3 returns function calls, each step includes a `thoughtSignature`. You MUST echo it back exactly — omitting it causes a 400 error. **The SDK handles this automatically if you use the chat API or append the full response object to history.**

Manual handling pattern:

```python
# After getting FC response, include the full model turn (with signatures) in next request
contents = [
    {"role": "user", "parts": [{"text": "original request"}]},
    model_response.candidates[0].content,  # includes thoughtSignature
    {"role": "user", "parts": [{"function_response": {"name": "fn", "response": result}}]}
]
```
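The check-and-respond loop above generalizes into a small dispatcher. This is a sketch, not SDK code; `dispatch_tools` and the stand-in part object are hypothetical:

```python
from types import SimpleNamespace

def dispatch_tools(parts, registry):
    # Run every function call in a model turn and build the
    # function_response parts to echo back alongside the model turn.
    responses = []
    for part in parts:
        call = getattr(part, "function_call", None)
        if call is None:
            continue
        result = registry[call.name](**dict(call.args))
        responses.append({"function_response": {"name": call.name, "response": result}})
    return responses

# Stand-in for an SDK part, for illustration only:
fake_part = SimpleNamespace(function_call=SimpleNamespace(
    name="get_weather", args={"location": "NYC"}))
print(dispatch_tools([fake_part], {"get_weather": lambda location: {"temp": 72}}))
# [{'function_response': {'name': 'get_weather', 'response': {'temp': 72}}}]
```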
---

## Embeddings
```python
result = client.models.embed_content(
    model="gemini-embedding-001",
    contents="What is the meaning of life?"
)
print(result.embeddings)
```

Batch:

```python
result = client.models.embed_content(
    model="gemini-embedding-001",
    contents=["text 1", "text 2", "text 3"]
)
```

Model: `gemini-embedding-001` (GA until July 14, 2026)
Use case: semantic search, RAG, classification, clustering.
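The semantic-search use case reduces to cosine similarity over the returned vectors. A stdlib-only sketch, with toy vectors standing in for `result.embeddings` values:

```python
import math

def cosine(a, b):
    # Cosine similarity of two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy vectors standing in for gemini-embedding-001 outputs:
docs = {"doc1": [1.0, 0.0, 0.2], "doc2": [0.1, 0.9, 0.0]}
query = [0.9, 0.1, 0.1]
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # doc1
```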
---

## File Search (RAG)
Managed RAG — free file storage and free embedding generation at query time. Pay only for initial indexing + model tokens.

```python
import time

# Create store
file_search_store = client.file_search_stores.create(
    config={'display_name': 'my-store'}
)

# Upload directly
operation = client.file_search_stores.upload_to_file_search_store(
    file='document.pdf',
    file_search_store_name=file_search_store.name,
    config={'display_name': 'My Doc'}
)
while not operation.done:
    time.sleep(5)
    operation = client.operations.get(operation)

# Query
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="What does the document say about X?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(
            file_search=types.FileSearch(
                file_search_store_names=[file_search_store.name]
            )
        )]
    )
)
```

---

## Context Caching
Reduces cost by caching repeated large contexts. Paid tier only.

```python
from google.genai import types

# Create cache
cache = client.caches.create(
    model="gemini-3-flash-preview",
    config=types.CreateCachedContentConfig(
        contents=[large_document_content],
        system_instruction="You are an expert analyst.",
        ttl="3600s",  # 1 hour
        display_name="my-cache"
    )
)

# Use cache
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="What are the key findings?",
    config=types.GenerateContentConfig(cached_content=cache.name)
)
print(response.usage_metadata.cached_content_token_count)
```

Implicit caching: 2048+ token prefix is automatically cached at 75% discount.
Explicit caching: manual TTL control.
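Back-of-envelope math for the implicit-cache discount above (a sketch of the 75%-off rule, not official billing logic):

```python
def effective_prompt_tokens(prefix_tokens, suffix_tokens, cached=True):
    # Cached prefix tokens bill at 25% of normal price (75% discount).
    return prefix_tokens * (0.25 if cached else 1.0) + suffix_tokens

print(effective_prompt_tokens(4000, 500))                # 1500.0
print(effective_prompt_tokens(4000, 500, cached=False))  # 4500.0
```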
---

## Batch API
50% cost reduction for non-urgent workloads. 24-hour SLO.

```python
import time

# Create batch job (see batch-api.md for full syntax)
batch_job = client.batches.create(
    model="gemini-3-flash-preview",
    src="gs://bucket/requests.jsonl",
    config=types.CreateBatchJobConfig(dest="gs://bucket/responses/")
)

# Poll
while batch_job.state not in ["JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED"]:
    time.sleep(30)
    batch_job = client.batches.get(name=batch_job.name)
```
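A sketch of producing the `requests.jsonl` input; the per-line shape (a caller-chosen `key` plus a `request` mirroring a generateContent body) is an assumption to verify against batch-api.md:

```python
import json

prompts = ["Summarize doc A", "Summarize doc B"]
with open("requests.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        line = {
            "key": f"request-{i}",  # your own correlation id
            "request": {"contents": [{"parts": [{"text": prompt}]}]},
        }
        f.write(json.dumps(line) + "\n")
```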
---

## Interactions API
Used for agents (Deep Research). Not accessible via `generate_content`.

```python
import time

interaction = client.interactions.create(
    input="Research the history of quantum computing",
    agent='deep-research-pro-preview-12-2025',
    background=True  # REQUIRED for long tasks
)
while True:
    interaction = client.interactions.get(interaction.id)
    if interaction.status == "completed":
        print(interaction.outputs[-1].text)
        break
    elif interaction.status == "failed":
        break
    time.sleep(10)
```

For combining with your own data:

```python
interaction = client.interactions.create(
    input="Compare our Q4 report to public benchmarks",
    agent="deep-research-pro-preview-12-2025",
    background=True,
    tools=[{"type": "file_search", "file_search_store_names": ["fileSearchStores/my-store"]}]
)
```

## Live API (Real-time Voice/Video)
For interactive, streaming audio/video sessions via WebSocket.

```python
import asyncio
from google import genai

client = genai.Client()
model = "gemini-2.5-flash-native-audio-preview-12-2025"

async def main():
    async with client.aio.live.connect(
        model=model,
        config={"response_modalities": ["AUDIO"]}
    ) as session:
        await session.send_client_content(
            turns="Hello, how are you?",
            turn_complete=True
        )
        async for response in session.receive():
            if response.data:
                # raw PCM audio bytes (24kHz, 16-bit, little-endian)
                pass
            if response.server_content and response.server_content.turn_complete:
                break

asyncio.run(main())
```

Audio format: raw PCM, little-endian, 16-bit. Output: 24kHz. Input: natively 16kHz (resampled if different).
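Those raw PCM bytes can be written to a playable WAV with the stdlib `wave` module; `save_pcm_as_wav` is a hypothetical helper, with `chunks` standing in for bytes collected from `response.data`:

```python
import wave

def save_pcm_as_wav(chunks, path, rate=24000):
    # Live API output: raw PCM, 16-bit little-endian, mono, 24kHz.
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit
        wf.setframerate(rate)
        wf.writeframes(b"".join(chunks))

save_pcm_as_wav([b"\x00\x00" * 240], "live_out.wav")  # 240 frames of silence
```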
Session limits:
- Audio-only: 15 min (without compression)
- Audio+video: 2 min
- Connection: ~10 min → use Session Resumption

Session Resumption:

```python
config=types.LiveConnectConfig(
    session_resumption=types.SessionResumptionConfig(handle=previous_handle)
)
# Save new handle from session_resumption_update messages
```
**Context window compression (for long sessions):**

```python
config=types.LiveConnectConfig(
    context_window_compression=types.ContextWindowCompressionConfig(
        sliding_window=types.SlidingWindow()
    )
)
```

Tools in Live API: Google Search ✅, Function calling ✅, Google Maps ❌, Code execution ❌, URL context ❌

Live API function calling (manual tool response required):

```python
# After receiving tool_call in response:
await session.send_tool_response(function_responses=[
    types.FunctionResponse(id=fc.id, name=fc.name, response={"result": "ok"})
    for fc in response.tool_call.function_calls
])
```

---

## Ephemeral Tokens (Live API Security)
For client-side Live API connections. Short-lived tokens that expire, reducing risk vs. exposing API keys.

```python
import datetime

now = datetime.datetime.now(tz=datetime.timezone.utc)
client = genai.Client(http_options={'api_version': 'v1alpha'})
token = client.auth_tokens.create(config={
    'uses': 1,
    'expire_time': now + datetime.timedelta(minutes=30),
    'new_session_expire_time': now + datetime.timedelta(minutes=1),
    'http_options': {'api_version': 'v1alpha'},
})
# Send token.name to client; use as API key for Live API only
```
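The two expiry fields behave differently: `expire_time` bounds the whole token, while `new_session_expire_time` bounds when a new session may start. A stdlib illustration of the windows configured above:

```python
import datetime

now = datetime.datetime.now(tz=datetime.timezone.utc)
expire_time = now + datetime.timedelta(minutes=30)
new_session_expire_time = now + datetime.timedelta(minutes=1)

# A connection attempt 5 minutes from now: token still valid overall,
# but too late to start a NEW session.
attempt = now + datetime.timedelta(minutes=5)
print(attempt < expire_time)              # True
print(attempt < new_session_expire_time)  # False
```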
Can lock token to specific config:

```python
'live_connect_constraints': {
    'model': 'gemini-2.5-flash-native-audio-preview-12-2025',
    'config': {'response_modalities': ['AUDIO']}
}
```

## Image Generation (Nano Banana)
- Nano Banana 2 = `gemini-3.1-flash-image-preview` — fast/high-volume
- Nano Banana Pro = `gemini-3-pro-image-preview` — pro quality, thinking
- Nano Banana = `gemini-2.5-flash-image` — speed/efficiency

```python
from PIL import Image

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents="Create a picture of a tropical beach at sunset"
)
for part in response.parts:
    if part.text:
        print(part.text)
    elif part.inline_data:
        image = part.as_image()
        image.save("output.png")
```

All generated images include SynthID watermark.
## Video Generation (Veo 3.1)
```python
import time

operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    prompt="A serene mountain lake at dawn"
)
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("output.mp4")
```

Capabilities: 8-second 720p/1080p/4K, portrait (9:16) or landscape (16:9), audio, video extension, first/last frame specification, up to 3 reference images.

## Image Generation (Imagen — Standalone)
```python
response = client.models.generate_images(
    model='imagen-4.0-generate-001',
    prompt='Robot holding a red skateboard',
    config=types.GenerateImagesConfig(number_of_images=4)
)
for gen_image in response.generated_images:
    gen_image.image.show()
```

## TTS (Text-to-Speech)
```python
import wave

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents="Say cheerfully: Have a wonderful day!",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name='Kore')
            )
        )
    )
)
data = response.candidates[0].content.parts[0].inline_data.data
with wave.open("out.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(24000)
    wf.writeframes(data)
```

Multi-speaker TTS (up to 2 speakers):

```python
config=types.GenerateContentConfig(
    response_modalities=["AUDIO"],
    speech_config=types.SpeechConfig(
        multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
            speaker_voice_configs=[
                types.SpeakerVoiceConfig(
                    speaker='Joe',
                    voice_config=types.VoiceConfig(
                        prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name='Kore')
                    )
                ),
                types.SpeakerVoiceConfig(
                    speaker='Jane',
                    voice_config=types.VoiceConfig(
                        prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name='Aoede')
                    )
                ),
            ]
        )
    )
)
```

Prompt must name speakers matching the config: "TTS the conversation between Joe and Jane: Joe: ... Jane: ..."
Music Generation (Lyria RealTime)
Experimental. Real-time streaming music via WebSocket.
```python
client = genai.Client(http_options={'api_version': 'v1alpha'})
async with client.aio.live.music.connect(model='models/lyria-realtime-exp') as session:
    await session.set_weighted_prompts(prompts=[
        types.WeightedPrompt(text='minimal techno', weight=1.0)
    ])
    await session.set_music_generation_config(
        config=types.LiveMusicGenerationConfig(bpm=90, temperature=1.0)
    )
    await session.play()
    # Receive audio chunks
    async for message in session.receive():
        audio_data = message.server_content.audio_chunks[0].data
        # process PCM audio...
```

Control: `session.play()`, `session.pause()`, `session.stop()`, `session.reset_context()`
Steer by sending new weighted prompts mid-stream. Reset context after BPM/scale changes.
Computer Use
Browser automation agent. Requires Playwright or similar for action execution.
```python
config = genai.types.GenerateContentConfig(
    tools=[types.Tool(
        computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER,
            excluded_predefined_functions=["drag_and_drop"]  # optional
        )
    )]
)
response = client.models.generate_content(
    model='gemini-2.5-computer-use-preview-10-2025',
    contents=[{"role": "user", "parts": [{"text": "Search Amazon for wireless headphones"}]}],
    config=config
)
```
Model returns function_calls with actions like `type_text_at`, `click_at`
Check response.candidates[0].content.parts for function_call items
Coordinates are normalized 0-999; convert to actual pixels
Recommended screen: 1440x900
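The normalized-to-pixel conversion can be sketched as below. This is an assumption-laden sketch: the linear mapping and treating 999 as the last addressable coordinate are inferred from the note above, so verify against the official action-handling docs before relying on it.

```python
def denormalize(x_norm: int, y_norm: int,
                screen_w: int = 1440, screen_h: int = 900) -> tuple[int, int]:
    """Map model coordinates (0-999 on each axis) to actual screen pixels.

    Assumes a linear mapping where 0 is the first pixel and 999 the last;
    defaults use the recommended 1440x900 screen.
    """
    x_px = round(x_norm / 999 * (screen_w - 1))
    y_px = round(y_norm / 999 * (screen_h - 1))
    return x_px, y_px
```

Feed the resulting pixel coordinates to your executor (e.g. Playwright's `page.mouse.click`).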
**Safety:** Check `safety_decision` in response — `require_confirmation` means pause before executing.
---
Safety Settings
```python
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Your prompt",
    config=types.GenerateContentConfig(
        safety_settings=[
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
                threshold=types.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE
            )
        ]
    )
)
```

Threshold options: `OFF`, `BLOCK_NONE`, `BLOCK_ONLY_HIGH`, `BLOCK_MEDIUM_AND_ABOVE`, `BLOCK_LOW_AND_ABOVE`
Default for Gemini 2.5/3: All filters `OFF` by default.
Categories: Harassment, Hate speech, Sexually explicit, Dangerous
Built-in protections (cannot be disabled): Child safety, etc.
Check blocked response: `response.candidates[0].finish_reason == "SAFETY"` → inspect `safety_ratings`.
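A small guard for detecting blocked responses might look like this. It is a sketch that compares the string form of `finish_reason`, since the exact enum type varies across SDK versions; the demo object below is a stand-in, not a real API response.

```python
from types import SimpleNamespace

def safety_block_info(response):
    """Return (blocked, ratings): whether the first candidate was stopped by
    the safety filter, plus its safety_ratings for inspection.

    str() handles both plain strings ("SAFETY") and enum values whose string
    form ends in "SAFETY" (e.g. FinishReason.SAFETY)."""
    cand = response.candidates[0]
    blocked = str(getattr(cand, "finish_reason", "")).endswith("SAFETY")
    return blocked, getattr(cand, "safety_ratings", None)

# Demo with a stand-in response object (real responses come from generate_content)
demo = SimpleNamespace(candidates=[SimpleNamespace(
    finish_reason="SAFETY",
    safety_ratings=[{"category": "HARM_CATEGORY_HATE_SPEECH", "probability": "LOW"}],
)])
blocked, ratings = safety_block_info(demo)
```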
Media Resolution
Control token usage for images/videos/PDFs:
Global (all models):
```python
config = types.GenerateContentConfig(
    media_resolution=types.MediaResolution.MEDIA_RESOLUTION_HIGH  # LOW, MEDIUM, HIGH
)
```

Per-part (Gemini 3 only, experimental):
```python
client = genai.Client(http_options={'api_version': 'v1alpha'})
image_part = types.Part.from_bytes(
    data=image_bytes, mime_type='image/jpeg',
    media_resolution=types.MediaResolution.MEDIA_RESOLUTION_HIGH
)
```
Long Context
Most Gemini models support 1M+ token context windows.
1M tokens ≈ 50K lines of code, 8 novels, 200 podcast transcripts.
Optimization: Use context caching when reusing large contexts — 4x cheaper (Flash) + lower latency.
Best practice: Put your query at the END of the prompt (after all context material).
Multi-needle limitation: Model performs ~99% on single retrieval but degrades with many simultaneous retrievals.
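The query-at-the-end practice can be captured in a small request builder (a hypothetical helper using the plain-dict `contents` form, not an SDK API):

```python
def build_long_context_request(documents: list[str], query: str) -> list[dict]:
    """Assemble contents with the bulky context material first and the
    query as the final part, per the long-context best practice."""
    parts = [{"text": doc} for doc in documents]
    parts.append({"text": query})  # query goes at the END
    return [{"role": "user", "parts": parts}]

contents = build_long_context_request(
    ["<transcript 1>", "<transcript 2>"],
    "Which transcript mentions pricing?",
)
```

Pass the result directly as the `contents` argument to `generate_content`.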
API Versions
| Version | Use | Default? |
|---|---|---|
| `v1` | Stable, production | No |
| `v1beta` | New features, may change | Yes (SDK default) |
| `v1alpha` | Experimental only | No |

```python
client = genai.Client(http_options={'api_version': 'v1'})  # force stable
```
Authentication
API Key (default):

```bash
export GEMINI_API_KEY=your_key_here
```

REST header: `x-goog-api-key: $GEMINI_API_KEY`

OAuth (for production with stricter controls):
- Enable Generative Language API in Cloud console
- Configure OAuth consent screen
- Create OAuth 2.0 Client ID
- Use application-default-credentials
Rate Limits & Billing
Tiers: Free Tier → Paid Tier (pay-as-you-go)
Upgrade: AI Studio → API Keys → Set up Billing
Paid tier benefits: Higher rate limits, advanced models, data not used for training.
Rate limit headers: Check `x-goog-quota-*` headers in responses.
Error 429: Rate limit exceeded → implement exponential backoff or request quota increase.
Common Error Codes
| HTTP | Status | Cause | Fix |
|---|---|---|---|
| 400 | INVALID_ARGUMENT | Malformed request | Check API reference |
| 400 | Missing thought_signature | Gemini 3 FC without signature | Use SDK chat or echo signatures |
| 403 | PERMISSION_DENIED | Wrong API key | Check key permissions |
| 429 | RESOURCE_EXHAUSTED | Rate limit hit | Backoff, upgrade tier |
| 500 | INTERNAL | Context too long / server error | Reduce context, retry |
| 503 | UNAVAILABLE | Overloaded | Retry or switch model |
| 504 | DEADLINE_EXCEEDED | Request too large | Increase timeout |
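The retryable rows above (429, 500, 503) are usually handled with an exponential-backoff wrapper. A minimal sketch: `TransientApiError` is a hypothetical stand-in for whatever exception your SDK raises with an HTTP status code attached, so adapt the `except` clause to your client library.

```python
import random
import time

class TransientApiError(Exception):
    """Stand-in for an SDK error that carries an HTTP status code."""
    def __init__(self, code: int):
        super().__init__(f"HTTP {code}")
        self.code = code

def with_backoff(fn, retries=5, base=0.5, retryable=(429, 500, 503), sleep=time.sleep):
    """Call fn(), retrying retryable HTTP codes with exponential backoff + jitter."""
    for attempt in range(retries):
        try:
            return fn()
        except TransientApiError as exc:
            if exc.code not in retryable or attempt == retries - 1:
                raise
            # 0.5s, 1s, 2s, ... plus jitter to avoid synchronized retries
            sleep(base * (2 ** attempt) + random.uniform(0, 0.25))
```

Usage: `with_backoff(lambda: client.models.generate_content(model=..., contents=...))`.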
Framework Integrations
CrewAI
```python
from crewai import LLM

gemini_llm = LLM(model='gemini/gemini-3-flash-preview', api_key=api_key, temperature=1.0)
```

LangGraph
```python
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-3-flash-preview")
```

LlamaIndex
```python
from llama_index.llms.google_genai import GoogleGenAI

llm = GoogleGenAI(model="gemini-3-flash-preview")
```

Vercel AI SDK
```bash
npm install ai @ai-sdk/google
```

See Also
For detailed reference on specific topics:
- Function calling deep-dive → `references/tools-and-agents.md`
- Files API & multimodal → `references/files-and-media.md`
- Caching, Batch, Live deep-dive → `references/advanced-features.md`
- Embeddings & RAG → `references/embeddings-and-rag.md`
- Image/Video generation → `references/generation.md`