invoking-gemini

Invoking Gemini

Delegate tasks to Google's Gemini models when they offer advantages over Claude.

When to Use Gemini

Structured outputs:

JSON Schema validation with property ordering guarantees
Pydantic model compliance
Strict schema adherence (enum values, required fields)

Cost optimization:

Parallel batch processing (Gemini Flash is lightweight)
High-volume simple tasks
Budget-constrained operations

Google ecosystem:

Integration with Google services
Vertex AI workflows
Google-specific APIs

Multi-modal tasks:

Image analysis with JSON output
Video processing
Audio transcription with structure

Available Models

gemini-2.0-flash-exp (Recommended):

Fast, cost-effective
Native JSON Schema support
Good for structured outputs

gemini-1.5-pro:

More capable reasoning
Better for complex tasks
Higher cost

gemini-1.5-flash:

Balanced speed/quality
Good for most tasks

See references/models.md for full model details.

Setup

Prerequisites:

Install google-generativeai:

bash

uv pip install google-generativeai pydantic

Configure API key via project knowledge file:

Option 1 (recommended): Individual file
- Create document: `GOOGLE_API_KEY.txt`
- Content: Your API key (e.g., `AIzaSy...`)

Option 2: Combined file
- Create document: `API_CREDENTIALS.json`
- Content:
```json
{
  "google_api_key": "AIzaSy..."
}
```
Get your API key: https://console.cloud.google.com/apis/credentials
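
The two options above could be read by a small loader along the following lines. This is only a sketch: the source does not show how `gemini_client` locates the key or where project knowledge files are mounted, so `load_google_api_key` and its `knowledge_dir` parameter are hypothetical, and checking Option 1 before Option 2 is an assumption.

```python
import json
from pathlib import Path
from typing import Optional


def load_google_api_key(knowledge_dir: str) -> Optional[str]:
    """Return the API key found in knowledge_dir, or None if none is present.

    Hypothetical helper: the directory and the lookup order are assumptions.
    """
    base = Path(knowledge_dir)

    # Option 1: a file that contains only the key.
    key_file = base / "GOOGLE_API_KEY.txt"
    if key_file.is_file():
        key = key_file.read_text(encoding="utf-8").strip()
        if key:
            return key

    # Option 2: a JSON file with a "google_api_key" field.
    creds_file = base / "API_CREDENTIALS.json"
    if creds_file.is_file():
        try:
            creds = json.loads(creds_file.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            return None
        if isinstance(creds, dict):
            key = creds.get("google_api_key")
            if isinstance(key, str) and key.strip():
                return key.strip()

    return None
```

Returning `None` rather than raising keeps a missing key distinguishable from a malformed call, which matches the "API key not configured" case described under Troubleshooting.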

Basic Usage

Import the client:

python

import sys
sys.path.append('/mnt/skills/invoking-gemini/scripts')
from gemini_client import invoke_gemini

# Simple prompt
response = invoke_gemini(
    prompt="Explain quantum computing in 3 bullet points",
    model="gemini-2.0-flash-exp"
)
print(response)

Structured Output

Use Pydantic models for guaranteed JSON Schema compliance:

python

from pydantic import BaseModel, Field
from gemini_client import invoke_with_structured_output

class BookAnalysis(BaseModel):
    title: str
    genre: str = Field(description="Primary genre")
    key_themes: list[str] = Field(max_length=5)
    rating: int = Field(ge=1, le=5)

result = invoke_with_structured_output(
    prompt="Analyze the book '1984' by George Orwell",
    pydantic_model=BookAnalysis
)

# result is a BookAnalysis instance
print(result.title)  # "1984"
print(result.genre)  # "Dystopian Fiction"


**Advantages over Claude:**
- Guaranteed property ordering in JSON
- Strict enum enforcement
- Native schema validation (no prompt engineering)
- Lower cost for simple extractions
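
Property ordering can be checked from Python, because `json.loads` keeps the key order of the JSON text it parses. The sketch below uses a hand-written string in place of a real model reply; `FIELD_ORDER` and `keys_in_schema_order` are illustrative names and not part of `gemini_client`.

```python
import json

# Field order as declared in the BookAnalysis model above.
FIELD_ORDER = ["title", "genre", "key_themes", "rating"]


def keys_in_schema_order(raw_json: str, expected: list[str]) -> bool:
    """True if the top-level JSON object lists its keys exactly in `expected` order."""
    parsed = json.loads(raw_json)  # dicts keep insertion order, so key order survives parsing
    return isinstance(parsed, dict) and list(parsed.keys()) == expected


# Hand-written stand-in for a model reply; not real Gemini output.
sample_reply = (
    '{"title": "1984", "genre": "Dystopian Fiction", '
    '"key_themes": ["surveillance"], "rating": 5}'
)
print(keys_in_schema_order(sample_reply, FIELD_ORDER))
```

A check like this is mainly useful when downstream code streams or diffs the raw JSON text; code that only reads fields by name does not depend on ordering.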

Parallel Invocation

Process multiple prompts concurrently:

python

from gemini_client import invoke_parallel

prompts = [
    "Summarize the plot of Hamlet",
    "Summarize the plot of Macbeth",
    "Summarize the plot of Othello"
]

results = invoke_parallel(
    prompts=prompts,
    model="gemini-2.0-flash-exp"
)

for prompt, result in zip(prompts, results):
    print(f"Q: {prompt[:30]}...")
    print(f"A: {result[:100]}...\n")

Use cases:

Batch classification tasks
Data labeling
Multiple independent analyses
A/B testing prompts
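
The source does not show how `invoke_parallel` is implemented. One common way to get this behavior is a thread pool, since API calls spend most of their time waiting on the network. In the sketch below, `run_parallel`, `call_model`, and `fake_model` are hypothetical names, and `fake_model` stands in for a real API call so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def run_parallel(
    prompts: list[str],
    call_model: Callable[[str], str],
    max_workers: int = 4,
) -> list[Optional[str]]:
    """Apply call_model to every prompt concurrently; results keep the prompt order."""

    def safe_call(prompt: str) -> Optional[str]:
        try:
            return call_model(prompt)
        except Exception:
            # One failed prompt should not discard the rest of the batch.
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(safe_call, prompts))


def fake_model(prompt: str) -> str:
    """Stand-in for a real API call (assumption, for illustration only)."""
    return f"summary of: {prompt}"


results = run_parallel(["Hamlet", "Macbeth", "Othello"], fake_model)
print(results)
```

Because results come back in input order, the `zip(prompts, results)` loop shown earlier pairs each prompt with its own answer. `max_workers` is worth keeping small on lower API tiers to stay under rate limits.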

Error Handling

The client handles common errors:

python

from gemini_client import invoke_gemini

response = invoke_gemini(
    prompt="Your prompt here",
    model="gemini-2.0-flash-exp"
)

if response is None:
    print("Error: API call failed")
    # Check project knowledge file for valid google_api_key

Common issues:

Missing API key → Add GOOGLE_API_KEY.txt to project knowledge (see Setup above)
Invalid model → Raises ValueError
Rate limit → Automatically retries with backoff
Network error → Returns None after retries
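
The retry behavior listed above can be pictured as exponential backoff that gives up with `None`. This is a sketch under assumptions, not the client's actual code: `call_with_retry` is a hypothetical name, the built-in `ConnectionError` and `TimeoutError` stand in for whatever exceptions the real SDK raises, and the attempt count and delays are illustrative defaults.

```python
import time
from typing import Callable, Optional


def call_with_retry(
    call: Callable[[], str],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try call() up to max_attempts times, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts - 1:
                break  # out of attempts; fall through and return None
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ... with the default base
    return None


attempts = []


def flaky() -> str:
    """Fails twice, then succeeds (simulated, no network involved)."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


# Pass a no-op sleep so the demonstration does not actually wait.
print(call_with_retry(flaky, sleep=lambda seconds: None))
```

Injecting `sleep` as a parameter keeps the backoff schedule testable without real delays.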

Advanced Features

Custom Generation Config

python

response = invoke_gemini(
    prompt="Write a haiku",
    model="gemini-2.0-flash-exp",
    temperature=0.9,
    max_output_tokens=100,
    top_p=0.95
)

Multi-modal Input

python

# Image analysis with structured output
from pydantic import BaseModel
from gemini_client import invoke_with_structured_output

class ImageDescription(BaseModel):
    objects: list[str]
    scene: str
    colors: list[str]

result = invoke_with_structured_output(
    prompt="Describe this image",
    pydantic_model=ImageDescription,
    image_path="/mnt/user-data/uploads/photo.jpg"
)

See [references/advanced.md](references/advanced.md) for more patterns.

Comparison: Gemini vs Claude

Use Gemini when:

Structured output is primary goal
Cost is a constraint
Property ordering matters
Batch processing many simple tasks

Use Claude when:

Complex reasoning required
Long context needed (200K tokens)
Code generation quality matters
Nuanced instruction following

Use both:

Claude for planning/reasoning
Gemini for structured extraction
Parallel workflows with different strengths

Token Efficiency Pattern

Gemini Flash is cost-effective for sub-tasks:

python

# Claude (you) plans the approach
# Gemini executes structured extractions

data_points = []
for file in uploaded_files:
    # Gemini extracts structured data
    result = invoke_with_structured_output(
        prompt=f"Extract contact info from {file}",
        pydantic_model=ContactInfo
    )
    data_points.append(result)

# Claude synthesizes results
# ... your analysis here ...

Limitations

Not suitable for:

Tasks requiring deep reasoning
Long context (>1M tokens)
Complex code generation
Subjective creative writing

Token limits:

gemini-2.0-flash-exp: ~1M input tokens
gemini-1.5-pro: ~2M input tokens

Rate limits:

Vary by API tier
Client handles automatic retry

Examples

See references/examples.md for:

Data extraction from documents
Batch classification
Multi-modal analysis
Hybrid Claude+Gemini workflows

Troubleshooting

"API key not configured":

Add project knowledge file `GOOGLE_API_KEY.txt` with your API key
Or add to `API_CREDENTIALS.json`: `{"google_api_key": "AIzaSy..."}`
See Setup section above for details

Import errors:

bash

uv pip install google-generativeai pydantic

Schema validation failures:

Check Pydantic model definitions
Ensure prompt is clear about expected structure
Add examples to prompt if needed

Cost Comparison

Approximate pricing (as of 2024):

Gemini 2.0 Flash:

Input: $0.15 / 1M tokens
Output: $0.60 / 1M tokens

Claude Sonnet:

Input: $3.00 / 1M tokens
Output: $15.00 / 1M tokens

For 1000 simple extraction tasks (100 tokens each):

Gemini Flash: ~$0.10
Claude Sonnet: ~$2.00

Strategy: Use Claude for complex reasoning, Gemini for high-volume simple tasks.
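
The batch estimate can be reproduced from the per-token prices listed above. The source does not say whether "100 tokens each" means input, output, or both, so the sketch below assumes 100 input tokens plus 100 output tokens per task; `PRICES` and `batch_cost` are illustrative names.

```python
# Prices in USD per 1M tokens, copied from the figures above.
PRICES = {
    "Gemini 2.0 Flash": {"input": 0.15, "output": 0.60},
    "Claude Sonnet": {"input": 3.00, "output": 15.00},
}


def batch_cost(model: str, tasks: int, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost for `tasks` calls with the given per-call token counts."""
    price = PRICES[model]
    total_input = tasks * input_tokens
    total_output = tasks * output_tokens
    return (total_input * price["input"] + total_output * price["output"]) / 1_000_000


for name in PRICES:
    # Assumption: each task uses 100 input tokens and 100 output tokens.
    print(f"{name}: ${batch_cost(name, 1000, 100, 100):.3f}")
```

Under that assumption the totals come to about $0.075 for Gemini 2.0 Flash and $1.80 for Claude Sonnet, in line with the rounded ~$0.10 and ~$2.00 quoted above.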
