invoking-gemini

Invoking Gemini

Delegate tasks to Google's Gemini models when they offer advantages over Claude.

When to Use Gemini

Structured outputs:
  • JSON Schema validation with property ordering guarantees
  • Pydantic model compliance
  • Strict schema adherence (enum values, required fields)
Cost optimization:
  • Parallel batch processing (Gemini Flash is lightweight)
  • High-volume simple tasks
  • Budget-constrained operations
Google ecosystem:
  • Integration with Google services
  • Vertex AI workflows
  • Google-specific APIs
Multi-modal tasks:
  • Image analysis with JSON output
  • Video processing
  • Audio transcription with structure

Available Models

gemini-2.0-flash-exp (Recommended):
  • Fast, cost-effective
  • Native JSON Schema support
  • Good for structured outputs
gemini-1.5-pro:
  • More capable reasoning
  • Better for complex tasks
  • Higher cost
gemini-1.5-flash:
  • Balanced speed/quality
  • Good for most tasks
See references/models.md for full model details.

Setup

Prerequisites:
  1. Install google-generativeai:
    bash
    uv pip install google-generativeai pydantic
  2. Configure API key via project knowledge file:
    Option 1 (recommended): Individual file
    • Create document: GOOGLE_API_KEY.txt
    • Content: Your API key (e.g., AIzaSy...)
    Option 2: Combined file
    • Create document: API_CREDENTIALS.json
    • Content:
      json
      {
        "google_api_key": "AIzaSy..."
      }
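
The two options above could be resolved by a loader along the following lines. This is a minimal sketch, not the actual `gemini_client` code: the function name `load_google_api_key`, the `knowledge_dir` parameter, and the lookup order (individual file first, then the combined file) are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Optional


def load_google_api_key(knowledge_dir: str) -> Optional[str]:
    """Return the key from GOOGLE_API_KEY.txt or API_CREDENTIALS.json, else None.

    Hypothetical helper: the directory location and lookup order are assumptions.
    """
    base = Path(knowledge_dir)

    # Option 1: individual file that contains only the key
    key_file = base / "GOOGLE_API_KEY.txt"
    if key_file.is_file():
        key = key_file.read_text(encoding="utf-8").strip()
        if key:
            return key

    # Option 2: combined JSON file with a "google_api_key" field
    creds_file = base / "API_CREDENTIALS.json"
    if creds_file.is_file():
        try:
            creds = json.loads(creds_file.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            return None
        key = creds.get("google_api_key") if isinstance(creds, dict) else None
        if isinstance(key, str) and key.strip():
            return key.strip()

    return None
```

Returning `None` instead of raising keeps the "missing key" case easy to report with a pointer back to this Setup section.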

Basic Usage

Import the client:
python
import sys
sys.path.append('/mnt/skills/invoking-gemini/scripts')
from gemini_client import invoke_gemini

# Simple prompt
response = invoke_gemini(
    prompt="Explain quantum computing in 3 bullet points",
    model="gemini-2.0-flash-exp"
)
print(response)

Structured Output

Use Pydantic models for guaranteed JSON Schema compliance:
python
from pydantic import BaseModel, Field
from gemini_client import invoke_with_structured_output

class BookAnalysis(BaseModel):
    title: str
    genre: str = Field(description="Primary genre")
    key_themes: list[str] = Field(max_length=5)
    rating: int = Field(ge=1, le=5)

result = invoke_with_structured_output(
    prompt="Analyze the book '1984' by George Orwell",
    pydantic_model=BookAnalysis
)

# result is a BookAnalysis instance
print(result.title)  # "1984"
print(result.genre)  # "Dystopian Fiction"

**Advantages over Claude:**
- Guaranteed property ordering in JSON
- Strict enum enforcement
- Native schema validation (no prompt engineering)
- Lower cost for simple extractions

Parallel Invocation

Process multiple prompts concurrently:
python
from gemini_client import invoke_parallel

prompts = [
    "Summarize the plot of Hamlet",
    "Summarize the plot of Macbeth",
    "Summarize the plot of Othello"
]

results = invoke_parallel(
    prompts=prompts,
    model="gemini-2.0-flash-exp"
)

for prompt, result in zip(prompts, results):
    print(f"Q: {prompt[:30]}...")
    print(f"A: {result[:100]}...\n")
Use cases:
  • Batch classification tasks
  • Data labeling
  • Multiple independent analyses
  • A/B testing prompts
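
One common way to get this kind of concurrency is a thread pool that maps prompts to results while preserving input order. The sketch below illustrates that idea only; how `invoke_parallel` is actually implemented is not shown in this document. The names `run_parallel`, `fake_invoke`, and `max_workers` are hypothetical, and `fake_invoke` stands in for a real model call so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def run_parallel(
    prompts: List[str],
    call: Callable[[str], Optional[str]],
    max_workers: int = 4,
) -> List[Optional[str]]:
    """Apply `call` to every prompt concurrently; results keep the input order."""
    if not prompts:
        return []
    workers = min(max_workers, len(prompts))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which call finishes first.
        return list(pool.map(call, prompts))


def fake_invoke(prompt: str) -> str:
    """Stand-in for a real model call (assumption for illustration)."""
    return f"summary of: {prompt}"


results = run_parallel(["Hamlet", "Macbeth", "Othello"], fake_invoke)
```

Because the results come back in input order, pairing them with the prompts via `zip`, as in the example above, stays valid.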

Error Handling

The client handles common errors:
python
from gemini_client import invoke_gemini

response = invoke_gemini(
    prompt="Your prompt here",
    model="gemini-2.0-flash-exp"
)

if response is None:
    print("Error: API call failed")
    # Check project knowledge file for valid google_api_key
Common issues:
  • Missing API key → Add GOOGLE_API_KEY.txt to project knowledge (see Setup above)
  • Invalid model → Raises ValueError
  • Rate limit → Automatically retries with backoff
  • Network error → Returns None after retries
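
The retry behavior listed above can be pictured as an exponential backoff loop. This is a minimal sketch under stated assumptions, not the client's actual code: `call_with_backoff`, `RateLimitError`, `max_retries`, and `base_delay` are hypothetical names, and the doubling delay schedule is one common choice rather than a documented one.

```python
import time
from typing import Callable, Optional


class RateLimitError(Exception):
    """Hypothetical exception standing in for a rate-limit response."""


def call_with_backoff(
    call: Callable[[], str],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Run `call`, retrying on failure with exponentially growing delays.

    Returns None when every attempt fails, matching the None check above.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except (RateLimitError, ConnectionError):
            if attempt == max_retries:
                return None
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * (2 ** attempt))
    return None
```

Passing `sleep` as a parameter is a design choice that lets the loop be tested without real waiting.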

Advanced Features

Custom Generation Config

python
response = invoke_gemini(
    prompt="Write a haiku",
    model="gemini-2.0-flash-exp",
    temperature=0.9,
    max_output_tokens=100,
    top_p=0.95
)

Multi-modal Input

python
from pydantic import BaseModel
from gemini_client import invoke_with_structured_output

# Image analysis with structured output
class ImageDescription(BaseModel):
    objects: list[str]
    scene: str
    colors: list[str]

result = invoke_with_structured_output(
    prompt="Describe this image",
    pydantic_model=ImageDescription,
    image_path="/mnt/user-data/uploads/photo.jpg"
)

See [references/advanced.md](references/advanced.md) for more patterns.

Comparison: Gemini vs Claude

Use Gemini when:
  • Structured output is primary goal
  • Cost is a constraint
  • Property ordering matters
  • Batch processing many simple tasks
Use Claude when:
  • Complex reasoning required
  • Long context needed (200K tokens)
  • Code generation quality matters
  • Nuanced instruction following
Use both:
  • Claude for planning/reasoning
  • Gemini for structured extraction
  • Parallel workflows with different strengths

Token Efficiency Pattern

Gemini Flash is cost-effective for sub-tasks:
python
# Claude (you) plans the approach

# Gemini executes structured extractions
# (uploaded_files and the ContactInfo Pydantic model are assumed to be defined earlier)
data_points = []
for file in uploaded_files:
    # Gemini extracts structured data
    result = invoke_with_structured_output(
        prompt=f"Extract contact info from {file}",
        pydantic_model=ContactInfo
    )
    data_points.append(result)

# Claude synthesizes results
# ... your analysis here ...

Limitations

Not suitable for:
  • Tasks requiring deep reasoning
  • Long context (>1M tokens)
  • Complex code generation
  • Subjective creative writing
Token limits:
  • gemini-2.0-flash-exp: ~1M input tokens
  • gemini-1.5-pro: ~2M input tokens
Rate limits:
  • Vary by API tier
  • Client handles automatic retry

Examples

See references/examples.md for:
  • Data extraction from documents
  • Batch classification
  • Multi-modal analysis
  • Hybrid Claude+Gemini workflows

Troubleshooting

"API key not configured":
  • Add project knowledge file GOOGLE_API_KEY.txt with your API key
  • Or add to API_CREDENTIALS.json: {"google_api_key": "AIzaSy..."}
  • See Setup section above for details
Import errors:
bash
uv pip install google-generativeai pydantic
Schema validation failures:
  • Check Pydantic model definitions
  • Ensure prompt is clear about expected structure
  • Add examples to prompt if needed
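
For the last point, one way to add an example is to append a sample of the expected JSON structure to the task text. The helper below is a hypothetical sketch (the name `build_prompt_with_example` and the wording of the instruction line are assumptions); it only builds the prompt string and makes no API call.

```python
import json


def build_prompt_with_example(task: str, example: dict) -> str:
    """Append one example of the expected JSON structure to the task text."""
    example_json = json.dumps(example, indent=2, ensure_ascii=False)
    return (
        f"{task}\n\n"
        "Return JSON with the same fields as this example:\n"
        f"{example_json}"
    )


prompt = build_prompt_with_example(
    "Analyze the book '1984' by George Orwell",
    {
        "title": "Example Title",
        "genre": "Example Genre",
        "key_themes": ["theme"],
        "rating": 3,
    },
)
```

The resulting string can be passed as the `prompt` argument alongside the Pydantic model.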

Cost Comparison

Approximate pricing (as of 2024):
Gemini 2.0 Flash:
  • Input: $0.15 / 1M tokens
  • Output: $0.60 / 1M tokens
Claude Sonnet:
  • Input: $3.00 / 1M tokens
  • Output: $15.00 / 1M tokens
For 1000 simple extraction tasks (roughly 100 input and 100 output tokens each):
  • Gemini Flash: ~$0.10
  • Claude Sonnet: ~$2.00
Strategy: Use Claude for complex reasoning, Gemini for high-volume simple tasks.
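
The estimate above can be reproduced with a short calculation. This is a sketch under one assumption that the text does not state explicitly: each task uses about 100 input tokens and 100 output tokens. The prices are copied from the list above, and `estimate_cost` is a hypothetical helper name.

```python
# Prices in USD per 1M tokens, copied from the list above
PRICES = {
    "Gemini 2.0 Flash": {"input": 0.15, "output": 0.60},
    "Claude Sonnet": {"input": 3.00, "output": 15.00},
}


def estimate_cost(model: str, tasks: int, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for `tasks` calls with the given per-task token counts."""
    price = PRICES[model]
    total_input = tasks * input_tokens
    total_output = tasks * output_tokens
    return (total_input * price["input"] + total_output * price["output"]) / 1_000_000


# Assumption: each task uses about 100 input and 100 output tokens
gemini_cost = estimate_cost("Gemini 2.0 Flash", 1000, 100, 100)
claude_cost = estimate_cost("Claude Sonnet", 1000, 100, 100)
print(f"Gemini 2.0 Flash: ${gemini_cost:.3f}")
print(f"Claude Sonnet:    ${claude_cost:.2f}")
```

Under these assumptions the totals come to about $0.075 and $1.80, in line with the rounded figures of ~$0.10 and ~$2.00.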