gemini-token-optimization

Gemini Token Optimization

🚨 MANDATORY: Invoke gemini-cli-docs First

STOP - Before providing ANY response about Gemini token usage:
  1. INVOKE the gemini-cli-docs skill
  2. QUERY for the specific token or pricing topic
  3. BASE all responses EXCLUSIVELY on the official documentation loaded

Overview

This skill covers optimizing cost and token usage when delegating work to Gemini CLI. It is most useful for bulk operations and cost-conscious workflows.

When to Use This Skill

Keywords: token usage, cost optimization, gemini cost, model selection, flash vs pro, caching, batch queries, reduce tokens
Use this skill when:
  • Planning bulk Gemini operations
  • Optimizing costs for large-scale analysis
  • Choosing between Flash and Pro models
  • Understanding token caching benefits
  • Tracking usage across sessions

Token Caching

Gemini CLI automatically caches context to reduce costs by reusing previously processed content.

Availability

| Auth Method | Caching Available |
| --- | --- |
| API key (Gemini API) | Yes |
| Vertex AI | Yes |
| OAuth (personal/enterprise) | No |

How It Works

  • System instructions and repeated context are cached
  • Cached tokens reduce the billable input; the scripts in this skill treat them as non-billable, so confirm the actual discount in the official pricing documentation
  • View savings via the /stats command or the JSON output

Maximizing Cache Hits

  1. Use consistent system prompts - Same prefix increases cache reuse
  2. Batch similar queries - Group related analysis together
  3. Reuse context files - Same files in same order
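A minimal sketch of the first and third points, assuming cache reuse depends on requests sharing an identical leading prefix. The helper name build_prompt and the prefix text are hypothetical; the function only assembles the prompt string that would then be passed to gemini.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep the shared instructions byte-identical across calls
# and append the part that varies at the end, so every prompt starts the same way.
SHARED_PREFIX="You are reviewing a TypeScript codebase. Answer concisely."

build_prompt() {
  # $1: the question that changes from call to call
  printf '%s\n\nQuestion: %s\n' "$SHARED_PREFIX" "$1"
}

# Two prompts that differ only in their final line
build_prompt "List all exports"
build_prompt "List all imports"
```

A call could then look like gemini "$(build_prompt "List all exports")" --output-format json, with only the trailing question changing between requests.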

Monitoring Cache Usage

bash
# Run a query and capture the JSON output, which includes per-model token stats
result=$(gemini "query" --output-format json)

# Sum total and cached tokens across every model used in the call
total=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
billable=$((total - cached))

# Guard against division by zero when no tokens were reported
if [ "$total" -gt 0 ]; then savings=$((cached * 100 / total)); else savings=0; fi

echo "Total: $total tokens"
echo "Cached: $cached tokens ($savings% savings)"
echo "Billable: $billable tokens"
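When a call touches more than one model, a per-model breakdown can show where the cached tokens came from. The sketch below assumes the same .stats.models layout the commands above read; the function name cache_report and the sample JSON are illustrative, not real CLI output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "model total cached" for each model in the JSON on stdin.
cache_report() {
  jq -r '.stats.models
         | to_entries[]
         | "\(.key) \(.value.tokens.total // 0) \(.value.tokens.cached // 0)"'
}

# Made-up sample shaped like the fields used above (not captured from a real run)
sample='{"stats":{"models":{"gemini-2.5-flash":{"tokens":{"total":1200,"cached":300}}}}}'

# Only run the demo when jq is installed
if command -v jq >/dev/null 2>&1; then
  echo "$sample" | cache_report
fi
```

In a real workflow the input would come from gemini "query" --output-format json | cache_report.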

Model Selection

Model Comparison

| Model | Context Window | Speed | Cost | Quality |
| --- | --- | --- | --- | --- |
| gemini-2.5-flash | Large | Fast | Lower | Good |
| gemini-2.5-pro | Very large | Slower | Higher | Best |

Selection Criteria

Use Flash (-m gemini-2.5-flash) when:
  • Processing large files (bulk analysis)
  • Simple extraction tasks
  • Cost is a primary concern
  • Speed is critical
  • Task is straightforward
Use Pro (-m gemini-2.5-pro) when:
  • Complex reasoning required
  • Quality is critical
  • Nuanced analysis needed
  • Task requires deep understanding
  • Context is very large (verify each model's current context limit in the official documentation)
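The criteria above can be folded into a small helper for scripts. This is a sketch only: the function name pick_model and the task labels are hypothetical, and the mapping simply mirrors the two lists.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a coarse task label to a model name.
# Labels are illustrative; unknown labels fall back to the cheaper model.
pick_model() {
  case "$1" in
    reasoning|audit|nuanced) echo "gemini-2.5-pro" ;;
    bulk|extract|quick)      echo "gemini-2.5-flash" ;;
    *)                       echo "gemini-2.5-flash" ;;
  esac
}

pick_model audit
pick_model bulk
```

It could be used as gemini "query" -m "$(pick_model bulk)" --output-format json. Defaulting to Flash keeps unexpected labels on the lower-cost path.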

Model Selection Examples

bash
# Bulk file analysis - use Flash
for file in src/*.ts; do
  gemini "List all exports" -m gemini-2.5-flash --output-format json < "$file"
done

# Security audit - use Pro for quality
gemini "Deep security analysis" -m gemini-2.5-pro --output-format json < critical-auth.ts

# Cost tracking with model info
# Read the name and token count from the same entry so they always match
result=$(gemini "query" --output-format json)
model=$(echo "$result" | jq -r '.stats.models | to_entries[0].key')
tokens=$(echo "$result" | jq '.stats.models | to_entries[0].value.tokens.total')
echo "Used $model: $tokens tokens"

Batching Strategy

Why Batch?

  • Reduces API overhead
  • Increases cache hit rate
  • Provides consistent context

Batching Patterns

Pattern 1: Concatenate Files

bash
# Instead of N separate calls
# Do one call with all files
cat src/*.ts | gemini "Analyze all TypeScript files for patterns" --output-format json
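Plain cat drops file boundaries, which can make it harder to attribute findings to a specific file. One possible refinement, sketched below, prefixes each file with a marker line. The function name bundle_files and the marker format are assumptions, not a Gemini CLI convention.

```bash
#!/usr/bin/env bash
# Hypothetical helper: concatenate files, labelling each with its path.
bundle_files() {
  local f
  for f in "$@"; do
    [ -f "$f" ] || continue        # skip unmatched globs and directories
    printf '=== FILE: %s ===\n' "$f"
    cat "$f"
    printf '\n'
  done
}

# Demo with two throwaway files in a temporary directory
demo_dir=$(mktemp -d)
printf 'export const a = 1;\n' > "$demo_dir/a.ts"
printf 'export const b = 2;\n' > "$demo_dir/b.ts"
bundle_files "$demo_dir"/*.ts
```

In the pattern above this would replace cat: bundle_files src/*.ts | gemini "Analyze all TypeScript files for patterns" --output-format json. The marker lines add a few tokens per file, so the trade-off is worth checking on large batches.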

Pattern 2: Batch Prompts

bash
# Combine related questions
gemini "Answer these questions about the codebase:
1. What is the main architecture pattern?
2. How is authentication handled?
3. What database is used?" --output-format json

Pattern 3: Staged Analysis

bash
# First pass: Quick overview with Flash
overview=$(cat src/*.ts | gemini "List all modules" -m gemini-2.5-flash --output-format json)

# Second pass: Deep dive critical areas with Pro
# grep -E enables the auth|security alternation; < /dev/null keeps gemini from consuming the loop's input
echo "$overview" | jq -r '.response' | grep -E "auth|security" | while read -r module; do
  gemini "Deep analysis of $module" -m gemini-2.5-pro --output-format json < /dev/null
done

Cost Tracking

Per-Query Tracking

bash
result=$(gemini "query" --output-format json)

# Extract all cost-relevant stats
total_tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
cached_tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
models_used=$(echo "$result" | jq -r '.stats.models | keys | join(", ")')
tool_calls=$(echo "$result" | jq '.stats.tools.totalCalls // 0')
latency=$(echo "$result" | jq '.stats.models | to_entries | map(.value.api.totalLatencyMs) | add // 0')

echo "$(date): tokens=$total_tokens cached=$cached_tokens models=$models_used tools=$tool_calls latency=${latency}ms" >> usage.log
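Once usage.log has a few entries, the totals can be recovered without re-running any queries. The sketch below assumes the exact tokens=... cached=... line format written by the echo line above; the function name summarize_usage and the sample log lines are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: total the tokens= and cached= fields in a usage log.
summarize_usage() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^tokens=/) total  += substr($i, 8)   # skip the 7-char "tokens=" key
        if ($i ~ /^cached=/) cached += substr($i, 8)   # skip the 7-char "cached=" key
      }
    }
    END { printf "calls=%d tokens=%d cached=%d\n", NR, total, cached }
  ' "$1"
}

# Demo on a made-up log in the same format as the echo line above
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
Mon Dec  1 10:00:00 UTC 2025: tokens=1200 cached=300 models=gemini-2.5-flash tools=0 latency=850ms
Mon Dec  1 10:01:00 UTC 2025: tokens=800 cached=500 models=gemini-2.5-flash tools=1 latency=620ms
EOF
summarize_usage "$demo_log"
```

Running summarize_usage usage.log after a batch gives one summary line that can feed the post-operation review.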

Session Tracking

bash
# Track cumulative usage across a session
total_session_tokens=0
total_session_cached=0
total_session_calls=0

track_usage() {
  local result="$1"
  local tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
  local cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')

  total_session_tokens=$((total_session_tokens + tokens))
  total_session_cached=$((total_session_cached + cached))
  total_session_calls=$((total_session_calls + 1))
}

# Use in workflow
result=$(gemini "query 1" --output-format json)
track_usage "$result"

result=$(gemini "query 2" --output-format json)
track_usage "$result"

echo "Session total: $total_session_tokens tokens ($total_session_cached cached) in $total_session_calls calls"

Optimization Checklist

Before Large Operations

  • Choose appropriate model (Flash vs Pro)
  • Check if caching is available (API key or Vertex)
  • Plan batching strategy
  • Set up usage tracking

During Operations

  • Monitor cache hit rates
  • Track per-query costs
  • Adjust model if quality insufficient
  • Batch similar queries

After Operations

  • Review total usage
  • Calculate effective cost
  • Identify optimization opportunities
  • Document learnings

Quick Reference

Cost-Saving Commands

bash
# Use Flash for bulk
gemini "query" -m gemini-2.5-flash --output-format json

# Check cache effectiveness
gemini "query" --output-format json | jq '{total: (.stats.models | to_entries | map(.value.tokens.total) | add), cached: (.stats.models | to_entries | map(.value.tokens.cached) | add)}'

# Minimal output (fewer output tokens)
gemini "Answer in one sentence: {question}" --output-format json

Cost Estimation

Rough token estimates:
  • 1 token ~ 4 characters (English)
  • 1 page of code ~ 500-1000 tokens
  • Typical source file ~ 200-2000 tokens
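The 4-characters-per-token rule of thumb above can be applied to files before sending them. This is a rough sketch, not a tokenizer: the function name estimate_tokens is hypothetical, and the result is only as good as the heuristic.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough token estimate for one or more files,
# using the ~4 characters per token heuristic.
# wc -c counts bytes, which equals characters only for ASCII text.
estimate_tokens() {
  echo $(( ($(cat "$@" | wc -c) + 3) / 4 ))   # +3 rounds up to a whole token
}

# Demo: a 400-byte throwaway file
demo_file=$(mktemp)
head -c 400 /dev/zero | tr '\0' 'x' > "$demo_file"
estimate_tokens "$demo_file"
```

Something like estimate_tokens src/*.ts gives a quick size check before a batched call; the actual count reported in the JSON stats may differ noticeably.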

Keyword Registry (Delegates to gemini-cli-docs)

| Topic | Query Keywords |
| --- | --- |
| Caching | token caching, cached tokens, /stats |
| Model selection | model routing, flash vs pro, -m flag |
| Costs | quota pricing, token usage, billing |
| Output control | output format, json output |

Test Scenarios

Scenario 1: Check Token Usage

Query: "How do I see how many tokens Gemini used?"
Expected Behavior:
  • Skill activates on "token usage" or "gemini cost"
  • Provides JSON stats extraction pattern
Success Criteria: User receives jq commands to extract token counts

Scenario 2: Reduce Costs

Query: "How do I reduce Gemini CLI costs for bulk analysis?"
Expected Behavior:
  • Skill activates on "cost optimization" or "reduce tokens"
  • Recommends Flash model and batching
Success Criteria: User receives cost optimization strategies

Scenario 3: Model Selection

Query: "Should I use Flash or Pro for this task?"
Expected Behavior:
  • Skill activates on "flash vs pro" or "model selection"
  • Provides decision criteria table
Success Criteria: User receives model comparison and recommendation

References

Query gemini-cli-docs for official documentation on:
  • "token caching"
  • "model selection"
  • "quota and pricing"

Version History

  • v1.1.0 (2025-12-01): Added Test Scenarios section
  • v1.0.0 (2025-11-25): Initial release