gemini-token-optimization
Gemini Token Optimization
🚨 MANDATORY: Invoke gemini-cli-docs First
STOP - Before providing ANY response about Gemini token usage:
- INVOKE the gemini-cli-docs skill
- QUERY for the specific token or pricing topic
- BASE all responses EXCLUSIVELY on official documentation loaded
Overview
Skill for optimizing cost and token usage when delegating to Gemini CLI. Essential for efficient bulk operations and cost-conscious workflows.
When to Use This Skill
Keywords: token usage, cost optimization, gemini cost, model selection, flash vs pro, caching, batch queries, reduce tokens
Use this skill when:
- Planning bulk Gemini operations
- Optimizing costs for large-scale analysis
- Choosing between Flash and Pro models
- Understanding token caching benefits
- Tracking usage across sessions
Token Caching
Gemini CLI automatically caches context to reduce costs by reusing previously processed content.
Availability
| Auth Method | Caching Available |
|---|---|
| API key (Gemini API) | YES |
| Vertex AI | YES |
| OAuth (personal/enterprise) | NO |
How It Works
- System instructions and repeated context are cached
- Cached tokens don't count toward billing
- View savings via the /stats command or JSON output
Maximizing Cache Hits
- Use consistent system prompts - Same prefix increases cache reuse
- Batch similar queries - Group related analysis together
- Reuse context files - Same files in same order
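The first point above can be sketched as a loop that holds the instruction prefix byte-identical across calls (the prefix text and file names are illustrative; a real run would execute the command rather than echo it):

```bash
# Keep the instruction prefix byte-identical across calls so the cached
# prefix can be reused (assumes API-key or Vertex auth, where caching applies).
PREFIX="You are a code reviewer. Answer concisely."
for f in src/a.ts src/b.ts; do
  # Echoed for illustration; a real run would execute this command.
  echo "gemini \"$PREFIX Review this file.\" --output-format json < $f"
done
```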
Monitoring Cache Usage
```bash
result=$(gemini "query" --output-format json)
total=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
billable=$((total - cached))
# Guard against division by zero when no token stats are reported
savings=$(( total > 0 ? cached * 100 / total : 0 ))
echo "Total: $total tokens"
echo "Cached: $cached tokens ($savings% savings)"
echo "Billable: $billable tokens"
```
Model Selection
Model Comparison
| Model | Context Window | Speed | Cost | Quality |
|---|---|---|---|---|
| gemini-2.5-flash | Large | Fast | Lower | Good |
| gemini-2.5-pro | Very large | Slower | Higher | Best |
Selection Criteria
Use Flash (-m gemini-2.5-flash) when:
- Processing large files (bulk analysis)
- Simple extraction tasks
- Cost is a primary concern
- Speed is critical
- Task is straightforward
Use Pro (-m gemini-2.5-pro) when:
- Complex reasoning required
- Quality is critical
- Nuanced analysis needed
- Task requires deep understanding
- Context exceeds 1M tokens
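The criteria above can be folded into a tiny helper that maps a rough task label to a model name (the pick_model function and its labels are illustrative, not part of the Gemini CLI):

```bash
# Map a rough task-complexity label to a model name (labels are made up).
pick_model() {
  case "$1" in
    bulk|simple)  echo "gemini-2.5-flash" ;;  # cheap, fast, bulk work
    deep|complex) echo "gemini-2.5-pro" ;;    # quality-critical analysis
    *)            echo "gemini-2.5-flash" ;;  # default to the cheaper model
  esac
}
pick_model bulk   # prints gemini-2.5-flash
pick_model deep   # prints gemini-2.5-pro
```

A delegation call would then pass the result as the -m value, e.g. `gemini "query" -m "$(pick_model bulk)"`.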
Model Selection Examples
```bash
# Bulk file analysis - use Flash
for file in src/*.ts; do
  gemini "List all exports" -m gemini-2.5-flash --output-format json < "$file"
done

# Security audit - use Pro for quality
gemini "Deep security analysis" -m gemini-2.5-pro --output-format json < critical-auth.ts

# Cost tracking with model info
result=$(gemini "query" --output-format json)
model=$(echo "$result" | jq -r '.stats.models | keys[0]')
tokens=$(echo "$result" | jq '.stats.models | to_entries[0].value.tokens.total')
echo "Used $model: $tokens tokens"
```
Batching Strategy
Why Batch?
- Reduces API overhead
- Increases cache hit rate
- Provides consistent context
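A back-of-the-envelope view of the overhead point: with a fixed instruction prefix, N separate calls pay for the prefix N times while one batched call pays once (the numbers below are illustrative assumptions, not measured figures):

```bash
# Illustrative arithmetic: instruction-prefix cost, N calls vs one batched call.
prefix_tokens=200   # assumed size of the repeated instruction prefix
n_files=50          # assumed number of files to analyze
separate=$(( prefix_tokens * n_files ))
batched=$prefix_tokens
echo "separate=$separate batched=$batched saved=$(( separate - batched ))"
# prints: separate=10000 batched=200 saved=9800
```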
Batching Patterns
Pattern 1: Concatenate Files
```bash
# Instead of N separate calls, do one call with all files
cat src/*.ts | gemini "Analyze all TypeScript files for patterns" --output-format json
```
Pattern 2: Batch Prompts
```bash
# Combine related questions
gemini "Answer these questions about the codebase:
- What is the main architecture pattern?
- How is authentication handled?
- What database is used?" --output-format json
```
Pattern 3: Staged Analysis
```bash
# First pass: quick overview with Flash
overview=$(cat src/*.ts | gemini "List all modules" -m gemini-2.5-flash --output-format json)

# Second pass: deep dive into critical areas with Pro
# (-E enables the "auth|security" alternation; plain grep would match it literally)
echo "$overview" | jq -r '.response' | grep -E "auth|security" | while read -r module; do
  gemini "Deep analysis of $module" -m gemini-2.5-pro --output-format json
done
```
Cost Tracking
Per-Query Tracking
```bash
result=$(gemini "query" --output-format json)

# Extract all cost-relevant stats
total_tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
cached_tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
models_used=$(echo "$result" | jq -r '.stats.models | keys | join(", ")')
tool_calls=$(echo "$result" | jq '.stats.tools.totalCalls // 0')
latency=$(echo "$result" | jq '.stats.models | to_entries | map(.value.api.totalLatencyMs) | add // 0')
echo "$(date): tokens=$total_tokens cached=$cached_tokens models=$models_used tools=$tool_calls latency=${latency}ms" >> usage.log
```
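To sanity-check these jq paths without spending tokens, they can be run against a mocked payload (the JSON shape below is inferred from the fields this document extracts, not taken from the official schema):

```bash
# Mocked stats payload with the fields extracted above (shape is an assumption).
mock='{"stats":{"models":{"gemini-2.5-flash":{"tokens":{"total":1200,"cached":800}}}}}'
total=$(echo "$mock" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
cached=$(echo "$mock" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
echo "billable=$((total - cached))"
# prints: billable=400
```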
Session Tracking
```bash
# Track cumulative usage across a session
total_session_tokens=0
total_session_cached=0
total_session_calls=0

track_usage() {
  local result="$1"
  local tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
  local cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
  total_session_tokens=$((total_session_tokens + tokens))
  total_session_cached=$((total_session_cached + cached))
  total_session_calls=$((total_session_calls + 1))
}

# Use in workflow
result=$(gemini "query 1" --output-format json)
track_usage "$result"
result=$(gemini "query 2" --output-format json)
track_usage "$result"
echo "Session total: $total_session_tokens tokens ($total_session_cached cached) in $total_session_calls calls"
```
Optimization Checklist
Before Large Operations
- Choose appropriate model (Flash vs Pro)
- Check if caching is available (API key or Vertex)
- Plan batching strategy
- Set up usage tracking
During Operations
- Monitor cache hit rates
- Track per-query costs
- Adjust model if quality insufficient
- Batch similar queries
After Operations
- Review total usage
- Calculate effective cost
- Identify optimization opportunities
- Document learnings
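The "review total usage" step can be sketched with awk over the usage.log written in the Cost Tracking section (the log line format is assumed to match that section's echo; the sample lines here are made up):

```bash
# Summarize tokens= and cached= fields from a usage.log-style file.
printf '%s\n' \
  'Mon: tokens=1000 cached=400 models=gemini-2.5-flash tools=0 latency=900ms' \
  'Tue: tokens=2000 cached=1600 models=gemini-2.5-flash tools=1 latency=800ms' \
  > usage.log
awk '{ for (i = 1; i <= NF; i++) {
         if ($i ~ /^tokens=/) t += substr($i, 8)
         if ($i ~ /^cached=/) c += substr($i, 8)
       } }
     END { printf "total=%d cached=%d hit=%d%%\n", t, c, (t ? c * 100 / t : 0) }' usage.log
# prints: total=3000 cached=2000 hit=66%
```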
Quick Reference
Cost-Saving Commands
```bash
# Use Flash for bulk
gemini "query" -m gemini-2.5-flash --output-format json

# Check cache effectiveness
gemini "query" --output-format json | jq '{total: .stats.models | to_entries | map(.value.tokens.total) | add, cached: .stats.models | to_entries | map(.value.tokens.cached) | add}'

# Minimal output (fewer output tokens)
gemini "Answer in one sentence: {question}" --output-format json
```
Cost Estimation
Rough token estimates:
- 1 token ~ 4 characters (English)
- 1 page of code ~ 500-1000 tokens
- Typical source file ~ 200-2000 tokens
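The ~4 characters per token heuristic above can be used for a rough pre-flight estimate (it is an approximation only; actual counts come from the JSON stats):

```bash
# Rough pre-flight estimate using the ~4 chars/token heuristic (English text).
prompt="Summarize the authentication flow in this module."
chars=${#prompt}
est_tokens=$(( (chars + 3) / 4 ))   # round up
echo "chars=$chars est_tokens=~$est_tokens"
# prints: chars=49 est_tokens=~13
```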
Keyword Registry (Delegates to gemini-cli-docs)
| Topic | Query Keywords |
|---|---|
| Caching | |
| Model selection | |
| Costs | |
| Output control | |
Test Scenarios
Scenario 1: Check Token Usage
Query: "How do I see how many tokens Gemini used?"
Expected Behavior:
- Skill activates on "token usage" or "gemini cost"
- Provides JSON stats extraction pattern
Success Criteria: User receives jq commands to extract token counts
Scenario 2: Reduce Costs
Query: "How do I reduce Gemini CLI costs for bulk analysis?"
Expected Behavior:
- Skill activates on "cost optimization" or "reduce tokens"
- Recommends Flash model and batching
Success Criteria: User receives cost optimization strategies
Scenario 3: Model Selection
Query: "Should I use Flash or Pro for this task?"
Expected Behavior:
- Skill activates on "flash vs pro" or "model selection"
- Provides decision criteria table
Success Criteria: User receives model comparison and recommendation
References
Query gemini-cli-docs for official documentation on:
- "token caching"
- "model selection"
- "quota and pricing"
Version History
- v1.1.0 (2025-12-01): Added Test Scenarios section
- v1.0.0 (2025-11-25): Initial release