Gemini Token Optimization

🚨 MANDATORY: Invoke gemini-cli-docs First

STOP - Before providing ANY response about Gemini token usage:
INVOKE the `gemini-cli-docs` skill
QUERY for the specific token or pricing topic
BASE all responses EXCLUSIVELY on the official documentation loaded

Overview

This skill covers optimizing cost and token usage when delegating work to Gemini CLI. It is essential for efficient bulk operations and cost-conscious workflows.

When to Use This Skill

Keywords: token usage, cost optimization, gemini cost, model selection, flash vs pro, caching, batch queries, reduce tokens

Use this skill when:

Planning bulk Gemini operations
Optimizing costs for large-scale analysis
Choosing between Flash and Pro models
Understanding token caching benefits
Tracking usage across sessions

Token Caching

Gemini CLI automatically caches context to reduce costs by reusing previously processed content.

Availability

Auth Method	Caching Available
API key (Gemini API)	YES
Vertex AI	YES
OAuth (personal/enterprise)	NO

How It Works

System instructions and repeated context are cached
Cached tokens are billed at a reduced rate instead of the full input-token rate
View cache savings with the `/stats` command or in JSON output (`--output-format json`)

Maximizing Cache Hits

Use consistent system prompts - Same prefix increases cache reuse
Batch similar queries - Group related analysis together
Reuse context files - Same files in same order
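A minimal sketch of the three tips above, assuming a hypothetical helper `build_cached_prompt` (not part of Gemini CLI). Fixed instructions come first, the same context files follow in a stable sorted order, and only the variable question goes at the end, so consecutive prompts share the longest possible prefix. Whether a prefix is actually served from cache is decided by the backend, not by this script.

```bash
# Hypothetical helper: build a prompt whose prefix stays identical across calls
# so that repeated requests share as much leading context as possible.
# Usage: build_cached_prompt "<question>" file1 [file2 ...]
build_cached_prompt() {
  local question="$1"
  shift
  # 1. Fixed instructions first: keep this text byte-for-byte identical.
  printf '%s\n\n' "You are reviewing a TypeScript codebase. Answer concisely."
  # 2. The same context files, always in the same (byte-wise sorted) order.
  printf '%s\n' "$@" | LC_ALL=C sort | while IFS= read -r f; do
    cat -- "$f"
  done
  # 3. Only the part that changes between calls goes last.
  printf '\nQuestion: %s\n' "$question"
}
```

For example, `build_cached_prompt "List all exports" src/*.ts | gemini --output-format json` sends the assembled text on stdin; swapping only the question keeps everything before `Question:` unchanged.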

Monitoring Cache Usage

```bash
# Run one query and sum token counts across all models it used
result=$(gemini "query" --output-format json)
total=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
uncached=$((total - cached))
# Guard against division by zero when no tokens were reported
if [ "$total" -gt 0 ]; then hit_rate=$((cached * 100 / total)); else hit_rate=0; fi

echo "Total: $total tokens"
echo "Cached: $cached tokens (${hit_rate}% of total)"
echo "Non-cached: $uncached tokens"
```
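The percentage arithmetic above can be factored into a small reusable helper. This is a hypothetical function (not part of Gemini CLI). It takes the summed `total` and `cached` counts extracted with `jq` and prints an integer percentage. When no tokens were reported it prints 0 instead of failing on a division by zero. Shell arithmetic truncates, so 1 of 3 tokens reports as 33.

```bash
# Hypothetical helper: integer cache hit rate (percent of total tokens served from cache).
# Usage: cache_hit_rate <total_tokens> <cached_tokens>
cache_hit_rate() {
  local total=${1:-0} cached=${2:-0}
  if [ "$total" -le 0 ]; then
    # No tokens reported (e.g. empty stats): avoid division by zero
    echo 0
  else
    echo $(( cached * 100 / total ))
  fi
}
```

Usage: `echo "Cached: $cached tokens ($(cache_hit_rate "$total" "$cached")% of total)"`.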

Model Selection

Model Comparison

Model	Context Window	Speed	Cost	Quality
gemini-2.5-flash	~1M tokens	Fast	Lower	Good
gemini-2.5-pro	~1M tokens	Slower	Higher	Best

Selection Criteria

Use Flash (`-m gemini-2.5-flash`) when:

Processing large files (bulk analysis)
Simple extraction tasks
Cost is a primary concern
Speed is critical
Task is straightforward

Use Pro (`-m gemini-2.5-pro`) when:

Complex reasoning required
Quality is critical
Nuanced analysis needed
Task requires deep understanding
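The criteria above can be encoded as a simple lookup when scripting bulk jobs. The task categories and the helper name `choose_model` are hypothetical. Only the model names and the `-m` flag come from this guide. Treat the result as a starting default, and switch to Pro when Flash output is not good enough.

```bash
# Hypothetical helper: map a task category to a model name for `gemini -m`.
# The categories are illustrative; extend them to match your own workflows.
choose_model() {
  case "$1" in
    bulk|extract|list|summarize)
      echo "gemini-2.5-flash" ;;   # cheap, fast, straightforward tasks
    security|architecture|reasoning|review)
      echo "gemini-2.5-pro" ;;     # quality-critical or nuanced analysis
    *)
      echo "gemini-2.5-flash" ;;   # unknown category: default to the cheaper model
  esac
}
```

Usage: `gemini "List all exports" -m "$(choose_model extract)" --output-format json < src/index.ts`.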

Model Selection Examples

```bash
# Bulk file analysis - use Flash
for file in src/*.ts; do
  gemini "List all exports" -m gemini-2.5-flash --output-format json < "$file"
done

# Security audit - use Pro for quality
gemini "Deep security analysis" -m gemini-2.5-pro --output-format json < critical-auth.ts

# Cost tracking with model info
# (reads the first entry of .stats.models; if the CLI used several models,
# only that first one is reported)
result=$(gemini "query" --output-format json)
model=$(echo "$result" | jq -r '.stats.models | to_entries[0].key')
tokens=$(echo "$result" | jq '.stats.models | to_entries[0].value.tokens.total')
echo "Used $model: $tokens tokens"
```

Batching Strategy

Why Batch?

Reduces API overhead
Increases cache hit rate
Provides consistent context

Batching Patterns

Pattern 1: Concatenate Files

```bash
# Instead of N separate calls (one per file),
# do one call with all files
cat src/*.ts | gemini "Analyze all TypeScript files for patterns" --output-format json
```
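A plain `cat` merges files with no boundaries, so the model cannot tell which file a finding came from. A minimal variant prefixes each file with its path. It assumes a hypothetical `concat_with_headers` helper. The headers cost a few extra tokens per file, but the whole batch is still a single call.

```bash
# Hypothetical helper: concatenate files, each preceded by a path header,
# so answers from one batched call can cite file names.
# Usage: concat_with_headers file1 [file2 ...]
concat_with_headers() {
  for f in "$@"; do
    printf '=== %s ===\n' "$f"
    cat -- "$f"
    printf '\n'
  done
}
```

Usage: `concat_with_headers src/*.ts | gemini "Analyze all TypeScript files for patterns; cite file names" --output-format json`.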

Pattern 2: Batch Prompts

```bash
# Combine related questions
gemini "Answer these questions about the codebase:
1. What is the main architecture pattern?
2. How is authentication handled?
3. What database is used?" --output-format json
```
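When the question list comes from a script instead of being typed by hand, a small builder keeps the combined prompt consistently numbered. `batch_prompt` is a hypothetical helper. Its output is passed to `gemini` as the prompt argument, as in the example above.

```bash
# Hypothetical helper: combine several questions into one numbered prompt.
# Usage: batch_prompt "<intro line>" "<question 1>" "<question 2>" ...
batch_prompt() {
  local intro="$1" n=1 q
  shift
  printf '%s\n' "$intro"
  for q in "$@"; do
    printf '%d. %s\n' "$n" "$q"
    n=$((n + 1))
  done
}
```

Usage: `gemini "$(batch_prompt "Answer these questions about the codebase:" "How is authentication handled?" "What database is used?")" --output-format json`.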

Pattern 3: Staged Analysis

```bash
# First pass: Quick overview with Flash
overview=$(cat src/*.ts | gemini "List all modules" -m gemini-2.5-flash --output-format json)

# Second pass: Deep dive critical areas with Pro
# grep -E enables the "|" alternation; </dev/null stops gemini from
# consuming the remaining module names on the loop's stdin
echo "$overview" | jq -r '.response' | grep -E "auth|security" | while read -r module; do
  gemini "Deep analysis of $module" -m gemini-2.5-pro --output-format json < /dev/null
done
```
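The second pass depends on picking the critical modules out of the Flash overview. The following hypothetical filter assumes the overview lists one module per line. It matches case-insensitively and drops duplicates, so no module is sent to Pro twice (each duplicate would be another billed call). The extra keywords `crypto` and `session` are illustrative additions.

```bash
# Hypothetical filter: keep overview lines that mention a critical keyword,
# case-insensitively and without duplicates (each kept line costs one Pro call).
# Usage: echo "$overview" | jq -r '.response' | select_critical_modules
select_critical_modules() {
  grep -iE 'auth|security|crypto|session' | awk '!seen[$0]++'
}
```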

Cost Tracking

Per-Query Tracking

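```bash
result=$(gemini "query" --output-format json)

# Extract all cost-relevant stats, summed across every model used.
# tools.totalCalls and api.totalLatencyMs follow the Gemini CLI JSON stats
# layout; check them against the output of your CLI version.
total_tokens=$(echo "$result" | jq '[.stats.models[].tokens.total] | add // 0')
cached_tokens=$(echo "$result" | jq '[.stats.models[].tokens.cached] | add // 0')
models_used=$(echo "$result" | jq -r '.stats.models | keys | join(",")')
tool_calls=$(echo "$result" | jq '.stats.tools.totalCalls // 0')
latency=$(echo "$result" | jq '[.stats.models[].api.totalLatencyMs] | add // 0')

echo "$(date): tokens=$total_tokens cached=$cached_tokens models=$models_used tools=$tool_calls latency=${latency}ms" >> usage.log
```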
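Each line of `usage.log` stores its values as `key=value` fields, so the log can be totalled later without calling Gemini again. A minimal `awk` sketch, assuming the exact line format written above. `summarize_usage_log` is a hypothetical name.

```bash
# Hypothetical helper: total the key=value fields written to usage.log above.
# Usage: summarize_usage_log usage.log
summarize_usage_log() {
  awk '
    NF {
      for (i = 1; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "tokens") tokens += kv[2]
        else if (kv[1] == "cached") cached += kv[2]
        else if (kv[1] == "latency") latency += kv[2] + 0   # "500ms" -> 500
      }
      calls++
    }
    END {
      printf "calls=%d tokens=%d cached=%d latency=%dms\n", calls, tokens, cached, latency
    }
  ' "$1"
}
```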

Session Tracking

```bash
# Track cumulative usage across a session
total_session_tokens=0
total_session_cached=0
total_session_calls=0

track_usage() {
  local result="$1"
  local tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
  local cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')

  total_session_tokens=$((total_session_tokens + tokens))
  total_session_cached=$((total_session_cached + cached))
  total_session_calls=$((total_session_calls + 1))
}

# Use in workflow
result=$(gemini "query 1" --output-format json)
track_usage "$result"

result=$(gemini "query 2" --output-format json)
track_usage "$result"

echo "Session total: $total_session_tokens tokens ($total_session_cached cached) in $total_session_calls calls"
```
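At the end of a session, the counters can be turned into a per-call average, which makes it easier to compare sessions of different lengths. The `session_summary` helper below is hypothetical. It takes the counters as arguments so it also works outside `track_usage`. It uses integer division and guards against a session with zero calls.

```bash
# Hypothetical helper: per-call average for the counters kept by track_usage.
# Usage: session_summary "$total_session_tokens" "$total_session_calls"
session_summary() {
  local tokens=${1:-0} calls=${2:-0} avg=0
  if [ "$calls" -gt 0 ]; then
    avg=$((tokens / calls))   # integer division, truncates
  fi
  echo "calls=$calls tokens=$tokens avg_tokens_per_call=$avg"
}
```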

Optimization Checklist

Before Large Operations

Choose appropriate model (Flash vs Pro)
Check if caching is available (API key or Vertex)
Plan batching strategy
Set up usage tracking

During Operations

Monitor cache hit rates
Track per-query costs
Adjust model if quality insufficient
Batch similar queries

After Operations

Review total usage
Calculate effective cost
Identify optimization opportunities
Document learnings

Quick Reference

Cost-Saving Commands

```bash
# Use Flash for bulk
gemini "query" -m gemini-2.5-flash --output-format json

# Check cache effectiveness
gemini "query" --output-format json | jq '{total: .stats.models | to_entries | map(.value.tokens.total) | add, cached: .stats.models | to_entries | map(.value.tokens.cached) | add}'

# Minimal output (fewer output tokens)
gemini "Answer in one sentence: {question}" --output-format json
```

Cost Estimation

Rough token estimates:

1 token ~ 4 characters (English)
1 page of code ~ 500-1000 tokens
Typical source file ~ 200-2000 tokens
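The 4-characters-per-token rule of thumb can serve as a quick pre-flight check before sending a batch. This sketch uses a hypothetical `estimate_tokens` helper and rounds up. It takes byte counts from `wc -c` as a stand-in for characters, which is accurate only for ASCII/English text. The estimate is rough: the real count comes from the model's tokenizer and is reported in `stats` after the call.

```bash
# Hypothetical helper: rough token estimate (~4 characters per token, rounded up)
# for one or more files, using byte counts as a proxy for characters.
# Usage: estimate_tokens src/*.ts
estimate_tokens() {
  local bytes
  bytes=$(cat -- "$@" | wc -c)
  echo $(( ($bytes + 3) / 4 ))
}
```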

Keyword Registry (Delegates to gemini-cli-docs)

Topic	Query Keywords
Caching	`token caching` , `cached tokens` , `/stats`
Model selection	`model routing` , `flash vs pro` , `-m flag`
Costs	`quota pricing` , `token usage` , `billing`
Output control	`output format` , `json output`

Test Scenarios

Scenario 1: Check Token Usage

Query: "How do I see how many tokens Gemini used?"
Expected Behavior:

Skill activates on "token usage" or "gemini cost"
Provides JSON stats extraction pattern
Success Criteria: User receives jq commands to extract token counts

Scenario 2: Reduce Costs

Query: "How do I reduce Gemini CLI costs for bulk analysis?"
Expected Behavior:

Skill activates on "cost optimization" or "reduce tokens"
Recommends Flash model and batching
Success Criteria: User receives cost optimization strategies

Scenario 3: Model Selection

Query: "Should I use Flash or Pro for this task?"
Expected Behavior:

Skill activates on "flash vs pro" or "model selection"
Provides decision criteria table
Success Criteria: User receives model comparison and recommendation

References

Query `gemini-cli-docs` for official documentation on:

"token caching"
"model selection"
"quota and pricing"

Version History

v1.1.0 (2025-12-01): Added Test Scenarios section
v1.0.0 (2025-11-25): Initial release
