gemini-token-optimization
Optimize token usage when delegating to Gemini CLI. Covers token caching, batch queries, model selection (Flash vs Pro), and cost tracking. Use when planning bulk Gemini operations.
NPX Install
npx skill4agent add melodic-software/claude-code-plugins gemini-token-optimization
Gemini Token Optimization
🚨 MANDATORY: Invoke gemini-cli-docs First
STOP - Before providing ANY response about Gemini token usage:
- INVOKE gemini-cli-docs skill
- QUERY for the specific token or pricing topic
- BASE all responses EXCLUSIVELY on official documentation loaded
Overview
Skill for optimizing cost and token usage when delegating to Gemini CLI. Essential for efficient bulk operations and cost-conscious workflows.
When to Use This Skill
Keywords: token usage, cost optimization, gemini cost, model selection, flash vs pro, caching, batch queries, reduce tokens
Use this skill when:
- Planning bulk Gemini operations
- Optimizing costs for large-scale analysis
- Choosing between Flash and Pro models
- Understanding token caching benefits
- Tracking usage across sessions
Token Caching
Gemini CLI automatically caches context to reduce costs by reusing previously processed content.
Availability
| Auth Method | Caching Available |
|---|---|
| API key (Gemini API) | YES |
| Vertex AI | YES |
| OAuth (personal/enterprise) | NO |
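As a rough illustration of the table above, a script can guess the active auth method from the environment before starting a bulk run. `GEMINI_API_KEY` and `GOOGLE_GENAI_USE_VERTEXAI` are environment variables used for Gemini CLI authentication; the function name `caching_likely_available` is hypothetical, and the check is only a heuristic because auth can also be configured elsewhere.

```shell
# Heuristic sketch: infer the auth method from environment variables.
# Assumption: auth is configured through env vars, not only through settings files.
caching_likely_available() {
  if [ "${GOOGLE_GENAI_USE_VERTEXAI:-}" = "true" ]; then
    echo "vertex-ai: caching available"
  elif [ -n "${GEMINI_API_KEY:-}" ]; then
    echo "api-key: caching available"
  else
    echo "oauth-or-unknown: caching not expected"
    return 1
  fi
}
```

Calling `caching_likely_available || echo "Plan without cache savings"` at the top of a batch script is one way to use it.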
How It Works
- System instructions and repeated context are cached
- Cached tokens lower the billed cost; check the official pricing documentation for the exact discount
- View savings via the /stats command or JSON output
Maximizing Cache Hits
- Use consistent system prompts - Same prefix increases cache reuse
- Batch similar queries - Group related analysis together
- Reuse context files - Same files in same order
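The three tips above can be sketched as a small helper that always emits the same instruction prefix followed by the context files in a fixed order. The name `stable_context` and the `=== file ===` separator are hypothetical choices for illustration, not part of Gemini CLI.

```shell
# Hypothetical helper: print a fixed instruction prefix, then the given files
# in a stable (byte-sorted) order, so repeated calls share the same leading text.
# Assumption: file names contain no newlines.
stable_context() {
  local prefix="$1" f
  shift
  printf '%s\n' "$prefix"
  [ "$#" -gt 0 ] || return 0
  printf '%s\n' "$@" | LC_ALL=C sort | while IFS= read -r f; do
    printf '=== %s ===\n' "$f"
    cat -- "$f"
  done
}
```

A call such as `stable_context "You are a code reviewer." src/*.ts | gemini "List all exports" --output-format json` would then send identical leading content on every run, regardless of the order in which the files were listed.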
Monitoring Cache Usage
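bash
result=$(gemini "query" --output-format json)
# Sum token counts across every model used in the call
total=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
# Approximation: treats cached tokens as non-billable
billable=$((total - cached))
# Guard against division by zero when no tokens were reported
if [ "$total" -gt 0 ]; then savings=$((cached * 100 / total)); else savings=0; fi
echo "Total: $total tokens"
echo "Cached: $cached tokens ($savings% savings)"
echo "Billable: $billable tokens"
Model Selection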
Model Comparison
| Model | Context Window | Speed | Cost | Quality |
|---|---|---|---|---|
| gemini-2.5-flash | Large | Fast | Lower | Good |
| gemini-2.5-pro | Very large | Slower | Higher | Best |
Selection Criteria
Use Flash (-m gemini-2.5-flash) when:
- Processing large files (bulk analysis)
- Simple extraction tasks
- Cost is a primary concern
- Speed is critical
- Task is straightforward
Use Pro (-m gemini-2.5-pro) when:
- Complex reasoning required
- Quality is critical
- Nuanced analysis needed
- Task requires deep understanding
- Context exceeds 1M tokens
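The criteria above can be folded into a script as a simple lookup. This is a minimal sketch: the function name `pick_model` and the task labels are hypothetical, and only the two model names come from the comparison table.

```shell
# Hypothetical mapping from a task label to a model name.
# The labels are illustrative; adjust them to your own workflow.
pick_model() {
  case "$1" in
    bulk|extract|summary)   echo "gemini-2.5-flash" ;;
    audit|reasoning|design) echo "gemini-2.5-pro" ;;
    *)                      echo "gemini-2.5-flash" ;;  # default to the cheaper model
  esac
}
```

It could then be used as `gemini "query" -m "$(pick_model audit)" --output-format json`.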
Model Selection Examples
bash
# Bulk file analysis - use Flash
for file in src/*.ts; do
  gemini "List all exports" -m gemini-2.5-flash --output-format json < "$file"
done
# Security audit - use Pro for quality
gemini "Deep security analysis" -m gemini-2.5-pro --output-format json < critical-auth.ts
# Cost tracking with model info
result=$(gemini "query" --output-format json)
model=$(echo "$result" | jq -r '.stats.models | keys[0]')
tokens=$(echo "$result" | jq '.stats.models | to_entries[0].value.tokens.total')
echo "Used $model: $tokens tokens"
Batching Strategy
Why Batch?
- Reduces API overhead
- Increases cache hit rate
- Provides consistent context
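One way to get these benefits without sending everything in a single oversized call is to group files into batches under a size budget. The sketch below is an assumption-laden illustration: `batch_by_size` is a hypothetical helper, and the budget is measured in bytes as a crude stand-in for tokens.

```shell
# Hypothetical helper: group files into batches whose combined size stays
# under a byte budget. Prints one batch per line, names separated by spaces.
# Assumption: file names contain no spaces or newlines.
batch_by_size() {
  local budget="$1" batch="" used=0 size f
  shift
  for f in "$@"; do
    size=$(wc -c < "$f")
    size=$((size))  # normalize whitespace padding some wc implementations add
    if [ -n "$batch" ] && [ $((used + size)) -gt "$budget" ]; then
      printf '%s\n' "$batch"
      batch=""
      used=0
    fi
    batch="${batch:+$batch }$f"
    used=$((used + size))
  done
  if [ -n "$batch" ]; then printf '%s\n' "$batch"; fi
  return 0
}
```

Each output line can then feed one call. A file larger than the budget still gets a batch of its own, so the budget is a target rather than a hard limit.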
Batching Patterns
Pattern 1: Concatenate Files
bash
# Instead of N separate calls
# Do one call with all files
cat src/*.ts | gemini "Analyze all TypeScript files for patterns" --output-format json
Pattern 2: Batch Prompts
bash
# Combine related questions
gemini "Answer these questions about the codebase:
1. What is the main architecture pattern?
2. How is authentication handled?
3. What database is used?" --output-format json
Pattern 3: Staged Analysis
bash
# First pass: Quick overview with Flash
overview=$(cat src/*.ts | gemini "List all modules" -m gemini-2.5-flash --output-format json)
# Second pass: Deep dive critical areas with Pro
echo "$overview" | jq -r '.response' | grep -E "auth|security" | while IFS= read -r module; do
  # Redirect stdin so gemini does not consume the remaining lines of the loop's input
  gemini "Deep analysis of $module" -m gemini-2.5-pro --output-format json < /dev/null
done
Cost Tracking
Per-Query Tracking
bash
result=$(gemini "query" --output-format json)
# Extract all cost-relevant stats
total_tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
cached_tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
models_used=$(echo "$result" | jq -r '.stats.models | keys | join(", ")')
tool_calls=$(echo "$result" | jq '.stats.tools.totalCalls // 0')
latency=$(echo "$result" | jq '.stats.models | to_entries | map(.value.api.totalLatencyMs) | add // 0')
echo "$(date): tokens=$total_tokens cached=$cached_tokens models=$models_used tools=$tool_calls latency=${latency}ms" >> usage.log
Session Tracking
bash
# Track cumulative usage across a session
total_session_tokens=0
total_session_cached=0
total_session_calls=0
track_usage() {
  local result="$1"
  local tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
  local cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
  total_session_tokens=$((total_session_tokens + tokens))
  total_session_cached=$((total_session_cached + cached))
  total_session_calls=$((total_session_calls + 1))
}
# Use in workflow
result=$(gemini "query 1" --output-format json)
track_usage "$result"
result=$(gemini "query 2" --output-format json)
track_usage "$result"
echo "Session total: $total_session_tokens tokens ($total_session_cached cached) in $total_session_calls calls"
Optimization Checklist
Before Large Operations
- Choose appropriate model (Flash vs Pro)
- Check if caching is available (API key or Vertex)
- Plan batching strategy
- Set up usage tracking
During Operations
- Monitor cache hit rates
- Track per-query costs
- Adjust model if quality insufficient
- Batch similar queries
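To monitor cache hit rates during a long run, the log written by the per-query tracking snippet can be summarized periodically. This sketch assumes the `tokens=… cached=…` line format used for usage.log in this document; `summarize_usage_log` is a hypothetical name.

```shell
# Hypothetical helper: total up tokens= and cached= fields from a usage log
# and report the overall cache hit rate as an integer percentage.
summarize_usage_log() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^tokens=/)      { v = $i; sub(/^tokens=/, "", v); tokens += v }
        else if ($i ~ /^cached=/) { v = $i; sub(/^cached=/, "", v); cached += v }
      }
      n++
    }
    END {
      rate = (tokens > 0) ? int(cached * 100 / tokens) : 0
      printf "queries=%d tokens=%d cached=%d hit_rate=%d%%\n", n, tokens, cached, rate
    }
  ' "$1"
}
```

Running `summarize_usage_log usage.log` between batches gives a quick signal: a falling hit rate may mean prompts or file order have drifted.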
After Operations
- Review total usage
- Calculate effective cost
- Identify optimization opportunities
- Document learnings
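For the "calculate effective cost" step, a rough estimate can be derived from the tracked token counts. In this sketch, `estimate_cost` is a hypothetical helper and all rates are caller-supplied placeholders (USD per million tokens), not real prices. It also treats every token as input-priced, which is a simplification.

```shell
# Hypothetical helper: estimate cost from token counts and caller-supplied rates.
# Arguments: total tokens, cached tokens, rate per 1M tokens,
#            optional rate per 1M cached tokens (defaults to 0).
# The rates are placeholders; look up real values in the official pricing docs.
estimate_cost() {
  local total="$1" cached="$2" rate="$3" cached_rate="${4:-0}"
  LC_ALL=C awk -v t="$total" -v c="$cached" -v r="$rate" -v cr="$cached_rate" \
    'BEGIN { printf "%.4f\n", ((t - c) * r + c * cr) / 1000000 }'
}
```

With placeholder rates of 2 and 0.5, `estimate_cost 1000000 250000 2 0.5` prints `1.6250`.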
Quick Reference
Cost-Saving Commands
bash
# Use Flash for bulk
gemini "query" -m gemini-2.5-flash --output-format json
# Check cache effectiveness
gemini "query" --output-format json | jq '{total: .stats.models | to_entries | map(.value.tokens.total) | add, cached: .stats.models | to_entries | map(.value.tokens.cached) | add}'
# Minimal output (fewer output tokens)
gemini "Answer in one sentence: {question}" --output-format json
Cost Estimation
Rough token estimates:
- 1 token ~ 4 characters (English)
- 1 page of code ~ 500-1000 tokens
- Typical source file ~ 200-2000 tokens
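The 4-characters-per-token rule of thumb can be applied to files before sending them. This is a rough sketch only: `estimate_tokens` is a hypothetical helper, it counts bytes rather than characters, and real tokenizer counts will differ.

```shell
# Rough estimate based on the ~4 characters per token rule of thumb.
# Counts bytes, so non-ASCII text is overestimated.
estimate_tokens() {
  local bytes
  bytes=$(cat -- "$@" | wc -c)
  echo $(( (bytes + 3) / 4 ))  # round up
}
```

For example, `estimate_tokens src/*.ts` gives a ballpark figure for a whole batch before choosing a model.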
Keyword Registry (Delegates to gemini-cli-docs)
| Topic | Query Keywords |
|---|---|
| Caching | |
| Model selection | |
| Costs | |
| Output control | |
Test Scenarios
Scenario 1: Check Token Usage
Query: "How do I see how many tokens Gemini used?"
Expected Behavior:
- Skill activates on "token usage" or "gemini cost"
- Provides JSON stats extraction pattern
Success Criteria: User receives jq commands to extract token counts
Scenario 2: Reduce Costs
Query: "How do I reduce Gemini CLI costs for bulk analysis?"
Expected Behavior:
- Skill activates on "cost optimization" or "reduce tokens"
- Recommends Flash model and batching
Success Criteria: User receives cost optimization strategies
Scenario 3: Model Selection
Query: "Should I use Flash or Pro for this task?"
Expected Behavior:
- Skill activates on "flash vs pro" or "model selection"
- Provides decision criteria table
Success Criteria: User receives model comparison and recommendation
References
Query gemini-cli-docs for official documentation on:
- "token caching"
- "model selection"
- "quota and pricing"
Version History
- v1.1.0 (2025-12-01): Added Test Scenarios section
- v1.0.0 (2025-11-25): Initial release