Optimize token usage when delegating to Gemini CLI. Covers token caching, batch queries, model selection (Flash vs Pro), and cost tracking. Use when planning bulk Gemini operations.
npx skill4agent add melodic-software/claude-code-plugins gemini-token-optimization

STOP - Before providing ANY response about Gemini token usage:
- INVOKE the gemini-cli-docs skill
- QUERY for the specific token or pricing topic
- BASE all responses EXCLUSIVELY on official documentation loaded
| Auth Method | Caching Available |
|---|---|
| API key (Gemini API) | YES |
| Vertex AI | YES |
| OAuth (personal/enterprise) | NO |
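
The table above can be encoded as a small guard so that a bulk script only reports cache savings when the auth method supports caching. This is a minimal sketch: the function name `caching_available` and the labels `api-key`, `vertex`, and `oauth` are hypothetical and are not Gemini CLI settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper that mirrors the auth-method table above.
# The labels are illustrative only; they are not Gemini CLI options.
caching_available() {
  case "$1" in
    api-key|vertex) return 0 ;;  # caching available
    oauth)          return 1 ;;  # caching not available
    *)              echo "unknown auth method: $1" >&2; return 2 ;;
  esac
}

for method in api-key vertex oauth; do
  if caching_available "$method"; then
    echo "$method: caching available"
  else
    echo "$method: no caching"
  fi
done
```

A script could call this once at startup and skip the cached-token report when it returns non-zero.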
/stats

result=$(gemini "query" --output-format json)
total=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
billable=$((total - cached))
# Avoid division by zero when no token stats are reported
if [ "$total" -gt 0 ]; then
  savings=$((cached * 100 / total))
else
  savings=0
fi
echo "Total: $total tokens"
echo "Cached: $cached tokens ($savings% savings)"
echo "Billable: $billable tokens"

| Model | Context Window | Speed | Cost | Quality |
|---|---|---|---|---|
| gemini-2.5-flash | Large | Fast | Lower | Good |
| gemini-2.5-pro | Very large | Slower | Higher | Best |
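
One way to apply the table in scripts is to route each task to a model according to how much quality it needs. This is a sketch under assumptions: `pick_model` and its task labels (`audit`, `deep`, `bulk`) are hypothetical names. Only the model IDs and the `-m` flag come from this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a task label to a model ID from the table above.
pick_model() {
  case "$1" in
    audit|deep) echo "gemini-2.5-pro" ;;    # quality-critical work
    *)          echo "gemini-2.5-flash" ;;  # default to the faster, lower-cost model
  esac
}

# Print the command instead of running it, so the sketch works without the CLI.
task="bulk"
echo "gemini \"List all exports\" -m $(pick_model "$task") --output-format json"
```

Defaulting to Flash keeps unrecognized task labels on the lower-cost model. Only tasks explicitly marked as quality-critical are sent to Pro.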
-m gemini-2.5-flash
-m gemini-2.5-pro

# Bulk file analysis - use Flash
for file in src/*.ts; do
  gemini "List all exports" -m gemini-2.5-flash --output-format json < "$file"
done
# Security audit - use Pro for quality
gemini "Deep security analysis" -m gemini-2.5-pro --output-format json < critical-auth.ts
# Cost tracking with model info
result=$(gemini "query" --output-format json)
model=$(echo "$result" | jq -r '.stats.models | keys[0]')
tokens=$(echo "$result" | jq '.stats.models | to_entries[0].value.tokens.total')
echo "Used $model: $tokens tokens"

# Instead of N separate calls
# Do one call with all files
cat src/*.ts | gemini "Analyze all TypeScript files for patterns" --output-format json

# Combine related questions
gemini "Answer these questions about the codebase:
1. What is the main architecture pattern?
2. How is authentication handled?
3. What database is used?" --output-format json

# First pass: Quick overview with Flash
overview=$(cat src/*.ts | gemini "List all modules" -m gemini-2.5-flash --output-format json)
# Second pass: Deep dive critical areas with Pro
echo "$overview" | jq -r '.response' | grep -E "auth|security" | while read -r module; do
  gemini "Deep analysis of $module" -m gemini-2.5-pro --output-format json
done

result=$(gemini "query" --output-format json)
# Extract all cost-relevant stats
total_tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
cached_tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
models_used=$(echo "$result" | jq -r '.stats.models | keys | join(", ")')
tool_calls=$(echo "$result" | jq '.stats.tools.totalCalls // 0')
latency=$(echo "$result" | jq '.stats.models | to_entries | map(.value.api.totalLatencyMs) | add // 0')
echo "$(date): tokens=$total_tokens cached=$cached_tokens models=$models_used tools=$tool_calls latency=${latency}ms" >> usage.log

# Track cumulative usage across a session
total_session_tokens=0
total_session_cached=0
total_session_calls=0
track_usage() {
  local result="$1"
  local tokens=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.total) | add // 0')
  local cached=$(echo "$result" | jq '.stats.models | to_entries | map(.value.tokens.cached) | add // 0')
  total_session_tokens=$((total_session_tokens + tokens))
  total_session_cached=$((total_session_cached + cached))
  total_session_calls=$((total_session_calls + 1))
}
# Use in workflow
result=$(gemini "query 1" --output-format json)
track_usage "$result"
result=$(gemini "query 2" --output-format json)
track_usage "$result"
echo "Session total: $total_session_tokens tokens ($total_session_cached cached) in $total_session_calls calls"

# Use Flash for bulk
gemini "query" -m gemini-2.5-flash --output-format json
# Check cache effectiveness
gemini "query" --output-format json | jq '{total: .stats.models | to_entries | map(.value.tokens.total) | add, cached: .stats.models | to_entries | map(.value.tokens.cached) | add}'
# Minimal output (fewer output tokens)
gemini "Answer in one sentence: {question}" --output-format json

| Topic | Query Keywords |
|---|---|
| Caching | |
| Model selection | |
| Costs | |
| Output control | |
gemini-cli-docs