Audit LLM token cost estimates against actual API usage. Activate on 'cost verification', 'token estimate accuracy', 'API cost audit', 'estimation variance'. NOT for pricing lookups, budget planning, or cost optimization strategies.
npx skill4agent add erichowens/some_claude_skills cost-verification-auditor

```
Has estimator? ──No──→ Build estimator first (see Calibration Guidelines)
      │
     Yes
      ↓
Define 3+ test cases (simple/medium/complex)
      ↓
Estimate BEFORE execution (no peeking!)
      ↓
Execute against real API
      ↓
Calculate variance: (actual - estimated) / estimated
      ↓
Variance ≤ ±20%? ──Yes──→ PASS ✓
      │
     No
      ↓
Apply fixes from Anti-Patterns section
      ↓
Re-run verification
```

```javascript
const inputVariance = (actual.inputTokens - estimate.inputTokens) / estimate.inputTokens;
const outputVariance = (actual.outputTokens - estimate.outputTokens) / estimate.outputTokens;
const costVariance = (actual.totalCost - estimate.totalCost) / estimate.totalCost;

// PASS if both input AND output are within ±20%
const passed = Math.abs(inputVariance) <= 0.20 && Math.abs(outputVariance) <= 0.20;
```

| Context | Overhead |
|---|---|
| Direct API call | ~10 tokens |
| With system prompt | 50-200 tokens |
| With tools/functions | 100-500 tokens |
| Claude Code full context | 500-2000 tokens |
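One way to encode the overhead table is a simple lookup. The keys and the collapsed midpoint values below are assumptions for illustration, not calibrated numbers; calibrate against your own traffic:

```javascript
// Representative per-context overheads from the table above.
// Ranges are collapsed to midpoints (an assumption, not a measurement).
const OVERHEAD_BY_CONTEXT = {
  directApi: 10,       // ~10 tokens
  systemPrompt: 125,   // midpoint of 50-200
  tools: 300,          // midpoint of 100-500
  claudeCodeFull: 1250 // midpoint of 500-2000
};

function overheadFor(context) {
  const overhead = OVERHEAD_BY_CONTEXT[context];
  if (overhead === undefined) throw new Error(`unknown context: ${context}`);
  return overhead;
}
```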
```javascript
// Calibrated 2026-01-30
const inputTokens = Math.ceil(prompt.length / CHARS_PER_TOKEN) + OVERHEAD;
```

| Text Type | CHARS_PER_TOKEN | Notes |
|---|---|---|
| English prose | 4.0 | Most consistent |
| Code | 3.0-3.5 | Symbols tokenize differently |
| Mixed | 3.5 | Balanced (recommended default) |
| JSON/structured | 3.0 | Punctuation heavy |
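A minimal input estimator built from the chars-per-token table might look like this. The text-type keys and the fixed `OVERHEAD` of 10 (a direct API call) are illustrative choices, not part of the calibration data:

```javascript
// Chars-per-token ratios from the table above (code uses 3.25,
// the midpoint of its 3.0-3.5 range).
const CHARS_PER_TOKEN = { prose: 4.0, code: 3.25, mixed: 3.5, json: 3.0 };

// Assumed fixed overhead for a direct API call (see the overhead table).
const OVERHEAD = 10;

function estimateInputTokens(prompt, textType = "mixed") {
  const ratio = CHARS_PER_TOKEN[textType] ?? CHARS_PER_TOKEN.mixed;
  return Math.ceil(prompt.length / ratio) + OVERHEAD;
}
```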

| Prompt Constraint | Multiplier | Notes |
|---|---|---|
| "List exactly N items" | 0.8x input | Highly constrained |
| "Brief summary" | 1.0x input | Moderate |
| "Explain in detail" | 2-3x input | Expansive |
| Unconstrained | 1.5x input | Variable |
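An output estimate can reuse the input estimate with a multiplier from the table. Classifying a prompt into one of these constraint classes is left to your estimator; the class names here are assumptions:

```javascript
// Output multipliers keyed by prompt-constraint class (from the table above;
// "detailed" uses 2.5, the midpoint of the 2-3x range).
const OUTPUT_MULTIPLIER = {
  exactList: 0.8,
  briefSummary: 1.0,
  detailed: 2.5,
  unconstrained: 1.5
};

function estimateOutputTokens(inputTokens, constraint = "unconstrained") {
  const m = OUTPUT_MULTIPLIER[constraint] ?? OUTPUT_MULTIPLIER.unconstrained;
  return Math.ceil(inputTokens * m);
}
```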

| Model | Output Tendency |
|---|---|
| Claude Opus | Longer, more detailed |
| Claude Sonnet | Balanced |
| Claude Haiku | Concise, efficient |

| Symptom | Cause | Fix |
|---|---|---|
| Overestimating by 40%+ | Overhead too high | Reduce from 500 → 10 |
| Underestimating inputs | Chars/token too high | Reduce from 4.0 → 3.5 |
| Output wildly varies | LLM non-determinism | Use constrained prompts |
| Total cost accurate but per-node off | Normal aggregation | Accept it, focus on totals |
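For the "chars/token too high" fix, the corrected ratio can be derived from an observed run rather than guessed. This is a sketch; `overhead` must match whatever fixed overhead your estimator adds:

```javascript
// Derive a calibrated chars-per-token ratio from one observed API call.
// promptChars: length of the prompt sent; actualInputTokens: from the
// API's usage report; overhead: the fixed overhead your estimator assumes.
function calibrateCharsPerToken(promptChars, actualInputTokens, overhead = 10) {
  const contentTokens = actualInputTokens - overhead;
  if (contentTokens <= 0) throw new Error("overhead exceeds actual tokens");
  return promptChars / contentTokens;
}
```

Averaging this ratio over several runs of the same text type gives a more stable calibration than a single sample.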
Variance formula: `(actual - estimated) / estimated`

See `references/calibration-data.md` for calibration data.
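Putting the pieces together, the flowchart's estimate → execute → variance loop can run as a small harness. Here `estimate` and `execute` are placeholder hooks you supply (wire `execute` to your real API client); both are assumed to return `{ inputTokens, outputTokens }`:

```javascript
// Run one test case through the estimate -> execute -> variance flow.
async function verifyCase(testCase, estimate, execute) {
  const est = estimate(testCase);         // estimate BEFORE execution
  const actual = await execute(testCase); // real API usage
  const variance = (field) => (actual[field] - est[field]) / est[field];
  const inputVariance = variance("inputTokens");
  const outputVariance = variance("outputTokens");
  // PASS only if both input AND output are within ±20%
  const passed =
    Math.abs(inputVariance) <= 0.20 && Math.abs(outputVariance) <= 0.20;
  return { inputVariance, outputVariance, passed };
}
```

Run it across at least three cases (simple/medium/complex) and re-run after every fix from the Anti-Patterns table.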