cost-verification-auditor

Cost Verification Auditor

Verify that token cost estimates are within ±20% of actual Claude API usage.

When to Use

Use for:
  • Validating token estimation systems after implementation
  • Pre-deployment cost accuracy checks
  • Debugging unexpected API bills
  • Periodic estimation drift detection
NOT for:
  • Looking up model pricing (use pricing docs)
  • Budget planning or forecasting
  • Cost optimization strategies
  • Comparing models by price

Core Audit Process

Decision Tree

Has estimator? ──No──→ Build estimator first (see Calibration Guidelines)
     Yes
Define 3+ test cases (simple/medium/complex)
Estimate BEFORE execution (no peeking!)
Execute against real API
Calculate variance: (actual - estimated) / estimated
Variance ≤ ±20%? ──Yes──→ PASS ✓
     No
Apply fixes from Anti-Patterns section
Re-run verification
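
The loop above can be sketched as a small harness. This is a minimal sketch, not the auditor's actual implementation: `Estimator` and `Executor` are hypothetical stand-ins for your own estimation function and real API call.

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical stand-ins: plug in your own estimator and real API call.
type Estimator = (prompt: string) => TokenUsage;
type Executor = (prompt: string) => Promise<TokenUsage>;

// PASS when |(actual - estimated) / estimated| <= 20%.
function withinTolerance(estimated: number, actual: number, tolerance = 0.2): boolean {
  return Math.abs((actual - estimated) / estimated) <= tolerance;
}

async function auditEstimator(
  testCases: string[], // 3+ prompts: simple, medium, complex
  estimate: Estimator,
  execute: Executor,
): Promise<boolean> {
  if (testCases.length < 3) throw new Error("Define at least 3 test cases");
  for (const prompt of testCases) {
    const est = estimate(prompt); // estimate BEFORE execution -- no peeking
    const actual = await execute(prompt); // then hit the real API
    if (
      !withinTolerance(est.inputTokens, actual.inputTokens) ||
      !withinTolerance(est.outputTokens, actual.outputTokens)
    ) {
      return false; // apply fixes from Anti-Patterns, then re-run
    }
  }
  return true;
}
```

On failure, adjust the heuristics (see Quick Fixes) and re-run the same test cases rather than inventing new ones, so runs stay comparable.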

Variance Formula

```typescript
const inputVariance = (actual.inputTokens - estimate.inputTokens) / estimate.inputTokens;
const outputVariance = (actual.outputTokens - estimate.outputTokens) / estimate.outputTokens;
const costVariance = (actual.totalCost - estimate.totalCost) / estimate.totalCost;

// PASS if both input AND output within ±20%
const passed = Math.abs(inputVariance) <= 0.20 && Math.abs(outputVariance) <= 0.20;
```
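
Applied to one hypothetical test case (all numbers are illustrative, not real measurements):

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
  totalCost: number;
}

// variance = (actual - estimated) / estimated
const variance = (estimated: number, actual: number): number =>
  (actual - estimated) / estimated;

// Illustrative numbers only.
const estimate: Usage = { inputTokens: 1000, outputTokens: 400, totalCost: 0.009 };
const actual: Usage = { inputTokens: 1100, outputTokens: 340, totalCost: 0.0084 };

const inputVariance = variance(estimate.inputTokens, actual.inputTokens); // +10%
const outputVariance = variance(estimate.outputTokens, actual.outputTokens); // -15%

// PASS: both input AND output fall inside the ±20% tolerance.
const passed = Math.abs(inputVariance) <= 0.2 && Math.abs(outputVariance) <= 0.2;
```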

Common Anti-Patterns

Anti-Pattern: The 500-Token Overhead Myth

Novice thinking: "Claude Code adds ~500 tokens overhead, so add that to every estimate."
Reality: Direct API calls have ~10 token overhead. The 500+ overhead is ONLY when using Claude Code's full context (system prompts, tools, conversation history).
Timeline:
  • Pre-2025: Many tutorials used 500+ token estimates
  • 2025+: Direct API overhead is minimal (~10 tokens)
What to use instead:
| Context | Overhead |
| --- | --- |
| Direct API call | ~10 tokens |
| With system prompt | 50-200 tokens |
| With tools/functions | 100-500 tokens |
| Claude Code full context | 500-2000 tokens |
How to detect: Consistent 40-90% overestimation = overhead too high.

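
The overhead table can be encoded as a lookup instead of a hard-coded 500. A sketch only: the context labels are illustrative names, and the range midpoint is just a starting point before calibration.

```typescript
// Overhead ranges (in tokens) by context, per the table above.
// The context labels are illustrative, not an official API.
const OVERHEAD_RANGE = {
  directApi: [10, 10],
  systemPrompt: [50, 200],
  tools: [100, 500],
  claudeCodeFullContext: [500, 2000],
} as const;

// Start from the range midpoint, then calibrate against real usage.
function overheadFor(context: keyof typeof OVERHEAD_RANGE): number {
  const [lo, hi] = OVERHEAD_RANGE[context];
  return Math.round((lo + hi) / 2);
}
```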

Anti-Pattern: Per-Node Accuracy Obsession

Novice thinking: "Every node must be within ±20% or the estimator is broken."
Reality: LLM output length is non-deterministic. Per-node output variance of 30-50% is normal. What matters is aggregate cost accuracy.
What to use instead:
  • Focus on total DAG cost variance (should be ±20%)
  • Accept per-node output variance up to ±40%
  • Use constrained prompts ("list exactly 3") to reduce variance
How to detect: Input estimates accurate, output varies wildly = normal LLM behavior.

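
Aggregate-first checking can be sketched as follows, assuming each node carries its own estimated and actual cost: the DAG total must stay within ±20%, while individual nodes are allowed ±40%.

```typescript
interface NodeResult {
  estimatedCost: number;
  actualCost: number;
}

// Judge the DAG by aggregate cost (±20%) while tolerating wider
// per-node swings (±40%), since LLM output length is non-deterministic.
function auditDag(nodes: NodeResult[]): { totalPass: boolean; nodePass: boolean } {
  const estTotal = nodes.reduce((sum, n) => sum + n.estimatedCost, 0);
  const actTotal = nodes.reduce((sum, n) => sum + n.actualCost, 0);
  const totalPass = Math.abs((actTotal - estTotal) / estTotal) <= 0.2;
  const nodePass = nodes.every(
    (n) => Math.abs((n.actualCost - n.estimatedCost) / n.estimatedCost) <= 0.4,
  );
  return { totalPass, nodePass };
}
```

Note how per-node overshoots and undershoots cancel in the aggregate: a DAG can pass the total-cost check even when no single node lands exactly on its estimate.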

Anti-Pattern: Peeking Before Estimating

Novice thinking: "Let me run the API call first to see what tokens we get, then build the estimator."
Reality: This produces perfectly-fitted estimates that fail on new prompts. Estimation must happen BEFORE execution.
Correct approach:
  1. Estimate based on prompt length and heuristics
  2. Execute API call
  3. Compare variance
  4. Adjust heuristics if needed

Calibration Guidelines

Input Token Estimation

```typescript
// Calibrated 2026-01-30
const inputTokens = Math.ceil(prompt.length / CHARS_PER_TOKEN) + OVERHEAD;
```

| Text Type | CHARS_PER_TOKEN | Notes |
| --- | --- | --- |
| English prose | 4.0 | Most consistent |
| Code | 3.0-3.5 | Symbols tokenize differently |
| Mixed | 3.5 | Balanced (recommended default) |
| JSON/structured | 3.0 | Punctuation heavy |
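
A sketch of the formula with the table's CHARS_PER_TOKEN values baked in. The overhead of 10 assumes a direct API call (see the Anti-Patterns section), and the 3.25 used for `code` is a convenience midpoint, not a calibrated value.

```typescript
// CHARS_PER_TOKEN by text type, per the calibration table above.
const CHARS_PER_TOKEN = {
  prose: 4.0, // English prose: most consistent
  code: 3.25, // midpoint of 3.0-3.5; symbols tokenize differently
  mixed: 3.5, // balanced, recommended default
  json: 3.0,  // punctuation heavy
} as const;

const OVERHEAD = 10; // direct API call; raise for system prompts/tools

function estimateInputTokens(
  prompt: string,
  type: keyof typeof CHARS_PER_TOKEN = "mixed",
): number {
  return Math.ceil(prompt.length / CHARS_PER_TOKEN[type]) + OVERHEAD;
}
```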

Output Token Estimation

| Prompt Constraint | Multiplier | Notes |
| --- | --- | --- |
| "List exactly N items" | 0.8x input | Highly constrained |
| "Brief summary" | 1.0x input | Moderate |
| "Explain in detail" | 2-3x input | Expansive |
| Unconstrained | 1.5x input | Variable |

Always: assume a minimum of 100 output tokens for any meaningful response.
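
The multipliers and the 100-token floor can be sketched as follows; the constraint labels are illustrative names for the table rows, and 2.5 stands in for the 2-3x range.

```typescript
// Output multipliers by prompt constraint, per the table above.
const OUTPUT_MULTIPLIER = {
  exactList: 0.8,     // "List exactly N items": highly constrained
  briefSummary: 1.0,  // "Brief summary": moderate
  detailed: 2.5,      // "Explain in detail": midpoint of 2-3x
  unconstrained: 1.5, // variable
} as const;

const MIN_OUTPUT_TOKENS = 100; // floor for any meaningful response

function estimateOutputTokens(
  inputTokens: number,
  constraint: keyof typeof OUTPUT_MULTIPLIER = "unconstrained",
): number {
  return Math.max(
    MIN_OUTPUT_TOKENS,
    Math.round(inputTokens * OUTPUT_MULTIPLIER[constraint]),
  );
}
```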

Model Behavior

| Model | Output Tendency |
| --- | --- |
| Claude Opus | Longer, more detailed |
| Claude Sonnet | Balanced |
| Claude Haiku | Concise, efficient |

Quick Fixes

| Symptom | Cause | Fix |
| --- | --- | --- |
| Overestimating by 40%+ | Overhead too high | Reduce from 500 → 10 |
| Underestimating inputs | Chars/token too high | Reduce from 4.0 → 3.5 |
| Output wildly varies | LLM non-determinism | Use constrained prompts |
| Total cost accurate but per-node off | Normal aggregation | Accept it, focus on totals |
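
The table can double as a rough automated diagnostic. The thresholds below are heuristics inferred from the table, using variance = (actual − estimated) / estimated, so strongly negative values mean overestimation.

```typescript
// Map observed variance patterns to the fixes in the table above.
function diagnose(inputVariance: number, outputVariance: number): string[] {
  const fixes: string[] = [];
  // Both strongly negative: estimates far above actuals -> overhead too high.
  if (inputVariance <= -0.4 && outputVariance <= -0.4) {
    fixes.push("Reduce overhead from 500 toward 10");
  }
  // Positive input variance: actual inputs exceed estimates.
  if (inputVariance > 0.2) {
    fixes.push("Reduce chars/token from 4.0 to 3.5");
  }
  // Accurate input but wildly varying output: normal LLM non-determinism.
  if (Math.abs(inputVariance) <= 0.2 && Math.abs(outputVariance) > 0.4) {
    fixes.push("Use constrained prompts");
  }
  return fixes;
}
```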

Verification Checklist

  • 3+ test cases (simple, medium, complex)
  • Estimates run BEFORE API calls
  • Variance formula:
    (actual - estimated) / estimated
  • Target: ±20% for input AND output
  • Report includes actionable recommendations

References

See /references/calibration-data.md for detailed calibration tables and historical data.