cost-verification-auditor
Cost Verification Auditor
Verify that token cost estimates are within ±20% of actual Claude API usage.
When to Use
✅ Use for:
- Validating token estimation systems after implementation
- Pre-deployment cost accuracy checks
- Debugging unexpected API bills
- Periodic estimation drift detection
❌ NOT for:
- Looking up model pricing (use pricing docs)
- Budget planning or forecasting
- Cost optimization strategies
- Comparing models by price
Core Audit Process
Decision Tree
Has estimator? ──No──→ Build estimator first (see Calibration Guidelines)
│
Yes
↓
Define 3+ test cases (simple/medium/complex)
↓
Estimate BEFORE execution (no peeking!)
↓
Execute against real API
↓
Calculate variance: (actual - estimated) / estimated
↓
Variance ≤ ±20%? ──Yes──→ PASS ✓
│
No
↓
Apply fixes from Anti-Patterns section
↓
Re-run verification
Variance Formula
```typescript
const inputVariance = (actual.inputTokens - estimate.inputTokens) / estimate.inputTokens;
const outputVariance = (actual.outputTokens - estimate.outputTokens) / estimate.outputTokens;
const costVariance = (actual.totalCost - estimate.totalCost) / estimate.totalCost;
// PASS if both input AND output within ±20%
const passed = Math.abs(inputVariance) <= 0.20 && Math.abs(outputVariance) <= 0.20;
```
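As a concrete check, here is a runnable sketch of the pass criterion above, using hypothetical token counts (the `Usage` shape and the numbers are illustrative, not taken from a real API response):

```typescript
// Illustrative usage shape; real API responses expose these counts under other names.
interface Usage { inputTokens: number; outputTokens: number; }

// PASS only if both input AND output variance are within ±tol.
function withinTolerance(estimate: Usage, actual: Usage, tol = 0.20): boolean {
  const inputVariance = (actual.inputTokens - estimate.inputTokens) / estimate.inputTokens;
  const outputVariance = (actual.outputTokens - estimate.outputTokens) / estimate.outputTokens;
  return Math.abs(inputVariance) <= tol && Math.abs(outputVariance) <= tol;
}

// Estimated 1000 in / 500 out; actual 1100 in / 560 out → +10% and +12%, both pass.
const passed = withinTolerance(
  { inputTokens: 1000, outputTokens: 500 },
  { inputTokens: 1100, outputTokens: 560 },
);
console.log(passed); // true
```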
Common Anti-Patterns
Anti-Pattern: The 500-Token Overhead Myth
Novice thinking: "Claude Code adds ~500 tokens overhead, so add that to every estimate."
Reality: Direct API calls add only ~10 tokens of overhead. The 500+ token figure applies ONLY when using Claude Code's full context (system prompts, tools, conversation history).
Timeline:
- Pre-2025: Many tutorials used 500+ token estimates
- 2025+: Direct API overhead is minimal (~10 tokens)
What to use instead:
| Context | Overhead |
|---|---|
| Direct API call | ~10 tokens |
| With system prompt | 50-200 tokens |
| With tools/functions | 100-500 tokens |
| Claude Code full context | 500-2000 tokens |
How to detect: Consistent 40-90% overestimation = overhead too high.
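One way to avoid the flat-500 trap is to key the overhead on calling context. A sketch, using the midpoints of the ranges in the table above (the constant names are hypothetical):

```typescript
// Overhead in tokens, keyed by calling context (midpoints of the table's ranges).
const OVERHEAD = {
  directApi: 10,
  systemPrompt: 125,
  toolUse: 300,
  claudeCodeFull: 1250,
} as const;

function estimateWithOverhead(promptTokens: number, context: keyof typeof OVERHEAD): number {
  return promptTokens + OVERHEAD[context];
}

console.log(estimateWithOverhead(1000, "directApi")); // 1010, not 1500
```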
Anti-Pattern: Per-Node Accuracy Obsession
Novice thinking: "Every node must be within ±20% or the estimator is broken."
Reality: LLM output length is non-deterministic. Per-node output variance of 30-50% is normal. What matters is aggregate cost accuracy.
What to use instead:
- Focus on total DAG cost variance (should be ±20%)
- Accept per-node output variance up to ±40%
- Use constrained prompts ("list exactly 3") to reduce variance
How to detect: Input estimates accurate, output varies wildly = normal LLM behavior.
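To illustrate why the aggregate is the right target, a sketch with made-up per-node costs: one node misses by +40%, yet the DAG total stays well inside ±20%:

```typescript
interface NodeResult { estimatedCost: number; actualCost: number; }

// Variance of the DAG total, not of any single node.
function dagCostVariance(nodes: NodeResult[]): number {
  const estimated = nodes.reduce((sum, n) => sum + n.estimatedCost, 0);
  const actual = nodes.reduce((sum, n) => sum + n.actualCost, 0);
  return (actual - estimated) / estimated;
}

const nodes: NodeResult[] = [
  { estimatedCost: 0.010, actualCost: 0.014 }, // +40% per-node: normal LLM noise
  { estimatedCost: 0.020, actualCost: 0.017 }, // -15%
  { estimatedCost: 0.030, actualCost: 0.031 }, // +3%
];
console.log(dagCostVariance(nodes)); // ≈ 0.033 → total PASSES despite the +40% node
```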
Anti-Pattern: Peeking Before Estimating
Novice thinking: "Let me run the API call first to see what tokens we get, then build the estimator."
Reality: This produces perfectly-fitted estimates that fail on new prompts. Estimation must happen BEFORE execution.
Correct approach:
- Estimate based on prompt length and heuristics
- Execute API call
- Compare variance
- Adjust heuristics if needed
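A minimal harness that enforces this ordering might look like the sketch below; `estimator` and `runApiCall` are hypothetical stand-ins (the real executor would call the Anthropic API):

```typescript
interface Usage { inputTokens: number; outputTokens: number; }

// The estimate is captured before the executor runs, so it cannot be
// retro-fitted to observed usage ("no peeking").
function auditOne(
  prompt: string,
  estimator: (prompt: string) => Usage,
  runApiCall: (prompt: string) => Usage,
): { estimate: Usage; actual: Usage; inputVariance: number; outputVariance: number } {
  const estimate = estimator(prompt);  // 1. estimate first
  const actual = runApiCall(prompt);   // 2. then execute
  return {                             // 3. then compare
    estimate,
    actual,
    inputVariance: (actual.inputTokens - estimate.inputTokens) / estimate.inputTokens,
    outputVariance: (actual.outputTokens - estimate.outputTokens) / estimate.outputTokens,
  };
}
```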
Calibration Guidelines
Input Token Estimation
```typescript
// Calibrated 2026-01-30
const inputTokens = Math.ceil(prompt.length / CHARS_PER_TOKEN) + OVERHEAD;
```

| Text Type | CHARS_PER_TOKEN | Notes |
|---|---|---|
| English prose | 4.0 | Most consistent |
| Code | 3.0-3.5 | Symbols tokenize differently |
| Mixed | 3.5 | Balanced (recommended default) |
| JSON/structured | 3.0 | Punctuation heavy |
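Combining the formula with the ratios in the table, a sketch of an input estimator (the type labels and the 10-token direct-API default are assumptions drawn from earlier in this document):

```typescript
// Characters-per-token ratios from the calibration table.
const CHARS_PER_TOKEN = {
  prose: 4.0,
  code: 3.25, // midpoint of 3.0-3.5
  mixed: 3.5,
  json: 3.0,
} as const;

function estimateInputTokens(
  prompt: string,
  textType: keyof typeof CHARS_PER_TOKEN = "mixed",
  overhead = 10, // direct API call; raise for system prompts or tools
): number {
  return Math.ceil(prompt.length / CHARS_PER_TOKEN[textType]) + overhead;
}

console.log(estimateInputTokens("a".repeat(350))); // 110 (100 content + 10 overhead)
```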
Output Token Estimation
| Prompt Constraint | Multiplier | Notes |
|---|---|---|
| "List exactly N items" | 0.8x input | Highly constrained |
| "Brief summary" | 1.0x input | Moderate |
| "Explain in detail" | 2-3x input | Expansive |
| Unconstrained | 1.5x input | Variable |
Always: Minimum 100 output tokens for any meaningful response.
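The multipliers and the 100-token floor can be sketched as follows (the constraint labels are hypothetical shorthand for the prompt patterns in the table):

```typescript
// Output multipliers relative to input tokens, per the table above.
const OUTPUT_MULTIPLIER = {
  exactN: 0.8,        // "List exactly N items"
  brief: 1.0,         // "Brief summary"
  detailed: 2.5,      // "Explain in detail" (midpoint of 2-3x)
  unconstrained: 1.5,
} as const;

function estimateOutputTokens(
  inputTokens: number,
  constraint: keyof typeof OUTPUT_MULTIPLIER = "unconstrained",
): number {
  // Floor of 100: any meaningful response costs at least this much.
  return Math.max(100, Math.round(inputTokens * OUTPUT_MULTIPLIER[constraint]));
}

console.log(estimateOutputTokens(50, "brief")); // 100 (floor applies)
console.log(estimateOutputTokens(200));         // 300 (unconstrained, 1.5x)
```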
Model Behavior
| Model | Output Tendency |
|---|---|
| Claude Opus | Longer, more detailed |
| Claude Sonnet | Balanced |
| Claude Haiku | Concise, efficient |
Quick Fixes
| Symptom | Cause | Fix |
|---|---|---|
| Overestimating by 40%+ | Overhead too high | Reduce from 500 → 10 |
| Underestimating inputs | Chars/token too high | Reduce from 4.0 → 3.5 |
| Output wildly varies | LLM non-determinism | Use constrained prompts |
| Total cost accurate but per-node off | Normal aggregation | Accept it, focus on totals |
Verification Checklist
- 3+ test cases (simple, medium, complex)
- Estimates run BEFORE API calls
- Variance formula: (actual - estimated) / estimated
- Target: ±20% for input AND output
- Report includes actionable recommendations
References
See /references/calibration-data.md for detailed calibration tables and historical data.