cache-audit
Prompt Cache Audit
Trigger: `/cache-audit` or "audit my caching" or "check my cache setup"
What it does: Reads your live Claude Code configuration and measures it against prompt caching best practices. Returns a scored report with specific, actionable fixes ranked by token savings.
Background: The API caches the prefix of each request (system prompt, tool definitions, CLAUDE.md, rules, skill registry, MEMORY.md). An identical prefix between turns = ~90% cost reduction on those tokens. ANY change to the prefix invalidates everything after the change point.
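A rough cost model makes the ~90% figure concrete. This sketch rests on four assumptions:

- Cache writes cost 1.25× the base input price and cache reads cost 0.1×. These are Anthropic's published multipliers for the default 5-minute cache; verify them against current pricing.
- The prefix is byte-identical on every turn.
- The cache never expires mid-session.
- Only the prefix is modeled, not output tokens or the uncached message tail.

```python
# Rough input-cost model for a cached prompt prefix over one conversation.
# ASSUMPTIONS: cache write = 1.25x base input price, cache read = 0.1x,
# the prefix is byte-identical every turn, and the cache never expires mid-session.

CACHE_WRITE_MULT = 1.25  # first request writes the prefix into the cache
CACHE_READ_MULT = 0.10   # later requests read it back at a discount


def prefix_cost(prefix_tokens: int, turns: int, cached: bool) -> float:
    """Prefix cost over `turns` requests, in base-price token units."""
    if turns <= 0:
        return 0.0
    if not cached:
        return float(prefix_tokens * turns)
    return prefix_tokens * CACHE_WRITE_MULT + prefix_tokens * CACHE_READ_MULT * (turns - 1)


def savings(prefix_tokens: int, turns: int) -> float:
    """Fraction of prefix cost saved by caching versus resending it uncached."""
    naive = prefix_cost(prefix_tokens, turns, cached=False)
    if naive == 0:
        return 0.0
    return 1 - prefix_cost(prefix_tokens, turns, cached=True) / naive


if __name__ == "__main__":
    # A 15K-token static prefix over a 50-turn session.
    print(f"{savings(15_000, 50):.1%}")
```

With a 15K-token prefix over 50 turns this prints `87.7%`. Savings approach 90% as the session grows. Every prefix change pays the write premium again on the next turn.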
When Invoked
Run ALL 8 checks automatically. Do NOT ask for confirmation. Read the relevant files, measure sizes, and produce the full report in one pass.
Use `$PROJECT` to refer to the current working directory throughout.
The 8 Checks
Check 1 — Prefix Ordering (Static Before Dynamic)
Read: `~/.claude/CLAUDE.md`, `$PROJECT/CLAUDE.md`, `~/.claude/rules/*.md`, `$PROJECT/.claude/rules/*.md`, and the MEMORY.md file for the current project (find it under `~/.claude/projects/*/memory/MEMORY.md` and match by project path).
Flag any dynamic content in these files:
- Timestamps, `new Date()`, hardcoded dates that go stale
- Git refs, commit hashes, branch names
- Session IDs, task IDs, "currently working on X"
- File counts, line counts, or any computed metrics
- `currentDate` entries in MEMORY.md
These files are part of the static prefix. Dynamic data here means cache misses on every turn where it changes.
Scoring:
- PASS: All prefix files contain only static instructions and conventions
- WARNING: Low-frequency dynamic data (e.g., a date updated daily)
- FAIL: High-churn content (timestamps, computed values) in any prefix file
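The dynamic-content scan above can be approximated with a few regular expressions. This is a heuristic sketch, and its limits are deliberate:

- The pattern set is an illustrative assumption.
- Branch names are not detected.
- The hex-hash pattern also matches long numbers and words like `defaced`.

Every hit still needs a human read before it counts toward WARNING or FAIL.

```python
import re

# Heuristic patterns for content that tends to change between sessions or turns.
# The set is an illustrative assumption, not an exhaustive list.
DYNAMIC_PATTERNS = {
    "iso_date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "clock_time": re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b"),
    "js_date_call": re.compile(r"new Date\(\)"),
    "commit_hash": re.compile(r"\b[0-9a-f]{7,40}\b"),
    "current_date_key": re.compile(r"\bcurrentDate\b"),
    "in_progress_note": re.compile(r"currently working on", re.IGNORECASE),
}


def scan_prefix_text(text: str) -> list:
    """Return (line_number, pattern_name, line) for each suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in DYNAMIC_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name, line.strip()))
    return hits
```

Run it over the text of each prefix file listed above. A file with no hits is a candidate for PASS on this check.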
Check 2 — Hook Injection Pattern
Read: `~/.claude/settings.json` and `$PROJECT/.claude/settings.json` to find all hook commands. Then read each referenced hook file.
For each hook, verify:
- Hooks that inject context MUST use `additionalContext` in their JSON output (this becomes a `<system-reminder>` message, which is part of the message history, NOT the prefix)
- Hooks that only log or back up should produce no `hookSpecificOutput` at all
Specifically flag:
- Any hook that opens and writes to CLAUDE.md, MEMORY.md, or rule files mid-session
- Any hook that modifies tool definitions or the system prompt directly
- Any hook that uses `hookSpecificOutput` keys other than `additionalContext`
Check each hook and report its pattern:
| Hook Event | Expected Pattern |
|---|---|
| SessionStart | `additionalContext` with condensed git context |
| UserPromptSubmit | Logging only, no context injection |
| PreCompact | Logging/backup only, no context injection |
| All others (Stop, SessionEnd, Notification, etc.) | No prefix modification |
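The per-hook output check can be sketched against a hook's captured stdout. It makes these assumptions:

- The hook prints at most one JSON object.
- `ALLOWED_KEYS` also admits `hookEventName`, on the assumption that it labels the event rather than injecting context. Remove it to apply the `additionalContext`-only rule literally.
- Non-JSON stdout is flagged for manual review rather than guessed at.

```python
import json

# Keys this audit accepts inside hookSpecificOutput. "hookEventName" is admitted
# on the assumption that it labels the event rather than injecting context.
ALLOWED_KEYS = {"additionalContext", "hookEventName"}


def classify_hook_output(stdout: str) -> str:
    """Return 'no-inject', 'additionalContext', or 'flag' for one hook's stdout."""
    if not stdout.strip():
        return "no-inject"
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError:
        return "flag"  # plain-text output: review by hand
    if not isinstance(payload, dict):
        return "flag"
    specific = payload.get("hookSpecificOutput")
    if specific is None:
        return "no-inject"
    if not isinstance(specific, dict) or set(specific) - ALLOWED_KEYS:
        return "flag"  # unexpected keys or shape
    return "additionalContext" if "additionalContext" in specific else "no-inject"
```

Compare each result with the expected pattern in the table for that hook event. For example, a UserPromptSubmit hook that classifies as `additionalContext` is worth a closer look.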
Scoring:
- PASS: All hooks use `additionalContext` or no-inject patterns
- FAIL: Any hook modifies prefix files (CLAUDE.md, rules, MEMORY.md) mid-session
Check 3 — Tool Stability
Read: `~/.claude.json` for global MCP servers and `$PROJECT/.mcp.json` for project MCP servers (if it exists).
Measure and report:
- Total MCP server count (global + project-level)
- Each server name and whether it's deduplicated across levels
Flag:
- Same MCP server name at both global and project level (tool schema loaded twice?)
- Any skill that explicitly adds or removes tools when invoked
- More than 8 total MCP servers (each adds tool schema tokens to the prefix)
Note: MCP tools use deferred loading via `ToolSearch` by default, which is the correct pattern. Stubs are lightweight; full schemas load on demand.
Scoring:
- PASS: Fixed tool set at session start, no conditional loading
- WARNING: > 8 MCP servers (consider if all are needed per-project)
- FAIL: Dynamic tool add/remove detected mid-conversation
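Counting servers and cross-level duplicates is mostly set arithmetic. This sketch assumes both files keep servers in a top-level `"mcpServers"` object keyed by server name. `~/.claude.json` can also hold per-project entries, which this sketch ignores. The helper names are illustrative.

```python
import json
from pathlib import Path

MAX_SERVERS = 8  # WARNING threshold from the scoring above


def load_servers(path: Path) -> set:
    """Server names from a top-level "mcpServers" object (assumed shape).

    A missing file yields an empty set.
    """
    if not path.is_file():
        return set()
    data = json.loads(path.read_text())
    return set(data.get("mcpServers", {}))


def audit_mcp(global_names: set, project_names: set) -> dict:
    """Counts, cross-level duplicates, and the > 8 servers warning."""
    unique_total = len(global_names | project_names)
    return {
        "global": len(global_names),
        "project": len(project_names),
        "duplicates": sorted(global_names & project_names),
        "raw_total": len(global_names) + len(project_names),
        "unique_total": unique_total,
        "warning": unique_total > MAX_SERVERS,
    }
```

A typical call is `audit_mcp(load_servers(Path.home() / ".claude.json"), load_servers(project / ".mcp.json"))`. The sketch reports both a raw total (duplicates counted twice) and a unique total. It reports both because the check itself leaves open whether a duplicated name actually loads its schema twice.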
Check 4 — Model Consistency
Read: `~/.claude/settings.json` for the `model` or `alwaysThinkingEnabled` fields.
Check:
- Is there a stable model configuration? (The default model is fine if it is used consistently)
- Do any agent definitions (`.claude/agents/*.md`) specify a different `model:` in frontmatter for inline use?
- Subagent model delegation (Task tool with a `model` parameter) is FINE: separate conversations don't break the parent cache
Scoring:
- PASS: Consistent model per conversation, subagents handle model switching
- FAIL: Evidence of inline model switching in same conversation thread
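A minimal frontmatter reader is enough to list which agent definitions pin a model. This sketch makes two simplifications:

- It uses a line-based parser instead of a YAML library, which assumes the frontmatter stays flat `key: value`.
- It only reports the `model:` values. Whether each agent runs inline or as a subagent still has to be judged from how it is invoked.

```python
from pathlib import Path
from typing import Dict, Optional


def frontmatter_model(text: str) -> Optional[str]:
    """Return the `model:` value from a leading '---' frontmatter block, if any.

    Minimal line-based parser (no YAML library): it reads flat `key: value`
    lines between the opening and closing '---' markers.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model":
            return value.strip().strip("'\"") or None
    return None


def agents_with_models(agents_dir: Path) -> Dict[str, str]:
    """Map each agent file name to its declared model, skipping files without one."""
    found = {}
    for path in sorted(agents_dir.glob("*.md")):
        model = frontmatter_model(path.read_text())
        if model:
            found[path.name] = model
    return found
```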
Check 5 — Dynamic Content Size
Measure actual injection sizes. For each source, read the hook code and estimate output:
| Source | How to Measure | PASS | WARNING | FAIL |
|---|---|---|---|---|
| SessionStart hook | Read code; estimate injected output size | < 200 | 200–2K | > 2K |
| UserPromptSubmit hook | Read code; does it emit any context? | No output | < 500 | > 500 |
| Built-in git status | Run `git status` and measure output size | < 2K | 2–10K | > 10K |
Thresholds are in characters. Use ~4 chars per token as the conversion estimate.
Also report:
- Total hook count across all events (each hook = execution latency per trigger)
- Any hook with timeout > 10 seconds
Overall scoring:
- PASS: All per-turn injections total < 2K chars
- WARNING: 2–10K chars per turn
- FAIL: > 10K chars injected per turn into the main conversation
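The size thresholds can be encoded directly. Two points are assumptions:

- Grading of values exactly on a boundary (200, 2K, 10K chars), which the table leaves open.
- Which sources count as per-turn. `Per-turn total` applies the overall scoring to whatever sum of per-turn injections the caller passes in.

```python
# (pass_below, fail_above) in characters, taken from the table above.
# Values exactly on a boundary are graded WARNING (an assumption).
THRESHOLDS = {
    "SessionStart hook": (200, 2_000),
    "UserPromptSubmit hook": (1, 500),  # PASS only for zero output
    "Built-in git status": (2_000, 10_000),
    "Per-turn total": (2_000, 10_000),  # overall per-turn scoring
}

CHARS_PER_TOKEN = 4


def grade(source: str, chars: int) -> str:
    """Grade one injection source's size as PASS, WARNING, or FAIL."""
    pass_below, fail_above = THRESHOLDS[source]
    if chars < pass_below:
        return "PASS"
    if chars > fail_above:
        return "FAIL"
    return "WARNING"


def approx_tokens(chars: int) -> int:
    """Convert chars to tokens at ~4 chars per token."""
    return round(chars / CHARS_PER_TOKEN)
```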
Check 6 — Fork Safety (Compaction & Subagents)
Read: PreCompact hook code.
Verify:
- PreCompact hook does NOT modify the prefix (logging/backup only is correct)
- No custom compaction logic that rebuilds the system prompt differently
- Claude Code's built-in compaction preserves system prompt + tools by default
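A quick static scan of hook source can surface prefix writes before each hook is read in full. This is a single-line heuristic sketch, and its patterns are illustrative assumptions:

- It misses writes split across lines or routed through variables.
- It deliberately does not flag backups such as `cp CLAUDE.md backup/` or `cat MEMORY.md > /tmp/mem.bak`, because those only read the prefix file.

```python
import re

# Static-prefix files; writing to them mid-session invalidates the cache.
_PREFIX = r"(?:CLAUDE\.md|MEMORY\.md|\.claude/rules/)"

# Single-line patterns for a write whose TARGET is a prefix file. Illustrative,
# not exhaustive: shell > / >> / tee / sed -i, Python open() / write_text(),
# and Node fs writeFile / appendFile.
WRITE_PATTERNS = [
    re.compile(r">>?\s*[\"']?[^\s\"']*" + _PREFIX),
    re.compile(r"\btee\b(?:\s+-a)?\s+[\"']?[^\s\"']*" + _PREFIX),
    re.compile(r"\bsed\s+-i\b.*" + _PREFIX),
    re.compile(r"\bopen\(.*" + _PREFIX + r".*[\"'](?:[wax]\+?|r\+)[bt]?[\"']"),
    re.compile(_PREFIX + r".*\.write_(?:text|bytes)\("),
    re.compile(r"\b(?:writeFile|appendFile)(?:Sync)?\(.*" + _PREFIX),
]


def prefix_writes(hook_source: str) -> list:
    """Return stripped source lines that appear to write to a prefix file."""
    return [
        line.strip()
        for line in hook_source.splitlines()
        if any(p.search(line) for p in WRITE_PATTERNS)
    ]
```

An empty result for the PreCompact hook is consistent with the logging/backup-only pattern. Any hit should be read in context before scoring FAIL.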
Scoring:
- PASS: Using built-in compaction + `additionalContext`-only hook injection
- FAIL: Any hook modifies the prefix during compaction or subagent spawn
Check 7 — Static Prefix Budget
This is the most actionable check. Measure every component of the static prefix.
Read and measure (report in chars AND estimated tokens at ~4 chars/token):
| Component | How to Find |
|---|---|
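| CLAUDE.md (global) | `~/.claude/CLAUDE.md` |
| CLAUDE.md (project) | `$PROJECT/CLAUDE.md` |
| Rules (global) | Each file in `~/.claude/rules/` |
| Rules (project) | Each file in `$PROJECT/.claude/rules/` |
| MEMORY.md | Match current project under `~/.claude/projects/*/memory/` |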
Use `wc -c` via Bash to measure file sizes. Measure EACH file individually.
Calculate:
- Grand total chars across all measured files
- Estimated tokens (chars / 4)
- Percentage of 200K context window consumed by static prefix
Report the top 5 largest individual files.
Scoring:
- PASS: Total static prefix < 60K chars (~15K tokens, ~7.5% of context)
- WARNING: 60–120K chars (~15–30K tokens, 7.5–15% of context)
- FAIL: > 120K chars (~30K tokens, > 15% of context)
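The budget measurement reduces to summing file sizes. Some notes on this sketch:

- It uses `st_size`, which, like `wc -c`, counts bytes. For non-ASCII content bytes exceed characters, so the estimate errs high.
- `prefix_files` takes the matched MEMORY.md path as an argument, because matching by project path is left to the auditor.
- The function names are illustrative.

```python
from pathlib import Path
from typing import Dict, List

CONTEXT_WINDOW_TOKENS = 200_000
CHARS_PER_TOKEN = 4


def summarize(sizes: Dict[str, int]) -> dict:
    """Total, token estimate, window share, top 5 files, and PASS/WARNING/FAIL."""
    total = sum(sizes.values())
    tokens = total // CHARS_PER_TOKEN
    if total < 60_000:
        status = "PASS"
    elif total <= 120_000:
        status = "WARNING"
    else:
        status = "FAIL"
    return {
        "total_chars": total,
        "est_tokens": tokens,
        "pct_of_window": 100 * tokens / CONTEXT_WINDOW_TOKENS,
        "top5": sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:5],
        "status": status,
    }


def measure_prefix(paths: List[Path]) -> dict:
    """Measure each existing file individually (bytes, like `wc -c`)."""
    return summarize({str(p): p.stat().st_size for p in paths if p.is_file()})


def prefix_files(project: Path, memory_md: Path) -> List[Path]:
    """Components from the table above; memory_md is the already-matched MEMORY.md."""
    user = Path.home() / ".claude"
    return [
        user / "CLAUDE.md",
        project / "CLAUDE.md",
        *sorted((user / "rules").glob("*.md")),
        *sorted((project / ".claude" / "rules").glob("*.md")),
        memory_md,
    ]
```

As a worked example, 71,000 chars gives about 17,750 tokens, or 8.875% of a 200K window. That lands in WARNING.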
Check 8 — Rule Layer Efficiency
Read: List filenames in `~/.claude/rules/` and `$PROJECT/.claude/rules/`.
Key fact: Rules at both levels are additive; Claude Code loads ALL of them. A filename present at both levels is therefore loaded twice, and any overlapping content costs tokens twice.
Check for:
- Any filename that exists at BOTH `~/.claude/rules/` and `$PROJECT/.claude/rules/` (these load twice)
- For each duplicate, read both versions and estimate content overlap
- Whether the project uses a single `project-implementation.md` for overrides (correct pattern) vs. many files that duplicate user-level rules
The correct pattern:
- User-level (`~/.claude/rules/`): Generic patterns, the WHAT (applies to all projects)
- Project-level (`$PROJECT/.claude/rules/`): A single `project-implementation.md`, the HOW (framework-specific overrides)
Scoring:
- PASS: No duplicate filenames, project uses only `project-implementation.md`
- WARNING: 1–3 duplicate files
- FAIL: > 3 duplicate files (significant token waste from additive loading)
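Duplicate detection is a set intersection. A line-level `difflib.SequenceMatcher` ratio gives a rough overlap estimate for each duplicate pair; it is a similarity proxy, not a token count. The scoring above does not cover zero duplicates combined with extra project rule files. This sketch treats that case as WARNING, which is an assumption.

```python
import difflib
from pathlib import Path
from typing import List, Set, Tuple


def rule_names(rules_dir: Path) -> Set[str]:
    """File names of the .md rules in one directory (empty if it doesn't exist)."""
    if not rules_dir.is_dir():
        return set()
    return {p.name for p in rules_dir.glob("*.md")}


def grade_rule_layers(user: Set[str], project: Set[str]) -> Tuple[str, List[str]]:
    """PASS/WARNING/FAIL per the scoring above, plus the duplicated names."""
    duplicates = sorted(user & project)
    if not duplicates and project <= {"project-implementation.md"}:
        return "PASS", duplicates
    if len(duplicates) <= 3:
        return "WARNING", duplicates  # includes 0 duplicates but extra project files
    return "FAIL", duplicates


def overlap_ratio(a: str, b: str) -> float:
    """Line-level similarity in [0, 1] between two versions of a rule file."""
    return difflib.SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()
```

In an audit run, call `grade_rule_layers(rule_names(Path.home() / ".claude/rules"), rule_names(project / ".claude/rules"))`. Then apply `overlap_ratio` to the two copies of each duplicated file.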
Output Format
After running all 8 checks, output this exact report format:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PROMPT CACHE AUDIT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Score: X/8
[✅/⚠️/❌] Check 1 — Prefix Ordering: [PASS/WARNING/FAIL]
→ [finding]
[✅/⚠️/❌] Check 2 — Hook Injection: [PASS/WARNING/FAIL]
→ [each hook and its pattern]
[✅/⚠️/❌] Check 3 — Tool Stability: [PASS/WARNING/FAIL]
→ [N global + N project MCP servers, any issues]
[✅/⚠️/❌] Check 4 — Model Consistency: [PASS/WARNING/FAIL]
→ [model config]
[✅/⚠️/❌] Check 5 — Dynamic Content: [PASS/WARNING/FAIL]
→ [size breakdown per injection point]
[✅/⚠️/❌] Check 6 — Fork Safety: [PASS/WARNING/FAIL]
→ [compaction + subagent pattern]
[✅/⚠️/❌] Check 7 — Prefix Budget: [PASS/WARNING/FAIL]
→ Total: XX,XXX chars (~X,XXX tokens, X.X% of 200K)
→ Top 5 largest:
1. filename — X,XXX chars (~X,XXX tokens)
2. filename — X,XXX chars (~X,XXX tokens)
3. ...
[✅/⚠️/❌] Check 8 — Rule Efficiency: [PASS/WARNING/FAIL]
→ [duplicate count + wasted tokens]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TOKEN BUDGET SUMMARY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Static prefix: ~XX,XXX tokens (X.X% of 200K window)
Per-turn injection: ~XXX tokens
Per-builder spawn: ~X,XXX tokens
Per-lightweight spawn: ~XX tokens
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TOP FIXES (ranked by token savings)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. [Most impactful fix — exact steps]
2. [Second most impactful — exact steps]
3. [Third, if applicable]
If all checks pass: confirm the setup is well-optimised and estimate cost savings vs. a naive configuration (no caching awareness).
Prompt Caching Cheatsheet
| Rule | Do | Don't |
|---|---|---|
| Ordering | Static CLAUDE.md + rules, dynamic in messages | Timestamps/dates/git refs in prefix files |
| Updates | Use hook `additionalContext` | Edit CLAUDE.md or rules mid-session |
| Tools | Fixed tool set + deferred MCP stubs | Add/remove tools per turn |
| Models | One model per conversation, subagents for switches | Inline model switching |
| Size | Trim injections to minimum needed | Dump full git status (40K+ chars) |
| Forks | Built-in compaction, `additionalContext`-only injection | Custom prefix rebuilds |
| Budget | Static prefix < 15K tokens | Bloated CLAUDE.md, massive rule files |
| Layers | User-level generic + project-level `project-implementation.md` | Same rule files at both levels |