cache-audit

Prompt Cache Audit

Trigger: /cache-audit, or "audit my caching", or "check my cache setup"
What it does: Reads your live Claude Code configuration and measures it against prompt caching best practices. Returns a scored report with specific, actionable fixes ranked by token savings.
Background: The API caches the prefix of each request (system prompt, tool definitions, CLAUDE.md, rules, skill registry, MEMORY.md). When the prefix is identical between turns, those tokens are read from cache at roughly 10% of the normal input price (~90% cost reduction on those tokens). ANY change to the prefix invalidates the cache from the change point onward.
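The invalidation rule can be modeled in a few lines. This is a simplified sketch, not the API's actual mechanism: it assumes the prefix is an ordered list of segments and that everything from the first differing segment onward is a miss (the real API caches at cache breakpoints, with minimum sizes and expiry). The function names and the 0.1 cached-read multiplier are illustrative assumptions.

```python
def cached_prefix_segments(previous: list[str], current: list[str]) -> int:
    """Count leading segments of `current` identical to `previous`.

    Simplified model: the first differing segment and everything after it
    are treated as cache misses, even if later segments are unchanged.
    """
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count


def relative_prefix_cost(segment_tokens: list[int], hit_segments: int,
                         read_multiplier: float = 0.1) -> float:
    """Prefix cost relative to sending it uncached (1.0 means no savings).

    read_multiplier=0.1 is an assumed price ratio for cached tokens,
    i.e. the "~90% cost reduction" described above.
    """
    total = sum(segment_tokens)
    cached = sum(segment_tokens[:hit_segments])
    return (cached * read_multiplier + (total - cached)) / total


prev = ["system prompt", "tools", "CLAUDE.md v1", "rules", "MEMORY.md"]
curr = ["system prompt", "tools", "CLAUDE.md v2", "rules", "MEMORY.md"]
hits = cached_prefix_segments(prev, curr)  # 2: only system prompt + tools
cost = relative_prefix_cost([3000, 5000, 2000, 4000, 1000], hits)  # ≈ 0.52
```

Editing CLAUDE.md (the third of five segments) leaves only the first two segments cached, even though the rules and MEMORY.md that follow are unchanged.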

When Invoked

Run ALL 8 checks automatically. Do NOT ask for confirmation. Read the relevant files, measure sizes, and produce the full report in one pass.
Use $PROJECT to refer to the current working directory throughout.

The 8 Checks

Check 1 — Prefix Ordering (Static Before Dynamic)

Read: ~/.claude/CLAUDE.md, $PROJECT/CLAUDE.md, ~/.claude/rules/*.md, $PROJECT/.claude/rules/*.md, and the MEMORY.md file for the current project (find it under ~/.claude/projects/*/memory/MEMORY.md, matching by project path).
Flag any dynamic content in these files:
  • Timestamps, new Date(), hardcoded dates that go stale
  • Git refs, commit hashes, branch names
  • Session IDs, task IDs, "currently working on X" notes
  • File counts, line counts, or any other computed metrics
  • currentDate entries in MEMORY.md
These files are part of the static prefix. Dynamic data here causes a cache miss every time it changes.
Scoring:
  • PASS: All prefix files contain only static instructions and conventions
  • WARNING: Low-frequency dynamic data (e.g., a date updated daily)
  • FAIL: High-churn content (timestamps, computed values) in any prefix file
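A first pass over the prefix files can be automated with a pattern scan. The sketch below is a heuristic, assuming the regexes in DYNAMIC_PATTERNS (hypothetical names, not part of Claude Code) are a reasonable proxy for the categories listed; it will miss branch names and session IDs written as free-form prose, and every hit still needs a human look before scoring.

```python
import re

# Heuristic patterns for dynamic content in prefix files. These regexes are
# illustrative assumptions, not an exhaustive or official list.
DYNAMIC_PATTERNS = {
    "js-date-call": re.compile(r"new Date\(\)"),
    "iso-date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "clock-time": re.compile(r"\b\d{1,2}:\d{2}(:\d{2})?\b"),
    "commit-hash": re.compile(r"\b[0-9a-f]{40}\b"),
    "currentDate": re.compile(r"\bcurrentDate\b"),
    "in-progress-note": re.compile(r"currently working on", re.IGNORECASE),
}


def find_dynamic_content(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) pairs for lines that look dynamic."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in DYNAMIC_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

Run it on each file's text (for example, Path(p).read_text() for every prefix file) and treat any hit as a candidate for the WARNING or FAIL buckets above.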

Check 2 — Hook Injection Pattern

Read ~/.claude/settings.json and $PROJECT/.claude/settings.json to find all hook commands, then read each referenced hook file.
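Collecting hook commands can be sketched as a walk over the hooks object in each settings file. This assumes the nested layout Claude Code documents for settings.json (event name, then matcher groups, then a hooks array of {type, command, timeout} entries) and an assumed 60-second default when timeout is omitted; list_hooks and load_settings are hypothetical helper names. The same records also give the hook count and timeouts that Check 5 asks for.

```python
import json
from pathlib import Path

# Assumed default hook timeout in seconds when "timeout" is omitted;
# confirm against the hooks documentation for your Claude Code version.
DEFAULT_TIMEOUT_S = 60


def load_settings(path: str) -> dict:
    """Parse a settings.json file, or return {} if it does not exist."""
    p = Path(path).expanduser()
    return json.loads(p.read_text()) if p.exists() else {}


def list_hooks(settings: dict) -> list[dict]:
    """Flatten settings["hooks"] into one record per command hook.

    Assumed layout: {"hooks": {Event: [{"matcher": ..., "hooks": [
    {"type": "command", "command": ..., "timeout": ...}]}]}}
    """
    records = []
    for event, groups in settings.get("hooks", {}).items():
        for group in groups:
            for hook in group.get("hooks", []):
                if hook.get("type") != "command":
                    continue  # only shell-command hooks have a script to read
                records.append({
                    "event": event,
                    "matcher": group.get("matcher", ""),
                    "command": hook.get("command", ""),
                    "timeout": hook.get("timeout", DEFAULT_TIMEOUT_S),
                })
    return records
```

Call list_hooks(load_settings(...)) once for the global file and once for the project file, concatenate the results, then open each command's script; filtering on h["timeout"] > 10 gives the slow hooks.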
For each hook, verify:
  • Hooks that inject context MUST use additionalContext in their JSON output (this becomes a <system-reminder> message, which is part of the message history, NOT the prefix)
  • Hooks that only log or back up should produce no hookSpecificOutput at all
Specifically flag:
  • Any hook that opens and writes to CLAUDE.md, MEMORY.md, or rule files mid-session
  • Any hook that modifies tool definitions or the system prompt directly
  • Any hook that uses hookSpecificOutput keys other than additionalContext (and the hookEventName field that accompanies it)
Check each hook and report its pattern:
Hook event        Expected pattern
SessionStart      additionalContext with compact git context, OR no output
UserPromptSubmit  Logging only, no additionalContext
PreCompact        Logging/backup only, no context injection
All others        No prefix modification (Stop, SessionEnd, Notification, etc.)
Scoring:
  • PASS: All hooks use additionalContext or no-inject patterns
  • FAIL: Any hook modifies prefix files (CLAUDE.md, rules, MEMORY.md) mid-session
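Given a hook's stdout (captured from a test run or reconstructed from its code), the output can be classified against the rules above. A minimal sketch, assuming the output shape {"hookSpecificOutput": {"hookEventName": ..., "additionalContext": ...}}; classify_hook_output and its return labels are hypothetical. Plain non-JSON stdout comes back as "plain-stdout" for manual review, and any extra key (such as a permission decision) is only flagged for inspection, since not every extra key touches the prefix.

```python
import json

# Keys this audit treats as safe inside hookSpecificOutput; hookEventName
# accompanies additionalContext in the hook output format.
ALLOWED_KEYS = {"hookEventName", "additionalContext"}


def classify_hook_output(stdout: str) -> str:
    """Return 'no-inject', 'additionalContext', 'plain-stdout', or 'flag'."""
    stdout = stdout.strip()
    if not stdout:
        return "no-inject"
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError:
        return "plain-stdout"  # not JSON; review how this event handles stdout
    if not isinstance(payload, dict):
        return "flag"
    specific = payload.get("hookSpecificOutput")
    if specific is None:
        return "no-inject"
    if not isinstance(specific, dict) or set(specific) - ALLOWED_KEYS:
        return "flag"  # unexpected keys: inspect manually
    return "additionalContext" if "additionalContext" in specific else "no-inject"
```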

Check 3 — Tool Stability

Read ~/.claude.json for global MCP servers and $PROJECT/.mcp.json for project MCP servers (if it exists).
Measure and report:
  • Total MCP server count (global + project-level)
  • Each server name, and whether the same name appears at both levels
Flag:
  • The same MCP server name at both global and project level (its tool schemas may be loaded twice)
  • Any skill that explicitly adds or removes tools when invoked
  • More than 8 MCP servers in total (each adds tool schema tokens to the prefix)
Note: MCP tools use deferred loading via ToolSearch by default; this is the correct pattern. Stubs are lightweight, and full schemas load on demand.
Scoring:
  • PASS: Fixed tool set at session start, no conditional loading
  • WARNING: > 8 MCP servers (consider whether all are needed per project)
  • FAIL: Dynamic tool add/remove detected mid-conversation
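The count and duplicate checks reduce to set operations over server names. This sketch assumes both files keep servers under a top-level "mcpServers" object; if your ~/.claude.json also nests per-project servers elsewhere, those are not counted here. It covers only the count-based WARNING; dynamic tool add/remove still has to be found by reading skills. All function names are hypothetical.

```python
import json
from pathlib import Path


def load_json(path: str) -> dict:
    """Parse a JSON config file, or return {} if it does not exist."""
    p = Path(path).expanduser()
    return json.loads(p.read_text()) if p.exists() else {}


def mcp_server_names(config: dict) -> set[str]:
    """Names under a top-level "mcpServers" object (assumed layout)."""
    return set(config.get("mcpServers", {}))


def audit_mcp(global_cfg: dict, project_cfg: dict, limit: int = 8) -> dict:
    """Counts, cross-level duplicates, and the count-based status for Check 3."""
    g = mcp_server_names(global_cfg)
    p = mcp_server_names(project_cfg)
    return {
        "global": len(g),
        "project": len(p),
        "duplicates": sorted(g & p),
        "count_status": "WARNING" if len(g) + len(p) > limit else "PASS",
    }
```

Typical use is audit_mcp(load_json("~/.claude.json"), load_json("<project>/.mcp.json")), with the project path filled in from $PROJECT.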

Check 4 — Model Consistency

Read ~/.claude/settings.json for the model and alwaysThinkingEnabled fields.
Check:
  • Is there a stable model configuration? (The default model is fine if it stays consistent.)
  • Do any agent definitions (.claude/agents/*.md) specify a different model: in their frontmatter for inline use?
  • Subagent model delegation (the Task tool with a model: parameter) is FINE: subagents run in separate conversations, so they don't break the parent's cache
Scoring:
  • PASS: Consistent model per conversation, subagents handle model switching
  • FAIL: Evidence of inline model switching in same conversation thread
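Finding agents that pin a different model is a frontmatter lookup. A minimal sketch using a line-based parser instead of a YAML library (so it only understands simple key: value lines); it assumes "inherit" means the agent reuses the session's model, and the function names are hypothetical. Hits are candidates for review, since agents that only run as subagents are fine per the last bullet above.

```python
from typing import Optional


def frontmatter_model(markdown: str) -> Optional[str]:
    """Return the model value from a leading ----delimited frontmatter block."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no frontmatter at all
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model":
            return value.strip().strip("'\"") or None
    return None


def agents_with_other_models(agents: dict[str, str],
                             session_model: Optional[str]) -> list[str]:
    """Agent files whose frontmatter model differs from the session model.

    Assumes "inherit" means the agent reuses the session model.
    """
    flagged = []
    for name, text in agents.items():
        model = frontmatter_model(text)
        if model is not None and model not in (session_model, "inherit"):
            flagged.append(name)
    return sorted(flagged)
```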

Check 5 — Dynamic Content Size

Measure actual injection sizes. For each source, read the hook code and estimate its output:
Source                 How to measure                                      PASS       WARNING  FAIL
SessionStart hook      Read code; estimate additionalContext output chars  < 200      200–2K   > 2K
UserPromptSubmit hook  Read code; does it emit additionalContext?          No output  < 500    > 500
Built-in git status    Run git status --porcelain | wc -c                  < 2K       2–10K    > 10K
Use ~4 chars per token as the conversion estimate. For SessionStart and UserPromptSubmit, plain (non-JSON) stdout from a successful hook is also added to context, so count it as well.
Also report:
  • Total hook count across all events (each hook adds execution latency every time it fires)
  • Any hook with a timeout > 10 seconds
Overall scoring:
  • PASS: All per-turn injections total < 2K chars
  • WARNING: 2–10K chars per turn
  • FAIL: > 10K chars injected per turn into the main conversation
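The thresholds in the table reduce to one comparison helper. In this sketch, score_size(chars, warn_at, fail_above) returns PASS below warn_at, WARNING up to and including fail_above, and FAIL above it; the helper names and the dict of measured sizes are assumptions, and the sizes themselves still come from reading hook code or running git status.

```python
CHARS_PER_TOKEN = 4  # rough conversion used throughout this audit


def score_size(chars: int, warn_at: int, fail_above: int) -> str:
    """PASS below warn_at, WARNING from warn_at up to fail_above, FAIL above."""
    if chars < warn_at:
        return "PASS"
    if chars <= fail_above:
        return "WARNING"
    return "FAIL"


def score_per_turn(injections: dict[str, int]) -> tuple[int, int, str]:
    """Total chars, estimated tokens, and overall score for per-turn injections."""
    total = sum(injections.values())
    return total, total // CHARS_PER_TOKEN, score_size(total, 2_000, 10_000)
```

Mapped onto the table rows: SessionStart is score_size(n, 200, 2_000), UserPromptSubmit is score_size(n, 1, 500) so that zero output passes, and git status is score_size(n, 2_000, 10_000).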

Check 6 — Fork Safety (Compaction & Subagents)

Read: PreCompact hook code.
Verify:
  • PreCompact hook does NOT modify the prefix (logging/backup only is correct)
  • No custom compaction logic that rebuilds the system prompt differently
  • Claude Code's built-in compaction preserves system prompt + tools by default
Scoring:
  • PASS: Using built-in compaction + additionalContext-only hook injection
  • FAIL: Any hook modifies prefix during compaction or subagent spawn
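Reading hook code for writes to prefix files (here, and for the write check in Check 2) can start from a text scan. This is a heuristic sketch with hypothetical regexes: it catches shell redirection, tee, sed -i, Python open() in write/append mode, and pathlib write_text/write_bytes aimed at CLAUDE.md, MEMORY.md, or .claude/rules/*.md, but it misses writes through variables or cp/mv, so a clean result means "no obvious writes", not proof.

```python
import re

# Prefix files this audit cares about (hypothetical pattern).
PREFIX_FILES = r"(CLAUDE\.md|MEMORY\.md|\.claude/rules/[^\s'\"]*\.md)"

# Heuristic write patterns; assumptions, not an exhaustive list.
WRITE_PATTERNS = [
    re.compile(r">>?\s*['\"]?[^\s'\"]*" + PREFIX_FILES),   # > or >> redirect
    re.compile(r"\btee\b[^\n|]*" + PREFIX_FILES),          # ... | tee [-a] FILE
    re.compile(r"\bsed\s+-i\b.*" + PREFIX_FILES),          # in-place sed edit
    re.compile(r"open\([^)]*" + PREFIX_FILES + r"[^)]*['\"][wa][bt+]*['\"]"),  # open(..., "w"/"a")
    re.compile(PREFIX_FILES + r"['\"]\)*\.write_(text|bytes)\("),  # pathlib writes
]


def find_prefix_writes(script: str) -> list[int]:
    """1-based line numbers that appear to write to a prefix file."""
    return [
        n for n, line in enumerate(script.splitlines(), start=1)
        if any(pattern.search(line) for pattern in WRITE_PATTERNS)
    ]
```

A backup hook that only reads prefix files (for example, copying CLAUDE.md to /tmp) produces no hits, which matches the "logging/backup only" pattern that passes.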

Check 7 — Static Prefix Budget

This is the most actionable check. Measure every component of the static prefix.
Read and measure each component (report chars AND estimated tokens at ~4 chars/token):
Component            How to find
CLAUDE.md (global)   ~/.claude/CLAUDE.md
CLAUDE.md (project)  $PROJECT/CLAUDE.md
Rules (global)       Each file in ~/.claude/rules/*.md
Rules (project)      Each file in $PROJECT/.claude/rules/*.md
MEMORY.md            The current project's entry under ~/.claude/projects/*/memory/MEMORY.md
Measure EACH file individually with wc -c via Bash. Note that wc -c counts bytes; for files containing non-ASCII text, wc -m reports characters.
Calculate:
  1. Grand total chars across all measured files
  2. Estimated tokens (chars / 4)
  3. Percentage of 200K context window consumed by static prefix
Report the top 5 largest individual files.
Scoring:
  • PASS: Total static prefix < 60K chars (~15K tokens, ~7.5% of context)
  • WARNING: 60–120K chars (~15–30K tokens, 7.5–15% of context)
  • FAIL: > 120K chars (~30K tokens, > 15% of context)
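The budget arithmetic can be sketched as follows, assuming sizes are counted in characters (len of the decoded text, which can differ from wc -c byte counts on non-ASCII files), a 200K-token window, and 4 chars/token as above. measure and prefix_budget are hypothetical helpers; the list of paths still has to be assembled from the component table, for example with glob.glob(os.path.expanduser(...)).

```python
from pathlib import Path

CHARS_PER_TOKEN = 4
CONTEXT_WINDOW_TOKENS = 200_000


def measure(paths: list[str]) -> dict[str, int]:
    """Character count per existing file (characters, not bytes)."""
    sizes = {}
    for raw in paths:
        p = Path(raw).expanduser()
        if p.is_file():
            sizes[str(p)] = len(p.read_text(encoding="utf-8", errors="replace"))
    return sizes


def prefix_budget(sizes: dict[str, int]) -> dict:
    """Totals, context share, score, and top 5 files for Check 7."""
    total = sum(sizes.values())
    tokens = total // CHARS_PER_TOKEN
    pct = 100 * tokens / CONTEXT_WINDOW_TOKENS
    if total < 60_000:
        score = "PASS"
    elif total <= 120_000:
        score = "WARNING"
    else:
        score = "FAIL"
    top5 = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:5]
    return {"chars": total, "tokens": tokens, "pct": round(pct, 1),
            "score": score, "top5": top5}
```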

Check 8 — Rule Layer Efficiency

Read: List the filenames in ~/.claude/rules/ and $PROJECT/.claude/rules/.
Key fact: Rules at both levels are additive; Claude Code loads ALL of them. Duplicate filenames therefore mean duplicate content and wasted tokens.
Check for:
  1. Any filename that exists in BOTH ~/.claude/rules/ and $PROJECT/.claude/rules/ (these load twice)
  2. For each duplicate, read both versions and estimate the content overlap
  3. Whether the project uses a single project-implementation.md for overrides (the correct pattern) rather than many files that duplicate user-level rules
The correct pattern:
  • User level (~/.claude/rules/): generic patterns, the WHAT (applies to all projects)
  • Project level ($PROJECT/.claude/rules/): a single project-implementation.md, the HOW (framework-specific overrides)
Scoring:
  • PASS: No duplicate filenames, and the project uses only project-implementation.md
  • WARNING: 1–3 duplicate files
  • FAIL: > 3 duplicate files (significant token waste from additive loading)
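Duplicate detection is a set intersection over filenames, and overlap can be estimated with difflib.SequenceMatcher on the two files' lines. A sketch with hypothetical function names; the scoring function follows the thresholds above and assumes that a project with no duplicates but extra rule files beyond project-implementation.md falls to WARNING, a case the scoring above does not cover.

```python
import difflib
from pathlib import Path


def rule_names(directory: str) -> list[str]:
    """Markdown rule filenames in a rules directory ([] if it is missing)."""
    p = Path(directory).expanduser()
    return sorted(f.name for f in p.glob("*.md")) if p.is_dir() else []


def duplicate_rule_names(user_rules: list[str], project_rules: list[str]) -> list[str]:
    """Filenames present at both levels; with additive loading these load twice."""
    return sorted(set(user_rules) & set(project_rules))


def overlap_ratio(a: str, b: str) -> float:
    """Rough line-level similarity in [0, 1] via difflib.SequenceMatcher."""
    return difflib.SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()


def rule_layer_score(user_rules: list[str], project_rules: list[str]) -> str:
    dups = duplicate_rule_names(user_rules, project_rules)
    if not dups and set(project_rules) <= {"project-implementation.md"}:
        return "PASS"
    if len(dups) <= 3:
        # Also covers zero duplicates with extra project rule files,
        # which the scoring above leaves unspecified (assumption).
        return "WARNING"
    return "FAIL"
```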

Output Format

After running all 8 checks, output this exact report format:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  PROMPT CACHE AUDIT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Score: X/8

[✅/⚠️/❌]  Check 1 — Prefix Ordering: [PASS/WARNING/FAIL]
   → [finding]

[✅/⚠️/❌]  Check 2 — Hook Injection: [PASS/WARNING/FAIL]
   → [each hook and its pattern]

[✅/⚠️/❌]  Check 3 — Tool Stability: [PASS/WARNING/FAIL]
   → [N global + N project MCP servers, any issues]

[✅/⚠️/❌]  Check 4 — Model Consistency: [PASS/WARNING/FAIL]
   → [model config]

[✅/⚠️/❌]  Check 5 — Dynamic Content: [PASS/WARNING/FAIL]
   → [size breakdown per injection point]

[✅/⚠️/❌]  Check 6 — Fork Safety: [PASS/WARNING/FAIL]
   → [compaction + subagent pattern]

[✅/⚠️/❌]  Check 7 — Prefix Budget: [PASS/WARNING/FAIL]
   → Total: XX,XXX chars (~X,XXX tokens, X.X% of 200K)
   → Top 5 largest:
     1. filename — X,XXX chars (~X,XXX tokens)
     2. filename — X,XXX chars (~X,XXX tokens)
     3. ...

[✅/⚠️/❌]  Check 8 — Rule Efficiency: [PASS/WARNING/FAIL]
   → [duplicate count + wasted tokens]

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  TOKEN BUDGET SUMMARY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Static prefix:          ~XX,XXX tokens (X.X% of 200K window)
Per-turn injection:     ~XXX tokens
Per-builder spawn:      ~X,XXX tokens
Per-lightweight spawn:  ~XX tokens

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  TOP FIXES (ranked by token savings)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

1. [Most impactful fix — exact steps]
2. [Second most impactful — exact steps]
3. [Third — if applicable]
If all checks pass: confirm the setup is well-optimised and estimate cost savings vs a naive configuration (no caching awareness).

Prompt Caching Cheatsheet

Rule      Do                                                            Don't
Ordering  Static CLAUDE.md + rules; dynamic content in messages         Timestamps/dates/git refs in prefix files
Updates   additionalContext → <system-reminder>                         Edit CLAUDE.md or rules mid-session
Tools     Fixed tool set + deferred MCP stubs                           Add/remove tools per turn
Models    One model per conversation; subagents for switches            Inline model switching
Size      Trim injections to the minimum needed                         Dump full git status (40K+ chars)
Forks     Built-in compaction, additionalContext only                   Custom prefix rebuilds
Budget    Static prefix < 15K tokens                                    Bloated CLAUDE.md, massive rule files
Layers    User-level generic + project-level project-implementation.md  Same rule files at both levels