cc-canary

cc-canary — long-form regression writeup

Primary question: has Claude regressed on this user's work, and when?

Bundled script `scripts/compute_stats.py` does ~95% of the work in ~2.5s: scans JSONLs, runs inflection + transition-day detection, builds pre/post aggregates, cross-version comparison, hour-of-day, word frequency, three-period thinking depth, visibility transition, per-turn rates, and abnormalities, then renders a complete markdown skeleton with every table filled. Narrative slots are marked `<!-- C: ... -->`.

Default window: `60d`. Accept `7d / 14d / 30d / 60d / 90d / 180d`.
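
A minimal sketch of how a caller might check a window value against the accepted list before invoking the script. The helper name `parse_window` and the choice to validate on the caller's side are assumptions for illustration; the script may well do its own validation.

```python
# Accepted values and default, as listed in this document.
ACCEPTED_WINDOWS = ("7d", "14d", "30d", "60d", "90d", "180d")
DEFAULT_WINDOW = "60d"


def parse_window(value=None):
    """Return the window length in days, using the default when none is given.

    Hypothetical helper: not part of compute_stats.py.
    """
    window = (value or DEFAULT_WINDOW).strip().lower()
    if window not in ACCEPTED_WINDOWS:
        raise ValueError(
            f"unsupported window {window!r}; expected one of {ACCEPTED_WINDOWS}"
        )
    return int(window[:-1])  # strip the trailing "d"
```

Rejecting unlisted values early avoids a failed script run that would otherwise consume the single allowed retry.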

Framing — three-bucket classification

Model-side — same user/task, worse outcomes; cross-version worse; reasoning-depth dropping
User-side — project-mix shift; shorter/imperative prompts; new codebase; shortcut vocab rising
Ambiguous — mixed confounds, borderline effect size, either-way-explainable

Your 3-step job

1. Run the script

Bash(python3 <SKILL_DIR>/scripts/compute_stats.py --window {window} --render-md /tmp/cc-canary-skeleton-{window}.md > /dev/null 2>&1)

`<SKILL_DIR>` resolves to `.claude/skills/cc-canary/` → fallback to `~/.claude/skills/cc-canary/`.

Flags: `--window {Nd}` (required); `--include-agents` (include subagent sessions); `--min-user-words N` (default 10).

If the script fails: report error, retry once with `--include-agents`, else stop. Never fall back to hand-computation; that's the slow path.
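
The resolve, run, and retry-once sequence above could be sketched as follows. The function names and the injectable `runner` argument are hypothetical, and only the flags documented here are used. Unlike the documented command, which discards output with `> /dev/null 2>&1`, this sketch captures stderr so the error can be reported; that is a choice of the sketch, not something the script requires.

```python
import subprocess
import sys
from pathlib import Path


def resolve_skill_dir(cwd=None, home=None):
    """Prefer the project-local skill directory, else the home-directory one."""
    cwd = Path(cwd) if cwd else Path.cwd()
    home = Path(home) if home else Path.home()
    local = cwd / ".claude" / "skills" / "cc-canary"
    return local if local.is_dir() else home / ".claude" / "skills" / "cc-canary"


def build_command(skill_dir, window, include_agents=False):
    """Assemble the compute_stats.py invocation from the documented flags."""
    cmd = [
        "python3",
        str(Path(skill_dir) / "scripts" / "compute_stats.py"),
        "--window", window,
        "--render-md", f"/tmp/cc-canary-skeleton-{window}.md",
    ]
    if include_agents:
        cmd.append("--include-agents")
    return cmd


def run_with_one_retry(skill_dir, window, runner=subprocess.run):
    """Run once; on failure report the error and retry once with --include-agents."""
    first = runner(build_command(skill_dir, window), capture_output=True, text=True)
    if first.returncode == 0:
        return True
    print(f"compute_stats.py failed: {first.stderr.strip()}", file=sys.stderr)
    second = runner(
        build_command(skill_dir, window, include_agents=True),
        capture_output=True,
        text=True,
    )
    # False means stop here; hand-computation is not a fallback.
    return second.returncode == 0
```

Passing `runner` explicitly keeps the retry logic testable without touching real session data.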

2. Read the skeleton

Read /tmp/cc-canary-skeleton-{window}.md

3. Fill every `<!-- C: ... -->` placeholder and save

Write ./cc-canary-{YYYY-MM-DD}.md

End your message with the absolute path:

Wrote /Users/.../cc-canary-{date}.md · paste-ready.
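
A small sketch of how the dated filename and the closing line might be built. The helper names are hypothetical; the only details taken from this document are the `cc-canary-{YYYY-MM-DD}.md` pattern, the current directory as the save location, and the requirement that the reported path be absolute.

```python
from datetime import date
from pathlib import Path


def report_path(today=None, directory="."):
    """Return the absolute path for today's report in the given directory."""
    today = today or date.today()
    # date.isoformat() yields YYYY-MM-DD, matching the filename pattern.
    return (Path(directory) / f"cc-canary-{today.isoformat()}.md").resolve()


def closing_line(path):
    """Format the final message line that carries the absolute path."""
    return f"Wrote {path} · paste-ready."
```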

Narrative placeholders

Each placeholder's inline comment already spells out what to write. Summary:

`verdict-line`: HOLDING / SUSPECTED REGRESSION / CONFIRMED REGRESSION / INCONCLUSIVE + brief justification
`summary`: 2–4 sentences: verdict + inflection + biggest pre→post delta. No counter-evidence hedging.
`timeline`: 1–2 paragraphs on the daily series shape
`xv-para`: 1 paragraph on cross-version (if §3 present)
`finding-N-class` × up to 5: inline classification label
`finding-N-reason` × up to 5: 2–3 sentences max, evidence-first. Lead with the strongest number (cross-version Δ, §2 value, appendix rate). No signal-line restating, no rhetorical buildup.
`root-cause`: 3–5 paragraphs tying strongest signals together
`what-would-help`: 2–4 concrete bullets
`appendix-a1…a4, b, c, d, e, f, g, h`: 1 paragraph each (see table context in skeleton)
`meta-note`: 2–5 sentences, first person, honest, no claimed feelings, acknowledge the recursion

Verdict calibration

HOLDING: ≤1 model-side signal
SUSPECTED REGRESSION: 2–3 model-side signals
CONFIRMED REGRESSION: ≥3 model-side signals + non-empty cross-version showing decline + `session_count ≥ 15` + ≥2 models + `inflection.gap_sigma ≥ 1.0`
INCONCLUSIVE: `session_count < 15` or `inflection.method == "fallback_split_half"` with overlapping confounds

Cap at SUSPECTED when: only one model; <15 sessions; single-project with project starting mid-window; inflection coincides with a visible user-side event.

All the data you need (session_count, model mix, inflection method, cross-version presence) is rendered as plain text in the skeleton.
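
The calibration rules can be sketched as one function. This is an illustration under stated assumptions, not the script's logic: the `Evidence` field names are hypothetical, and the precedence is one reading of the rules. Here, a weak basis (fewer than 15 sessions, or the fallback inflection method) yields INCONCLUSIVE only when confounds also overlap; otherwise it merely caps the verdict at SUSPECTED. Three or more signals without the full checklist are also treated as SUSPECTED.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Inputs read off the skeleton; field names are hypothetical."""
    model_side_signals: int
    session_count: int
    model_count: int
    cross_version_decline: bool  # non-empty cross-version table showing decline
    gap_sigma: float             # inflection.gap_sigma
    inflection_method: str       # inflection.method
    overlapping_confounds: bool = False
    single_project_mid_window: bool = False
    user_side_event_at_inflection: bool = False


def calibrate(e: Evidence) -> str:
    weak_basis = (
        e.session_count < 15 or e.inflection_method == "fallback_split_half"
    )
    # Assumed precedence: weak basis plus overlapping confounds is INCONCLUSIVE.
    if weak_basis and e.overlapping_confounds:
        return "INCONCLUSIVE"
    if e.model_side_signals <= 1:
        return "HOLDING"
    # Any of these conditions caps the verdict at SUSPECTED.
    capped = (
        weak_basis
        or e.model_count < 2
        or e.single_project_mid_window
        or e.user_side_event_at_inflection
    )
    full_checklist = (
        e.model_side_signals >= 3
        and e.cross_version_decline
        and e.session_count >= 15
        and e.model_count >= 2
        and e.gap_sigma >= 1.0
    )
    if full_checklist and not capped:
        return "CONFIRMED REGRESSION"
    return "SUSPECTED REGRESSION"
```

Keeping the cap conditions separate from the checklist makes it explicit that a visible user-side event blocks CONFIRMED even when every numeric threshold is met.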

Grounding example (for `finding-N-reason`)

Classification: model-side. Read:Edit dropped 9.0 → 1.0 (-89%, concerning) while cross-version shows opus-4-7 at 0.39 vs opus-4-6 at 1.00 on the same user's workload. No project-mix shift near the inflection — model is defaulting to edit-first.

Hard rules

Never read, grep, or glob `~/.claude/projects/**/*.jsonl`. Never run `jq` / `awk` / `wc` on session files. Script owns all that.
Never touch tables or numbers; they came from real data.
Every finding gets a classification label.
Hedge when cross-version is empty or `session_count < 15`.
Do not verdict CONFIRMED REGRESSION without the full checklist.
Do not save the skeleton as-is; replace every `<!-- C: ... -->` placeholder first.
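
One way to enforce the last rule before saving is to scan the filled report for leftover placeholder comments. This is a sketch: the regular expression assumes every placeholder is an HTML comment beginning with `C:`, as shown earlier in this document, and `unfilled_placeholders` is a hypothetical helper name.

```python
import re

# Matches <!-- C: ... --> comments, including ones that span several lines.
PLACEHOLDER = re.compile(r"<!--\s*C:.*?-->", re.DOTALL)


def unfilled_placeholders(markdown: str):
    """Return every `<!-- C: ... -->` comment still present in the text."""
    return PLACEHOLDER.findall(markdown)
```

An empty list means the report is ready to save; anything else names the slots that still need narrative.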

Failure modes

Script import error → check `python3 -V` ≥ 3.8; retry once with `--include-agents`; else stop.
Skeleton < 5KB → likely no sessions in window. Check script error.
`inflection.method == fallback_split_half` → state it; cap at SUSPECTED.
Cross-version Δ `None` → div-by-zero when model-A value is 0; note the confound.
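
Two of these failure modes can be illustrated with a short sketch. The function names are hypothetical. Treating model A as the baseline (the denominator) and reading 5KB as 5 × 1024 bytes are both assumptions. The relative change is the usual (current − baseline) / baseline, which is undefined when the baseline is 0, hence `None`.

```python
from pathlib import Path


def relative_delta(baseline, current):
    """Relative change from baseline to current, or None if baseline is 0."""
    if baseline == 0:
        return None  # division by zero: report None and note the confound
    return (current - baseline) / baseline


def format_delta(delta):
    """Render a delta as a signed whole percentage, or 'None' when undefined."""
    return "None" if delta is None else f"{delta * 100:+.0f}%"


def skeleton_too_small(path, min_bytes=5 * 1024):
    """True when the rendered skeleton is below the assumed 5KB threshold."""
    return Path(path).stat().st_size < min_bytes
```

With the numbers from the grounding example, `format_delta(relative_delta(9.0, 1.0))` gives `-89%`, matching the Read:Edit drop quoted there.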
