cc-canary
cc-canary: long-form regression writeup
Primary question: has Claude regressed on this user's work, and when?
The bundled script `scripts/compute_stats.py` does ~95% of the work in ~2.5s: it scans JSONLs, runs inflection and transition-day detection, builds pre/post aggregates, cross-version comparison, hour-of-day, word frequency, three-period thinking depth, visibility transition, per-turn rates, and abnormalities, then renders a complete markdown skeleton with every table filled. Narrative slots are marked `<!-- C: ... -->`.
Default window: `60d`. Accept `7d / 14d / 30d / 60d / 90d / 180d`.
Framing: three-bucket classification
- Model-side — same user/task, worse outcomes; cross-version worse; reasoning-depth dropping
- User-side — project-mix shift; shorter/imperative prompts; new codebase; shortcut vocab rising
- Ambiguous — mixed confounds, borderline effect size, either-way-explainable
Your 3-step job
1. Run the script
Bash(python3 <SKILL_DIR>/scripts/compute_stats.py --window {window} --render-md /tmp/cc-canary-skeleton-{window}.md > /dev/null 2>&1)
`<SKILL_DIR>`: `.claude/skills/cc-canary/` or `~/.claude/skills/cc-canary/`.
Flags: `--window {Nd}` (required); `--include-agents` (include subagent sessions); `--min-user-words N` (default 10).
If the script fails: report the error, retry once with `--include-agents`, else stop. Never fall back to hand-computation; that is the slow path.
2. Read the skeleton
Read /tmp/cc-canary-skeleton-{window}.md
3. Fill every `<!-- C: ... -->` placeholder and save
Write ./cc-canary-{YYYY-MM-DD}.md
End your message with the absolute path:
Wrote /Users/.../cc-canary-{date}.md · paste-ready.
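The run-and-retry rule from step 1 can be sketched in Python. This is only an illustration under stated assumptions: `build_command` and `run_with_one_retry` are hypothetical helper names, not part of cc-canary, and the injectable `run` parameter exists only so the sketch can be exercised without the real script. Output is discarded, mirroring the `> /dev/null 2>&1` redirect in the original command.

```python
import os
import subprocess

# Windows the skill accepts; 60d is the stated default.
ALLOWED_WINDOWS = ("7d", "14d", "30d", "60d", "90d", "180d")


def build_command(skill_dir, window="60d", include_agents=False):
    """Return (argv, skeleton_path) for one compute_stats.py invocation."""
    if window not in ALLOWED_WINDOWS:
        raise ValueError(f"unsupported window: {window}")
    script = os.path.join(os.path.expanduser(skill_dir), "scripts", "compute_stats.py")
    skeleton = f"/tmp/cc-canary-skeleton-{window}.md"
    argv = ["python3", script, "--window", window, "--render-md", skeleton]
    if include_agents:
        argv.append("--include-agents")
    return argv, skeleton


def run_with_one_retry(skill_dir, window="60d", run=subprocess.run):
    """Try once as-is, retry once with --include-agents, then stop."""
    for include_agents in (False, True):
        argv, skeleton = build_command(skill_dir, window, include_agents)
        result = run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode == 0:
            return skeleton
    # Hypothetical error message; the rule itself is "stop, never hand-compute".
    raise RuntimeError("compute_stats.py failed twice; stop instead of hand-computing")
```

On success the returned path is the skeleton to read in step 2.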
Narrative placeholders
Each placeholder's inline comment already spells out what to write. Summary:
- `verdict-line`: HOLDING / SUSPECTED REGRESSION / CONFIRMED REGRESSION / INCONCLUSIVE + brief justification
- `summary`: 2–4 sentences covering verdict + inflection + biggest pre→post delta. No counter-evidence hedging.
- `timeline`: 1–2 paragraphs on the daily series shape
- `xv-para`: 1 paragraph on cross-version (if §3 present)
- `finding-N-class` × up to 5: inline classification label
- `finding-N-reason` × up to 5: 2–3 sentences max, evidence-first. Lead with the strongest number (cross-version Δ, §2 value, appendix rate). No signal-line restating, no rhetorical buildup
- `root-cause`: 3–5 paragraphs tying strongest signals together
- `what-would-help`: 2–4 concrete bullets
- `appendix-a1…a4, b, c, d, e, f, g, h`: 1 paragraph each (see table context in skeleton)
- `meta-note`: 2–5 sentences, first person, honest, no claimed feelings, acknowledge the recursion
Verdict calibration
- HOLDING: ≤1 model-side signal
- SUSPECTED REGRESSION: 2–3 model-side signals
- CONFIRMED REGRESSION: ≥3 model-side signals + non-empty cross-version showing decline + `session_count ≥ 15` + ≥2 models + `inflection.gap_sigma ≥ 1.0`
- INCONCLUSIVE: `session_count < 15` OR `inflection.method == "fallback_split_half"` with overlapping confounds
Cap at SUSPECTED when: only one model; <15 sessions; single-project with project starting mid-window; inflection coincides with a visible user-side event.
All the data you need (session_count, model mix, inflection method, cross-version presence) is rendered as plain text in the skeleton.
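One possible reading of the calibration rules, written as a small Python function. The order of precedence is an assumption, since the rules above do not state one: here INCONCLUSIVE requires a weak basis (few sessions or the fallback inflection method) together with overlapping confounds, and every cap condition downgrades an otherwise complete checklist to SUSPECTED. The `Evidence` field names and the `"primary"` method value used in the usage line are hypothetical; only `session_count`, `inflection.gap_sigma`, and `inflection.method` come from the skeleton.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    model_side_signals: int
    session_count: int
    model_count: int
    cross_version_decline: bool      # non-empty cross-version table showing decline
    gap_sigma: float                 # inflection.gap_sigma
    inflection_method: str           # inflection.method
    overlapping_confounds: bool = False
    single_project_mid_window: bool = False
    user_side_event_at_inflection: bool = False


def calibrate_verdict(e: Evidence) -> str:
    fallback = e.inflection_method == "fallback_split_half"
    # Assumed reading: weak basis plus overlapping confounds means INCONCLUSIVE.
    if (e.session_count < 15 or fallback) and e.overlapping_confounds:
        return "INCONCLUSIVE"
    if e.model_side_signals <= 1:
        return "HOLDING"
    capped = (
        e.model_count < 2
        or e.session_count < 15
        or e.single_project_mid_window
        or e.user_side_event_at_inflection
        or fallback
    )
    full_checklist = (
        e.model_side_signals >= 3
        and e.cross_version_decline
        and e.session_count >= 15
        and e.model_count >= 2
        and e.gap_sigma >= 1.0
    )
    if full_checklist and not capped:
        return "CONFIRMED REGRESSION"
    return "SUSPECTED REGRESSION"
```

For example, `calibrate_verdict(Evidence(4, 40, 1, True, 1.5, "primary"))` stays at SUSPECTED REGRESSION because only one model is present.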
Grounding example (for `finding-N-reason`)
Classification: model-side. Read:Edit dropped 9.0 → 1.0 (-89%, concerning) while cross-version shows opus-4-7 at 0.39 vs opus-4-6 at 1.00 on the same user's workload. No project-mix shift near the inflection; the model is defaulting to edit-first.
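The -89% in the example is a plain relative change from the pre value to the post value. The arithmetic is shown here only to make the figure reproducible; `pct_change` is a hypothetical helper name, and the real numbers always come from the skeleton, never from recomputation.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


delta = pct_change(9.0, 1.0)
print(f"Read:Edit 9.0 → 1.0 ({delta:+.0f}%)")  # prints: Read:Edit 9.0 → 1.0 (-89%)
```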
Hard rules
- Never read, grep, or glob `~/.claude/projects/**/*.jsonl`. Never run `jq`/`awk`/`wc` on session files. Script owns all that.
- Never touch tables or numbers; they came from real data.
- Every finding gets a classification label.
- Hedge when cross-version is empty or `session_count < 15`.
- Do not verdict CONFIRMED REGRESSION without the full checklist.
- Do not save the skeleton as-is; replace every `<!-- C: ... -->` first.
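The last rule can be checked mechanically before saving. A minimal sketch, assuming every narrative slot is an HTML comment that begins with `C:`; the exact slot syntax inside the skeleton is not shown here, and `unfilled_placeholders` is a hypothetical helper name.

```python
import re

# Matches HTML comments of the form <!-- C: ... -->, including multi-line ones.
PLACEHOLDER = re.compile(r"<!--\s*C:.*?-->", re.DOTALL)


def unfilled_placeholders(markdown):
    """Return every narrative slot still present in the text."""
    return PLACEHOLDER.findall(markdown)
```

An empty list means the report has no remaining slots; ordinary comments that do not start with `C:` are ignored.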
Failure modes
- Script import error → check `python3 -V` ≥ 3.8; retry once with `--include-agents`; else stop.
- Skeleton < 5KB → likely no sessions in window. Check script error.
- `inflection.method == fallback_split_half` → state it; cap at SUSPECTED.
- Cross-version Δ `None` → div-by-zero when model-A value is 0; note the confound.
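Three of these checks are simple enough to sketch. Everything below is illustrative: the helper names are hypothetical, "5KB" is read as 5 × 1024 bytes, and the delta formula (relative to model A) is an assumption chosen only because the source says the value becomes `None` when model A is 0. Note that `python_ok` inspects the interpreter running the sketch, which may differ from what `python3 -V` reports.

```python
import os
import sys

MIN_SKELETON_BYTES = 5 * 1024  # assumption: "5KB" means 5 KiB


def python_ok(version_info=sys.version_info):
    """True when the interpreter is at least Python 3.8."""
    return tuple(version_info[:2]) >= (3, 8)


def skeleton_looks_empty(path):
    """True when the rendered skeleton is suspiciously small."""
    return os.path.getsize(path) < MIN_SKELETON_BYTES


def cross_version_delta(model_a, model_b):
    """Relative difference of model B against model A; None if A is 0."""
    if model_a == 0:
        return None  # would be a division by zero; report it as a confound
    return (model_b - model_a) / model_a
```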