Prove It
<purpose>
Claude generates code by pattern-matching on training data. Something can look
syntactically perfect, follow best practices, and still be wrong. The model
optimizes for "looks right" not "works right." Verification is a separate
cognitive step that must be explicitly triggered. This skill closes the loop
between implementation and proof.
</purpose>
Why This Matters (Technical Reality)
<technical-honesty>
Claude's limitations that this skill addresses:
1. Generation vs Execution
I generate code but don't run it. I predict what it would do based on patterns.
My confidence comes from "this looks like working code I've seen" not from
"I executed this and observed the result."
2. Training Signal Mismatch
My training optimizes for plausible next-token prediction, not outcome
verification. Saying "Done!" feels natural. Verifying feels like extra work.
But verification is where correctness actually lives.
3. Pattern-Matching Blindness
Code that matches common patterns feels correct. But subtle bugs hide in the
gaps between patterns. Off-by-one errors. Wrong variable names. Missing edge
cases. These "look right" but aren't.
4. Confidence-Correctness Gap
High confidence in my output doesn't correlate with actual correctness.
I'm often most confident when I'm most wrong, because the wrong answer
pattern-matched strongly.
5. No Feedback Loop
I generate sequentially. I don't naturally go back and check. Without
explicit verification, errors compound silently.
</technical-honesty>
When To Verify
<triggers>
ALWAYS verify before declaring complete:
Code Changes:
- New functions or modules
- Bug fixes
- Refactoring
- Configuration changes
- Build/deploy scripts
Fixes:
- "Fixed the bug" - did you reproduce and confirm it's gone?
- "Resolved the error" - did you trigger the error path again?
- "Updated the config" - did you restart and test?
Claims:
- Factual statements that matter to the decision
- "This will work because..." - did you prove it?
- "The file contains..." - did you actually read it?
</triggers>
Instructions
Step 1: Catch The Victory Lap
Before saying any of these:
- "Done!"
- "That should work"
- "I've implemented..."
- "The fix is..."
- "Complete"
STOP. You haven't verified yet.
Step 2: Determine Verification Method
| Change Type | Verification |
|---|---|
| New code | Run it with test input |
| Bug fix | Reproduce original bug, confirm fixed |
| Function change | Call the function, check output |
| Config change | Restart service, test affected feature |
| Build script | Run the build |
| API endpoint | Make a request |
| UI change | Describe what user should see, or screenshot |
Step 3: Actually Verify
```bash
# Don't just write the test - run it
python -m pytest tests/test_new_feature.py

# Don't just fix the code - prove the fix
python -c "from module import func; print(func(edge_case))"

# Don't just update config - verify it loads
node -e "console.log(require('./config.js'))"
```
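The same loop-closing can be done in-process: a tiny self-check that runs the changed function on known inputs and compares observed output to expected output. The function and values below are hypothetical stand-ins, not part of any real codebase:

```python
# Minimal self-check: run the changed function on known inputs
# and compare observed output to expected output.
def safe_divide(a, b):
    # Stand-in for the function under test (hypothetical).
    return None if b == 0 else a / b

checks = [((10, 2), 5.0), ((10, 0), None)]
for args, expected in checks:
    got = safe_divide(*args)
    assert got == expected, f"safe_divide{args}: expected {expected!r}, got {got!r}"
print("all checks passed")
```

If any assertion fires, the victory lap stops there.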
Step 4: Report With Evidence
Verified:
What I changed:
- Added input validation to user_signup()
How I verified:
- Ran: python -c "from auth import user_signup; user_signup('')"
- Expected: ValidationError
- Got: ValidationError("Email required")
Proof that it works. Done.
Verification Patterns
Pattern 1: The Smoke Test
Minimal test that proves basic functionality:
```bash
# After writing a new function
python -c "from new_module import new_func; print(new_func('test'))"
```
If this crashes, you're not done.
Pattern 2: The Regression Check
After fixing a bug, trigger the original failure:
```bash
# Bug was: crash on empty input
python -c "from module import func; func('')"
# Should not crash anymore
```
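A regression check can also be pinned as an assertion so the old crash cannot silently return. The function name and behavior here are hypothetical stand-ins:

```python
# Regression check: re-trigger the original failure (crash on empty
# input) and pin the fixed behavior as an assertion.
def func(s):
    # Stand-in for the fixed function (hypothetical); the old
    # version raised on empty input.
    if not s:
        return None
    return s.upper()

assert func("") is None       # the input that used to crash
assert func("abc") == "ABC"   # normal path still works
print("regression check passed")
```

Committing this as a permanent test keeps the bug fixed, not just fixed today.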
Pattern 3: The Build Gate
Before claiming code is complete:
```bash
# Does it at least compile/parse?
python -m py_compile new_file.py
npm run build
cargo check
```
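For a Python file, the gate can be scripted with the standard library's py_compile so that a parse failure blocks the "done" claim. The file path and contents below are stand-ins:

```python
import os
import py_compile
import sys
import tempfile

# Build gate: refuse to say "done" if the file doesn't even compile.
# The file written here is a hypothetical stand-in for your change.
path = os.path.join(tempfile.mkdtemp(), "new_file.py")
with open(path, "w") as f:
    f.write("def greet(name):\n    return f'hello {name}'\n")

try:
    py_compile.compile(path, doraise=True)
    print("build gate: pass")
except py_compile.PyCompileError as err:
    print(f"build gate: FAIL\n{err}")
    sys.exit(1)
```

A nonzero exit code makes the gate usable in CI or a pre-commit hook.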
Pattern 4: The Integration Smell Test
After changes that affect multiple components:
```bash
# Start the service
npm run dev &
# Hit the affected endpoint
# Check for expected response
```
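A self-contained Python sketch of the same idea: start a service, hit the affected endpoint, assert on the response. The handler below is a stand-in for your real dev server; the route and payload are assumptions:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in service; in a real check you'd start your actual dev server.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep check output quiet

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Hit the affected endpoint and check for the expected response.
url = f"http://127.0.0.1:{server.server_port}/health"
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)
assert data == {"status": "ok"}, data
print("integration check passed")
server.shutdown()
```

The assertion, not the 200 status alone, is what turns "it responded" into "it responded correctly".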
The Verification Checklist
Before declaring done:
[ ] Code runs without syntax errors
[ ] Function produces expected output on happy path
[ ] Edge case handled (empty, null, boundary)
[ ] Original bug/issue no longer reproduces
[ ] No new errors introduced (check logs)
[ ] Build passes (if applicable)
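One way to make the checklist executable is a small gate runner: each item becomes a callable check, and any failure blocks the "done" declaration. The check bodies below are illustrative stand-ins:

```python
# Gate runner: each checklist item is a callable; any raised
# AssertionError or exception blocks the "done" declaration.
def runs_without_syntax_errors():
    # Stand-in: compile the new source to catch syntax errors.
    compile("def f(x):\n    return x + 1\n", "new_file.py", "exec")

def happy_path_output():
    assert 2 + 2 == 4  # stand-in for calling the real function

def edge_case_handled():
    assert "".join([]) == ""  # stand-in for the empty-input case

for check in (runs_without_syntax_errors, happy_path_output, edge_case_handled):
    check()
    print(f"[x] {check.__name__}")
print("all gates passed")
```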
NEVER
- Say "done" without running the code
- Assume it works because it looks right
- Skip verification because you're confident
- Claim a bug is fixed without reproducing it first
- Trust that generated code is correct
- Let pattern-matching substitute for proof
ALWAYS
- Run new code at least once
- Test the specific thing that was broken
- Show verification output in your response
- Admit when you can't verify (and say why)
- Treat verification as required, not optional
Examples
Example 1: Bad (No Verification)
User: "Fix the divide by zero error"
*edits code to add check*
Done! Added a check for zero before dividing. The function now returns
None if the divisor is zero.
This is a lie. I didn't verify anything.
Example 1: Good (With Verification)
User: "Fix the divide by zero error"
*edits code to add check*
Let me verify the fix:
$ python -c "from math_utils import safe_divide; print(safe_divide(10, 0))"
None
$ python -c "from math_utils import safe_divide; print(safe_divide(10, 2))"
5.0
Verified:
- Division by zero returns None (was: crashed)
- Normal division still works
- Fix confirmed.
Example 2: Admitting Inability To Verify
User: "Update the production config"
I've updated config/production.json with the new API endpoint.
Verification status: UNABLE TO VERIFY
- This is production config
- I can't safely test against production
- Recommend: Test in staging first, or have ops verify after deploy
Change is made but unverified. Proceed with caution.
Example 3: Build Verification
User: "Add TypeScript types to the utils"
*adds types*
Verifying:
$ npm run typecheck
src/utils.ts - 0 errors
$ npm run build
Build successful - dist/utils.js generated
Verified: Types compile without errors, build passes.
- "I'll verify later": Later never comes. Verify immediately or it won't happen.
- "It looks right": Pattern-matching is not verification. Code that looks perfect can be completely wrong.
- "The tests will catch it": Tests you haven't run don't catch anything. Run them.
- "It's a small change": Small changes cause big breaks. One-character typos crash production.
- Verifying only the happy path: The bug is in the edge case you didn't test.
- Reading the code as verification: Reading is not running. Execute it.
- "It worked before": Before is not now. Dependencies change. Environments differ.
- Trusting the type checker: Types compile doesn't mean logic is correct.
The Hard Truth
When I say "Done!" without verifying, I'm not reporting a fact. I'm making a
prediction based on pattern-matching. Sometimes that prediction is wrong.
Verification converts prediction into observation. It's the difference between
"this should work" and "this works."
One is a guess. One is proof.
Prove it.