verify-fix
/verify-fix - Verify Incident Fix with Observables
MANDATORY verification step after any production incident fix.
Philosophy
A fix is just a hypothesis until proven by metrics. "That should fix it" is not verification.
When to Use
- After applying ANY fix to a production incident
- Before declaring an incident resolved
- When someone says "I think that fixed it"
Verification Protocol
1. Define Observable Success Criteria
Before testing, explicitly state what we expect to see:
SUCCESS CRITERIA:
- [ ] Log entry: "[specific log message]"
- [ ] Metric change: [metric] goes from [X] to [Y]
- [ ] Database state: [field] = [expected value]
- [ ] API response: [endpoint] returns [expected response]
2. Trigger Test Event
For webhook issues:
stripe events resend [event_id] --webhook-endpoint [endpoint_id]
For API issues:
curl -X POST [endpoint] -d '[test payload]'
For auth issues:
Log in as test user, perform action
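For the API case, a bare `curl` prints the response body but makes a failing status code easy to overlook. The following is a minimal sketch, assuming the endpoint accepts JSON and that any 2xx status counts as success; `trigger_api_test` is a hypothetical helper name, not part of any tool mentioned here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: POST a test payload and report the HTTP status code.
# Assumption: the endpoint takes a JSON body and signals success with 2xx.
trigger_api_test() {
  local endpoint="$1" payload="$2" status
  # -s: silent, -o /dev/null: discard the body, -w: print only the status code
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    -X POST "$endpoint" \
    -H 'Content-Type: application/json' \
    -d "$payload")
  echo "$status"
  # Exit status 0 only for 2xx responses
  [[ "$status" == 2* ]]
}
```

The printed status code can go straight into the evidence record, and the exit status lets a script stop as soon as the test request fails.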
3. Observe Results
Watch logs in real-time:
vercel logs [app] --json | grep [pattern]
Or for Convex:
npx convex logs --prod | grep [pattern]
Check metrics:
stripe events retrieve [event_id] | jq '.pending_webhooks'
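The metric check comes down to comparing two readings, one taken before the test event and one after. A minimal sketch, assuming the metric is a non-negative integer where a lower value means progress (as with `pending_webhooks`); `metric_decreased` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a metric dropped between two readings.
# Assumption: both readings are non-negative integers and lower is better.
metric_decreased() {
  local before="$1" after="$2"
  # Reject non-numeric readings (for example jq printing "null") with status 2
  [[ "$before" =~ ^[0-9]+$ && "$after" =~ ^[0-9]+$ ]] || return 2
  [ "$after" -lt "$before" ]
}
```

Called as `metric_decreased 4 3` it returns success; an unchanged reading such as `metric_decreased 3 3` returns failure, which matches the "Metrics unchanged" red flag.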
4. Verify Database State
Check the affected record:
npx convex run --prod [query] '{"id": "[affected_id]"}'
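Reading the record by eye works, but a scripted comparison is harder to misread. A rough sketch, assuming the query prints the record as JSON with string values and that the field name is plain alphanumeric; `field_is` is a hypothetical helper that does a simple pattern match on stdin rather than real JSON parsing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if stdin contains "field": "expected".
# Assumption: JSON output with string values; this is a text match, not a parser,
# so nested objects with the same field name would also match.
field_is() {
  local field="$1" expected="$2"
  grep -Eq "\"${field}\"[[:space:]]*:[[:space:]]*\"${expected}\""
}
```

The query output would be piped into it, for example `... | field_is subscriptionStatus active`. Where `jq` is available, as in the metric check, extracting the field with `jq` is the more robust choice.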
5. Document Evidence
VERIFICATION EVIDENCE:
- Timestamp: [when]
- Test performed: [what we did]
- Log entry observed: [paste relevant log]
- Metric before: [value]
- Metric after: [value]
- Database state confirmed: [yes/no]
VERDICT: [VERIFIED / NOT VERIFIED]
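Filling in the template by hand invites a generous verdict. The sketch below prints the same record and derives the verdict from the inputs. The rule it applies is an assumption drawn from the red flags in this document: a missing log entry, an unchanged metric, or an unconfirmed database state each force NOT VERIFIED. `print_evidence` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the evidence record with a derived verdict.
# Assumed rule: any missing observable downgrades the verdict to NOT VERIFIED.
print_evidence() {
  local test_desc="$1" log_entry="$2" before="$3" after="$4" db_confirmed="$5"
  local verdict="VERIFIED"
  [[ -n "$log_entry" ]] || verdict="NOT VERIFIED"           # no log entry observed
  [[ "$before" != "$after" ]] || verdict="NOT VERIFIED"     # metrics unchanged
  [[ "$db_confirmed" == "yes" ]] || verdict="NOT VERIFIED"  # database state not confirmed
  cat <<EOF
VERIFICATION EVIDENCE:
- Timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)
- Test performed: $test_desc
- Log entry observed: ${log_entry:-none}
- Metric before: $before
- Metric after: $after
- Database state confirmed: $db_confirmed
VERDICT: $verdict
EOF
}
```

A call such as `print_evidence "resent event" "Webhook received" 4 3 yes` produces a record ending in `VERDICT: VERIFIED`; passing an empty log entry ends it in `VERDICT: NOT VERIFIED`.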
Red Flags (Fix NOT Verified)
- "The code looks right now"
- "The config is correct"
- "It should work"
- "Let's wait and see"
- No log entry observed
- Metrics unchanged
- Can't reproduce the original symptom
Example: Webhook Fix Verification
1. Resend the failing event
stripe events resend evt_xxx --webhook-endpoint we_xxx
2. Watch logs (expect to see "Webhook received")
timeout 15 vercel logs app --json | grep webhook
3. Check delivery metric (expect decrease)
stripe events retrieve evt_xxx | jq '.pending_webhooks'
Before: 4, After: 3 = DELIVERY SUCCEEDED
4. Check database state
npx convex run --prod users/queries:getUserByClerkId '{"clerkId": "user_xxx"}'
Expect: subscriptionStatus = "active"
VERDICT: VERIFIED - all 4 checks passed
If Verification Fails
- Don't panic - the fix hypothesis was wrong, and that's okay
- Revert if the fix made things worse
- Loop back to the observation phase (OODA-V)
- Question assumptions - what did we miss?