verify-fix

/verify-fix - Verify Incident Fix with Observables

MANDATORY verification step after any production incident fix.

Philosophy

A fix is just a hypothesis until proven by metrics. "That should fix it" is not verification.

When to Use

  • After applying ANY fix to a production incident
  • Before declaring an incident resolved
  • When someone says "I think that fixed it"

Verification Protocol

1. Define Observable Success Criteria

Before testing, explicitly state what we expect to see:
SUCCESS CRITERIA:
- [ ] Log entry: "[specific log message]"
- [ ] Metric change: [metric] goes from [X] to [Y]
- [ ] Database state: [field] = [expected value]
- [ ] API response: [endpoint] returns [expected response]

2. Trigger Test Event

For webhook issues:

stripe events resend [event_id] --webhook-endpoint [endpoint_id]

For API issues:

curl -X POST [endpoint] -d '[test payload]'

For auth issues:

Log in as test user, perform action

3. Observe Results

Watch logs in real-time

vercel logs [app] --json | grep [pattern]

Or for Convex:

npx convex logs --prod | grep [pattern]

Check metrics

stripe events retrieve [event_id] | jq '.pending_webhooks'
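
The log and metric checks above can be turned into pass/fail commands so the outcome is not left to eyeballing. This is a minimal sketch, assuming bash. `expect_log` and `metric_decreased` are hypothetical helper names, not part of any CLI, and the rule that a lower reading means success is taken from the `pending_webhooks` example in this document.

```bash
# Hypothetical helper: read log text on stdin and succeed only if the
# expected message appears (fixed-string match, not a regex).
expect_log() {
  grep -qF -- "$1"
}

# Hypothetical helper: succeed only if the metric dropped between the
# two readings (for example pending_webhooks before and after a resend).
metric_decreased() {
  local before="$1" after="$2"
  [ "$after" -lt "$before" ]
}
```

A possible use is to pipe one of the log commands above into `expect_log "[specific log message]"` and to pass the two `pending_webhooks` readings to `metric_decreased`; a non-zero exit status from either means that criterion was not met.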

4. Verify Database State

Check the affected record

npx convex run --prod [query] '{"id": "[affected_id]"}'

5. Document Evidence

VERIFICATION EVIDENCE:
- Timestamp: [when]
- Test performed: [what we did]
- Log entry observed: [paste relevant log]
- Metric before: [value]
- Metric after: [value]
- Database state confirmed: [yes/no]

VERDICT: [VERIFIED / NOT VERIFIED]
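
Filling in the template by hand is easy to skip under pressure, so it may help to generate the record from the values already collected. The following is a sketch only, assuming bash; `print_evidence` is a hypothetical helper, and the UTC timestamp format is an assumption rather than something the template prescribes.

```bash
# Hypothetical helper: print an evidence record in the template's layout.
# Arguments: test performed, log entry observed, metric before,
# metric after, database state confirmed (yes/no), verdict.
print_evidence() {
  printf '%s\n' "VERIFICATION EVIDENCE:"
  printf '%s\n' "- Timestamp: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
  printf '%s\n' "- Test performed: $1"
  printf '%s\n' "- Log entry observed: $2"
  printf '%s\n' "- Metric before: $3"
  printf '%s\n' "- Metric after: $4"
  printf '%s\n' "- Database state confirmed: $5"
  printf '\n%s\n' "VERDICT: $6"
}
```

The output can be pasted into the incident record or redirected to a file alongside the raw logs.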

Red Flags (Fix NOT Verified)

  • "The code looks right now"
  • "The config is correct"
  • "It should work"
  • "Let's wait and see"
  • No log entry observed
  • Metrics unchanged
  • Can't reproduce the original symptom

Example: Webhook Fix Verification

1. Resend the failing event

stripe events resend evt_xxx --webhook-endpoint we_xxx

2. Watch logs (expect to see "Webhook received")

timeout 15 vercel logs app --json | grep -i webhook

3. Check delivery metric (expect decrease)

stripe events retrieve evt_xxx | jq '.pending_webhooks'

Before: 4, After: 3 = DELIVERY SUCCEEDED

4. Check database state

npx convex run --prod users/queries:getUserByClerkId '{"clerkId": "user_xxx"}'

Expect: subscriptionStatus = "active"

VERDICT: VERIFIED - all 4 checks passed
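
The final verdict in this example depends on every check passing, which can be made mechanical. Below is a rough sketch, assuming bash and assuming each check's exit status was saved as it ran; `overall_verdict` is a hypothetical helper, and treating an empty list of checks as NOT VERIFIED is a design choice made here, not something the protocol states.

```bash
# Hypothetical helper: each argument is the exit status of one check
# (0 = passed). Prints VERIFIED only when at least one check was given
# and every check passed; otherwise prints NOT VERIFIED and returns 1.
overall_verdict() {
  local total="$#" failed=0 status
  for status in "$@"; do
    [ "$status" -eq 0 ] || failed=$((failed + 1))
  done
  if [ "$total" -gt 0 ] && [ "$failed" -eq 0 ]; then
    echo "VERDICT: VERIFIED - all $total checks passed"
  else
    echo "VERDICT: NOT VERIFIED - $failed of $total checks failed"
    return 1
  fi
}
```

For instance, `overall_verdict 0 0 0 0` prints the verdict line shown above, while any non-zero argument yields NOT VERIFIED and a non-zero return status.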

If Verification Fails

  1. Don't panic - the fix hypothesis was wrong, that's okay
  2. Revert if the fix made things worse
  3. Loop back to observation phase (OODA-V)
  4. Question assumptions - what did we miss?