cekura-fixing-prod-issues


Fixing Production Call Issues

Full workflow — debug first, reproduce before fixing, test thoroughly, then PR.
Phase 1      Phase 2         Phase 3     Phase 4        Phase 5        Phase 6
Debug   →   Reproduce   →   Fix    →   Verify    →   Regression  →   PR
Understand   Confirm bug     Write the   Same eval      Happy paths    All result
root cause   on Cekura       code fix    must PASS      + edge cases   URLs in PR
             BEFORE fix      + commit    now            pass too

The 6 Phases

Phase  File                  What happens
1      phase1-debug.md       Fetch prod call + logs, identify root cause, confirm with user
2      phase2-reproduce.md   Build evaluator, attach metrics, run; eval must fail before any fix
3      phase3-fix.md         Write the code fix, commit locally
4      phase4-verify.md      Re-run same evaluator; eval must pass now
5      phase5-regression.md  Test all affected happy paths and edge cases
6      phase6-pr.md          Raise PR with all Cekura result URLs


Strictness Rules — Read Before Starting

These rules are non-negotiable. Do not proceed past a gate without satisfying it.

Rule 0 — Use the same connection medium as the production call. No exceptions.

Every reproduction, verification, and regression test MUST be a full end-to-end simulation on Cekura using the same transport the agent is configured for. Retrieve the agent record (GET /test_framework/v1/ai-agents/{id}/) to confirm its transport. This is most likely telephony, but follow whatever the agent is actually configured to use.
❌ Text mode is never a valid substitute. ❌ Do not switch transports between phases.
The bug lives in the real call path; only a simulation over the same medium can confirm it.
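A minimal sketch of how the transport check could be kept honest across phases. Only the endpoint path comes from the rule above; the base URL, the transport strings ("telephony", "text"), and both function names are hypothetical, and the real agent record may expose its transport under a field this sketch does not know about.

```python
from typing import Dict, List

# Path taken from the rule above; everything else here is illustrative.
AGENT_PATH = "/test_framework/v1/ai-agents/{id}/"


def agent_record_url(base_url: str, agent_id: int) -> str:
    """Build the agent-record URL. base_url is an assumption, not a documented host."""
    return base_url.rstrip("/") + AGENT_PATH.format(id=agent_id)


def transport_violations(agent_transport: str, phase_transports: Dict[str, str]) -> List[str]:
    """Return the phases that ran over a different medium than the agent's own.

    agent_transport: value read manually from the agent record (hypothetical string).
    phase_transports: phase name -> transport actually used for that run.
    """
    return [
        phase
        for phase, used in phase_transports.items()
        if used != agent_transport
    ]


if __name__ == "__main__":
    print(agent_record_url("https://example.invalid", 42))
    runs = {"reproduce": "telephony", "verify": "text", "regression": "telephony"}
    print(transport_violations("telephony", runs))
```

Any non-empty result means a run has to be repeated over the agent's configured medium before its outcome counts as evidence.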

Rule 1 — Phases are sequential. No skipping.

Each phase has a gate. A gate is not passed by assumption — it is passed by evidence. The sequence exists because:
  • You cannot write a good fix without understanding the root cause (Phase 1 gate)
  • You cannot trust a fix without first proving the bug exists in a controlled way (Phase 2 gate)
  • You cannot call regression tests meaningful without a passing fix verification (Phase 4 gate)
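The "passed by evidence, not by assumption" idea can be sketched as a small tracker that refuses to advance out of order or without something recorded. The class, the phase names, and the evidence strings are hypothetical illustrations, not part of Cekura or of the phase files.

```python
from dataclasses import dataclass, field
from typing import Dict

# Phase order as listed in the workflow diagram.
PHASES = ["debug", "reproduce", "fix", "verify", "regression", "pr"]


@dataclass
class Workflow:
    """Records the evidence that passed each gate, in order."""
    evidence: Dict[str, str] = field(default_factory=dict)

    def current_phase(self) -> str:
        """First phase whose gate has no recorded evidence yet."""
        for phase in PHASES:
            if phase not in self.evidence:
                return phase
        return "done"

    def pass_gate(self, phase: str, evidence: str) -> None:
        """Record evidence for a phase; reject skipped phases and empty evidence."""
        expected = self.current_phase()
        if phase != expected:
            raise RuntimeError(f"cannot pass '{phase}' gate; current phase is '{expected}'")
        if not evidence.strip():
            raise ValueError("a gate is passed by evidence, not by assumption")
        self.evidence[phase] = evidence


if __name__ == "__main__":
    wf = Workflow()
    wf.pass_gate("debug", "root cause confirmed with user")
    try:
        wf.pass_gate("fix", "patch committed")  # skips 'reproduce'
    except RuntimeError as err:
        print(err)
```

In this sketch the evidence is free text; in practice it would more usefully be a Cekura result URL or a user confirmation, since those are what the later PR phase collects.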

Rule 2 — Phase 2 is the hardest gate. Treat it as such.

Reproducing the bug is the most critical step. Do not move to Phase 3 until the eval definitively fails on Cekura with metric scores showing the failure. If there is any doubt about whether the bug is truly reproduced, stop and ask the user. Do not guess.
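One way to picture the fail-before, pass-after requirement is shown below. It assumes each metric result has already been reduced to a pass/fail boolean keyed by a metric name. That shape, and the metric name used in the example, are assumptions for illustration, not Cekura's actual result format.

```python
from typing import Dict


def bug_reproduced(before_fix: Dict[str, bool]) -> bool:
    """Phase 2 gate: at least one attached metric must fail before any fix."""
    return any(not passed for passed in before_fix.values())


def fix_verified(before_fix: Dict[str, bool], after_fix: Dict[str, bool]) -> bool:
    """Phase 4 gate: the same metrics that showed the failure must all pass now."""
    if not bug_reproduced(before_fix):
        return False  # nothing was proven broken, so nothing can be proven fixed
    if before_fix.keys() != after_fix.keys():
        return False  # a different metric set means it is not the same eval
    return all(after_fix.values())


if __name__ == "__main__":
    before = {"greeting_not_repeated": False}  # hypothetical metric name
    after = {"greeting_not_repeated": True}
    print(bug_reproduced(before), fix_verified(before, after))
```

A result that does not reduce cleanly to pass or fail is exactly the ambiguous case where the rule says to stop and ask the user rather than guess.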

Rule 3 — When in doubt, ask.

If you are unsure which metrics to use, whether the root cause is correct, whether the edge conditions are right, or whether a result is ambiguous — stop and ask the user. A wrong assumption here wastes the entire workflow.

Rule 4 — Never push code until Phase 5 is complete.

The commit happens in Phase 3. The push happens only after all regression tests pass in Phase 5.