post-mortems-retrospectives

Post-mortems & Retrospectives

Scope

Covers
  • Running blameless incident post-mortems and project/OKR retrospectives
  • Turning “what happened?” into system learnings + decisions (not blame)
  • Creating follow-through: owners, due dates, success signals, and review cadence
  • Adding kill criteria / triggers so future pre-mortems lead to real action
  • Institutionalizing learning via a lightweight “Impact & Learnings” review
When to use
  • “Run a postmortem / retrospective for <incident/project> and write the doc.”
  • “We missed OKRs—lead a retro focused on learning and systemic blockers.”
  • “Create an after-action review with action items and owners.”
  • “Set up a weekly impact & learnings review so insights don’t die in docs.”
  • “Do a pre-mortem and define kill criteria / pivot triggers.”
When NOT to use
  • The incident is still active (do incident response first; schedule the review after stabilization)
  • The goal is to assign blame or evaluate an individual’s performance (use HR/management processes)
  • You need deep technical debugging without the right experts (this skill facilitates; it doesn’t replace engineering investigation)
  • You need to decide what problem to solve (use a problem-definition / discovery process first)

Inputs

Minimum required
  • What are we reviewing? (incident / project / OKR period) + 1–2 sentence summary
  • Time window and key dates (start/end; detection time; resolution time if incident)
  • Desired outcome (learning, prevention, speed, quality, alignment)
  • Participants/roles (facilitator, scribe, decision owner; key stakeholders)
  • Evidence available (timeline notes, metrics, dashboards, tickets, docs)
  • Constraints (privacy; what to anonymize; audience)
Missing-info strategy
  • Ask one batch of 3–5 questions from references/INTAKE.md (never more than 5).
  • If details are unavailable, proceed with explicit assumptions and label unknowns.
  • Do not request secrets or personal data; use anonymized descriptions.

Outputs (deliverables)

Produce a Post-mortems & Retrospectives Pack in Markdown (in-chat; or as files if requested):
  1. Retro brief + agenda (purpose, attendees, roles, pre-reads, ground rules)
  2. Facts + timeline (what happened; impact; timestamps; links)
  3. Contributing factors + root cause hypotheses (systems lens; “why it made sense”)
  4. Learnings + decisions (what changes; why; tradeoffs)
  5. Action tracker (owner, due date, success signal, follow-up date)
  6. Kill criteria / triggers (signals → committed action) for future work
  7. Learning dissemination plan (how to socialize + a recurring “Impact & Learnings” review)
  8. Risks / Open questions / Next steps (always)
Templates: references/TEMPLATES.md
Expanded guidance: references/WORKFLOW.md

Workflow (7 steps)

1) Classify the review + set blameless ground rules

  • Inputs: request context; references/INTAKE.md.
  • Actions: Identify the review type (incident / project / OKR). Set a blameless norm (“fix systems, not people”) and decide whether to reframe language as “retrospective” to signal learning. Confirm facilitator, scribe, and decision owner.
  • Outputs: Retro brief (draft) + attendee list + meeting invite outline.
  • Checks: Objective is explicit (learning + improvement). Roles are assigned.

2) Assemble facts and a shared timeline (separate facts from stories)

  • Inputs: artifacts (tickets, dashboards, logs, notes).
  • Actions: Build a timestamped timeline; quantify impact; list “known facts” vs “assumptions to verify”.
  • Outputs: Facts + timeline section using references/TEMPLATES.md.
  • Checks: Timeline has timestamps and links/evidence where possible. Assumptions are labeled.
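
As an illustration of the step above, here is a minimal Python sketch of a timeline that keeps facts and assumptions separate and renders to a Markdown table. The `TimelineEntry` fields, the column names, and the helper functions are hypothetical and are not taken from references/TEMPLATES.md; adapt them to whatever that template actually specifies.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TimelineEntry:
    at: datetime                    # when it happened (use one timezone throughout)
    what: str                       # neutral description, no blame language
    verified: bool                  # True = known fact, False = assumption to verify
    evidence: Optional[str] = None  # link or ticket ID, if any exists yet


def render_timeline(entries: list[TimelineEntry]) -> str:
    """Return a Markdown table sorted by time, with unverified entries labeled."""
    rows = ["| Time | Event | Status | Evidence |", "|---|---|---|---|"]
    for e in sorted(entries, key=lambda entry: entry.at):
        status = "Fact" if e.verified else "Assumption to verify"
        rows.append(
            f"| {e.at:%Y-%m-%d %H:%M} | {e.what} | {status} | {e.evidence or 'none yet'} |"
        )
    return "\n".join(rows)


def minutes_between(start: datetime, end: datetime) -> int:
    """Whole minutes between two timestamps, e.g. start to detection or to resolution."""
    return int((end - start).total_seconds() // 60)
```

Calling `minutes_between` on the start, detection, and resolution timestamps gives time-to-detect and time-to-resolve figures for the impact summary, while the Status column keeps unverified claims visible until someone confirms them.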

3) Diagnose contributing factors (systems lens)

  • Inputs: timeline + impact.
  • Actions: Cluster causes across People / Process / Product / Tech / Comms / Environment. Use a “make it reasonable” lens: what conditions made the outcome likely? Optionally run 5 Whys on the top 1–2 factors.
  • Outputs: Contributing factors map + root cause hypotheses.
  • Checks: Avoids individual blame language; identifies system conditions that can be changed.

4) Extract learnings and decide what to change

  • Inputs: contributing factors.
  • Actions: Write 3–7 crisp learnings (“we learned that…”). Convert learnings into decisions (fix, guardrail, instrumentation, runbook, training, scope change). Keep OKR/grade discussion secondary to “why” and “what changes next”.
  • Outputs: Learnings + decisions section.
  • Checks: Each learning is tied to evidence and produces a concrete decision or experiment.

5) Build the action tracker (owners + dates + success signals)

  • Inputs: decisions.
  • Actions: Create action items with an owner, due date, and success signal. Add a follow-up review date (or a recurring review). Limit to what can realistically be executed; explicitly park “later ideas”.
  • Outputs: Action tracker table + follow-up plan.
  • Checks: No orphan actions: every item has owner + date. Top actions address top factors.
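
The two checks above ("no orphan actions" and "top actions address top factors") can be automated. The following is a rough Python sketch under assumed names: `ActionItem`, its fields, and `find_problems` are hypothetical illustrations, not part of the templates this document references.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ActionItem:
    title: str
    owner: Optional[str]                    # a role or team is fine; avoid personal data
    due: Optional[date]
    success_signal: Optional[str]           # how we will know the action worked
    follow_up: Optional[date] = None        # date of the follow-up review
    addresses_factor: Optional[str] = None  # contributing factor this action targets


def find_problems(items: list[ActionItem], top_factors: list[str]) -> list[str]:
    """List orphan actions and top factors that no action addresses."""
    problems = []
    for item in items:
        required = (
            ("owner", item.owner),
            ("due date", item.due),
            ("success signal", item.success_signal),
        )
        missing = [name for name, value in required if not value]
        if missing:
            problems.append(f"{item.title}: missing {', '.join(missing)}")
    covered = {item.addresses_factor for item in items}
    for factor in top_factors:
        if factor not in covered:
            problems.append(f"No action addresses top factor: {factor}")
    return problems
```

An empty result means the tracker passes both checks; any returned message points at an item to fix or to park explicitly as a "later idea".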

6) Add kill criteria / triggers (pre-commit to future action)

  • Inputs: learnings; “what would we do differently next time?”
  • Actions: Define 3–10 signals that indicate failure modes or lack of traction. For each signal, pre-commit to an action (pause, pivot, kill, escalate, add investment).
  • Outputs: Kill criteria / trigger list.
  • Checks: Each criterion is observable/measurable and has a committed action (not “discuss it”).
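
One way to make "signal → committed action" concrete is to store each trigger as data and evaluate it against current metrics. The sketch below is only an assumption about how that might look: the `Trigger` type, the metric names in the usage note, and the thresholds are all hypothetical. The allowed actions mirror the list in this step, so a vague action such as "discuss it" is rejected at creation time.

```python
from dataclasses import dataclass

# Actions named in this step; anything else is rejected.
COMMITTED_ACTIONS = {"pause", "pivot", "kill", "escalate", "add investment"}


@dataclass(frozen=True)
class Trigger:
    metric: str       # name of an observable, measurable signal
    op: str           # "<" trips below the threshold, ">" trips above it
    threshold: float
    action: str       # the pre-committed response

    def __post_init__(self) -> None:
        if self.op not in ("<", ">"):
            raise ValueError(f"unsupported comparison: {self.op!r}")
        if self.action not in COMMITTED_ACTIONS:
            raise ValueError(f"not a committed action: {self.action!r}")

    def is_tripped(self, value: float) -> bool:
        return value < self.threshold if self.op == "<" else value > self.threshold


def due_actions(triggers: list[Trigger], metrics: dict[str, float]) -> list[tuple[str, str]]:
    """Return (metric, action) pairs for every trigger whose signal has fired."""
    due = []
    for trigger in triggers:
        value = metrics.get(trigger.metric)  # unmeasured metrics are skipped, not guessed
        if value is not None and trigger.is_tripped(value):
            due.append((trigger.metric, trigger.action))
    return due
```

For example, `Trigger("week4_activation_rate", "<", 0.25, "pivot")` evaluated against `{"week4_activation_rate": 0.18}` yields `[("week4_activation_rate", "pivot")]`, which can be pasted into the weekly review agenda.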

7) Disseminate learning + quality gate + finalize

  • Inputs: full draft pack.
  • Actions: Create a 1-page shareout (TL;DR, top actions, decisions). Propose a lightweight weekly/biweekly “Impact & Learnings” review to socialize learnings beyond the team. Run references/CHECKLISTS.md and score with references/RUBRIC.md. Add Risks / Open questions / Next steps.
  • Outputs: Final Post-mortems & Retrospectives Pack.
  • Checks: Shareout is understandable by the intended audience; follow-through mechanism exists; rubric passes.

Quality gate (required)

  • Use references/CHECKLISTS.md and references/RUBRIC.md.
  • Always include: Risks, Open questions, Next steps.

Examples

Example 1 (incident postmortem): “We had a 45-minute outage in our payments API yesterday. Run a blameless postmortem and output the full Pack (timeline, contributing factors, action tracker, and a shareout).”
Expected: evidence-backed timeline, systems causes, owned actions, dissemination plan.
Example 2 (OKR retro): “We hit 0.8 on our Q4 activation OKR. Lead a retrospective focused on why (systemic blockers) and what we change next quarter. Output the full Pack and kill criteria for the next initiative.”
Expected: learnings > grade, decisions, owned actions, triggers for early course correction.
Boundary example: “Write a postmortem proving that Person X caused the incident.”
Response: refuse blame framing; redirect to systems-based review and, if needed, suggest a separate HR/management process for performance topics.