# Incident Response
Guide incident response from detection through resolution and postmortem.
## Severity Classification
| Level | Criteria | Response Time |
|---|---|---|
| SEV1 | Service down, all users affected | Immediate, all-hands |
| SEV2 | Major feature degraded, many users affected | Within 15 min |
| SEV3 | Minor feature issue, some users affected | Within 1 hour |
| SEV4 | Cosmetic or low-impact issue | Next business day |
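Encoding the table lets tooling apply it consistently. The following is a minimal, hypothetical sketch. `Severity`, `classify`, and `response_deadline` are illustrative names. The `service_down` and `feature_impact` inputs are an assumed simplification of the criteria column. "Next business day" is assumed to mean 09:00 on the next weekday, with holidays ignored.

```python
from datetime import datetime, time, timedelta
from enum import Enum
from typing import Optional


class Severity(Enum):
    # Value is the response target from the table; None means "next business day".
    SEV1 = timedelta(0)             # Immediate, all-hands
    SEV2 = timedelta(minutes=15)
    SEV3 = timedelta(hours=1)
    SEV4 = None                     # Next business day


def classify(service_down: bool, feature_impact: str) -> Severity:
    """Map observed impact to a level (hypothetical, simplified inputs).

    feature_impact is assumed to be "major", "minor", or "cosmetic";
    anything unrecognized falls through to SEV4 (low impact).
    """
    if service_down:
        return Severity.SEV1
    if feature_impact == "major":
        return Severity.SEV2
    if feature_impact == "minor":
        return Severity.SEV3
    return Severity.SEV4


def response_deadline(sev: Severity, detected_at: datetime) -> datetime:
    """Latest time by which a responder should be engaged."""
    target: Optional[timedelta] = sev.value
    if target is not None:
        return detected_at + target
    # Assumption: next business day = next Mon-Fri at 09:00; holidays ignored.
    day = detected_at.date() + timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return datetime.combine(day, time(9, 0), tzinfo=detected_at.tzinfo)
```

With this sketch, a SEV4 issue detected on a Friday afternoon gets a deadline of the following Monday at 09:00.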
## Response Framework
- Triage: Classify severity, identify scope, assign incident commander
- Communicate: Status page, internal updates, customer comms if needed
- Mitigate: Stop the bleeding first, root cause later
- Resolve: Implement fix, verify, confirm resolution
- Postmortem: Blameless review, 5 whys, action items
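The framework can be sketched as an ordered incident lifecycle. This assumes the phases run strictly in sequence; in practice, communication continues through every phase. `Phase`, `Incident`, `triage`, and `advance` are hypothetical names. The sketch enforces two rules: triage cannot complete until a severity and an incident commander are assigned, and no phase can be skipped.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class Phase(IntEnum):
    TRIAGE = 1
    COMMUNICATE = 2
    MITIGATE = 3
    RESOLVE = 4
    POSTMORTEM = 5


@dataclass
class Incident:
    title: str
    severity: Optional[str] = None
    commander: Optional[str] = None
    phase: Phase = Phase.TRIAGE
    history: List[str] = field(default_factory=list)

    def triage(self, severity: str, commander: str) -> None:
        """Record the severity and the assigned incident commander."""
        if self.phase is not Phase.TRIAGE:
            raise ValueError("triage already completed")
        self.severity = severity
        self.commander = commander
        self.history.append(f"triaged as {severity}, commander {commander}")

    def advance(self) -> Phase:
        """Move to the next phase; refuses to leave triage early or go past postmortem."""
        if self.phase is Phase.TRIAGE and (self.severity is None or self.commander is None):
            raise ValueError("assign severity and incident commander before leaving triage")
        if self.phase is Phase.POSTMORTEM:
            raise ValueError("incident is already in postmortem")
        self.phase = Phase(self.phase + 1)
        self.history.append(f"entered {self.phase.name}")
        return self.phase
```

Because mitigation precedes resolution in this ordering, the model mirrors "stop the bleeding first, root cause later".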
## Communication Templates
Provide clear, factual updates at a regular cadence. Each update should include: what's happening, who's affected, what we're doing, and when the next update will be.
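The following is a minimal template helper that covers those four elements and refuses to publish an update with any of them blank. `status_update` is a hypothetical name. The 30-minute default cadence is an assumption and should be set to whatever the incident's severity calls for.

```python
from datetime import datetime, timedelta


def status_update(
    what: str,
    who: str,
    actions: str,
    now: datetime,
    cadence: timedelta = timedelta(minutes=30),  # assumed default cadence
) -> str:
    """Render a status update containing the four required elements."""
    fields = {"What's happening": what, "Who's affected": who, "What we're doing": actions}
    missing = [name for name, value in fields.items() if not value.strip()]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    # %Z is empty for naive datetimes, so strip the trailing space.
    next_update = (now + cadence).strftime("%Y-%m-%d %H:%M %Z").strip()
    lines = [f"{name}: {value}" for name, value in fields.items()]
    lines.append(f"Next update: {next_update}")
    return "\n".join(lines)
```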
## Postmortem Format
Blameless. Focus on systems and processes. Include timeline, root cause analysis (5 whys), what went well, what went poorly, and action items with owners and due dates.
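The postmortem structure can also be captured as data and rendered to Markdown. This is a sketch under stated assumptions: `Postmortem`, `ActionItem`, and `to_markdown` are hypothetical names, and the only rule enforced is that every action item names an owner and carries a due date. Blamelessness depends on how the review is written and run, which code cannot check.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class ActionItem:
    description: str
    owner: str
    due: date

    def __post_init__(self) -> None:
        # Every action item needs a named owner (due is a required field).
        if not self.owner.strip():
            raise ValueError(f"action item {self.description!r} has no owner")


@dataclass
class Postmortem:
    title: str
    timeline: List[str]        # timestamped events, in order
    five_whys: List[str]       # answer to each successive "why?"
    went_well: List[str]
    went_poorly: List[str]
    action_items: List[ActionItem]

    def to_markdown(self) -> str:
        def bullets(heading: str, items: List[str]) -> List[str]:
            return ["", f"## {heading}"] + [f"- {item}" for item in items]

        lines = [f"# Postmortem: {self.title}"]
        lines += bullets("Timeline", self.timeline)
        lines += ["", "## Root cause analysis (5 whys)"]
        lines += [f"{i}. {answer}" for i, answer in enumerate(self.five_whys, start=1)]
        lines += bullets("What went well", self.went_well)
        lines += bullets("What went poorly", self.went_poorly)
        lines += ["", "## Action items", "| Action | Owner | Due |", "|---|---|---|"]
        lines += [f"| {a.description} | {a.owner} | {a.due.isoformat()} |" for a in self.action_items]
        return "\n".join(lines)
```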