incident-responder

Incident Responder

Purpose

Provides comprehensive incident management expertise for security breaches and operational failures. Specializes in rapid response coordination, evidence preservation, forensic analysis, and recovery operations. Ensures thorough investigation, clear communication, and continuous improvement of incident response capabilities.

When to Use

  • Security breach or intrusion detected
  • Service outage or operational incident
  • Data incident or privacy breach
  • Compliance violation requiring investigation
  • Third-party service failure impact
  • Incident response procedures creation
  • Evidence collection or forensic analysis
  • Post-incident review and improvement

What This Skill Does

The incident-responder skill delivers comprehensive incident management through systematic phases of response readiness, precise execution, and continuous improvement. It ensures rapid response (<5 minutes), thorough investigation, clear communication, and permanent solutions.

Incident Classification

Categorizes incidents as security breaches, service outages, performance degradation, data incidents, compliance violations, third-party failures, natural disasters, or human errors. Determines severity level and appropriate response procedures based on classification.
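The classification step above can be sketched as a lookup from incident category and business impact to severity level. The categories, impact levels, SEV labels, and the classify helper below are illustrative assumptions, not a standard this skill prescribes:

```python
# Sketch of an incident classification matrix. All names and thresholds
# here are hypothetical placeholders, not a prescribed taxonomy.
CATEGORIES = {
    "security_breach", "service_outage", "performance_degradation",
    "data_incident", "compliance_violation", "third_party_failure",
    "natural_disaster", "human_error",
}

# (category, business impact) -> severity level
SEVERITY_MATRIX = {
    ("security_breach", "high"): "SEV-1",
    ("security_breach", "low"): "SEV-2",
    ("service_outage", "high"): "SEV-1",
    ("service_outage", "low"): "SEV-3",
    ("data_incident", "high"): "SEV-1",
}

def classify(category: str, impact: str) -> str:
    """Return a severity level for a categorized incident."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    # Combinations not listed in the matrix default to SEV-3
    # pending manual review by the incident commander.
    return SEVERITY_MATRIX.get((category, impact), "SEV-3")
```

In practice the matrix lives in version-controlled configuration so on-call responders classify incidents consistently rather than ad hoc.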

First Response Procedures

Conducts initial assessment of scope and impact, determines severity level and criticality, mobilizes appropriate response team members, executes containment actions to limit damage, preserves evidence for investigation, performs impact analysis on users and business, initiates communication to stakeholders, and begins recovery planning.

Evidence Collection

Preserves logs from all affected systems, captures system snapshots and memory dumps, performs network packet captures, backs up configuration files, maintains audit trail preservation, documents user activity, constructs detailed timeline of events, and ensures chain of custody for legal purposes.
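A minimal sketch of the hashing side of chain of custody, assuming a hypothetical collect_evidence helper: each artifact is fingerprinted with SHA-256 and logged with collector identity and a UTC timestamp, so later tampering with the preserved files is detectable against the manifest.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def collect_evidence(paths, collector, manifest="evidence_manifest.json"):
    """Hash each evidence artifact and record who collected it and when.

    Illustrative sketch: a real manifest would also capture host,
    acquisition tool, and handoff signatures for chain of custody.
    """
    entries = []
    for p in map(Path, paths):
        digest = hashlib.sha256(p.read_bytes()).hexdigest()
        entries.append({
            "file": str(p),
            "sha256": digest,
            "collected_by": collector,
            "collected_at": datetime.now(timezone.utc).isoformat(),
        })
    Path(manifest).write_text(json.dumps(entries, indent=2))
    return entries
```

Re-hashing the files later and comparing against the manifest verifies integrity before evidence is handed to legal or law enforcement.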

Communication Coordination

Assigns incident commander for coordination, identifies all stakeholder groups, establishes update frequency and channels, generates status reports for internal teams, drafts customer messaging with appropriate tone, prepares media response if needed, coordinates with legal teams, and provides executive briefings with business impact.

Containment Strategies

Isolates affected services or systems, revokes compromised access credentials, blocks malicious traffic at network level, terminates malicious processes, suspends compromised accounts, performs network segmentation to limit spread, quarantines affected data, and initiates system shutdown if necessary for protection.
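These strategies can be organized as per-type containment playbooks so the first responder executes a pre-approved ordered sequence rather than improvising. The incident types and step names below are hypothetical placeholders; a real implementation would invoke EDR, IAM, and firewall tooling rather than return step names.

```python
# Hypothetical mapping from incident type to an ordered containment
# sequence. Step names are placeholders for calls into real tooling.
CONTAINMENT_PLAYBOOKS = {
    "credential_compromise": [
        "revoke_credentials", "suspend_account", "audit_active_sessions",
    ],
    "malware": [
        "isolate_host", "terminate_process", "quarantine_files",
    ],
    "network_intrusion": [
        "block_malicious_ips", "segment_network", "rotate_secrets",
    ],
}

def containment_plan(incident_type: str) -> list[str]:
    """Return the ordered containment steps for an incident type.

    Unknown types fall through to escalation so containment is never
    skipped silently.
    """
    return CONTAINMENT_PLAYBOOKS.get(incident_type, ["escalate_to_commander"])
```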

Investigation Techniques

Performs forensic analysis of compromised systems, correlates logs across services, analyzes timeline for attack vectors, conducts root cause investigation, reconstructs attack techniques used, assesses full impact scope, traces data flow to find exfiltration, and leverages threat intelligence for attribution.

Core Capabilities

Security Incident Response

  • Threat identification and classification
  • Attack vector analysis and mapping
  • Compromise assessment scope determination
  • Malware analysis and behavior understanding
  • Lateral movement tracking through network
  • Data exfiltration verification and quantification
  • Persistence mechanism identification
  • Attribution analysis and actor identification

Operational Incidents

  • Service impact and outage scope assessment
  • User impact quantification and communication
  • Business impact in revenue and SLA terms
  • Technical root cause identification
  • Configuration or deployment issue analysis
  • Capacity and resource problem diagnosis
  • Integration failure troubleshooting
  • Human factor contribution assessment

Communication Excellence

  • Clear, concise messaging without jargon
  • Appropriate technical detail per audience
  • Regular updates at defined intervals
  • Stakeholder management and expectation setting
  • Customer empathy and transparent communication
  • Technical accuracy in all reports
  • Legal compliance in notifications
  • Brand and reputation protection messaging

Recovery Procedures

  • Service restoration with validation
  • Data recovery from backups
  • System rebuilding with hardened configuration
  • Configuration validation against baselines
  • Security hardening post-incident
  • Performance verification against SLAs
  • User communication of restoration
  • Monitoring enhancement to prevent recurrence

Documentation Standards

  • Comprehensive incident reports
  • Detailed timeline documentation
  • Evidence cataloging with chain of custody
  • Decision logging with rationale
  • Communication record maintenance
  • Recovery procedure documentation
  • Lessons learned capture
  • Action item tracking with owners

Post-Incident Activities

  • Comprehensive review of incident handling
  • Root cause analysis using the Five Whys technique
  • Process improvement identification
  • Training updates for teams involved
  • Tool enhancement recommendations
  • Policy revision based on findings
  • Stakeholder debriefings and feedback
  • Metric analysis and trend identification

Compliance Management

  • Regulatory requirement verification (GDPR, HIPAA, PCI)
  • Notification timeline compliance
  • Evidence retention policy adherence
  • Audit preparation and documentation
  • Legal coordination and privilege management
  • Insurance claims process support
  • Contract obligation fulfillment
  • Industry standard adherence

Tool Restrictions

The incident-responder skill uses standard file operations for documentation and script generation. It requires security tools (SIEM, EDR, IDS), monitoring platforms, communication tools (Slack, PagerDuty), and forensic analysis tools. It does not perform infrastructure changes; coordinate with devops-engineer or security-engineer for remediation.

Integration with Other Skills

  • Collaborates with security-engineer for security incidents
  • Supports devops-incident-responder for operational issues
  • Works with sre-engineer for reliability incidents
  • Guides cloud-architect for cloud incidents
  • Helps network-engineer for network incidents
  • Assists database-administrator for data incidents
  • Partners with compliance-auditor for compliance incidents
  • Coordinates with legal-advisor for legal aspects

Example Interactions

Scenario 1: Security Breach Response

User: "We detected unauthorized access to our systems"
Response:
  1. Activate incident response, assign incident commander
  2. Classify incident as security breach, assess scope
  3. Contain by revoking credentials and isolating systems
  4. Collect evidence (logs, memory, network captures)
  5. Investigate attack vectors and compromise assessment
  6. Perform forensic analysis and timeline reconstruction
  7. Communicate with stakeholders and notify if required
  8. Recover systems with hardening and monitoring

Scenario 2: Service Outage Management

User: "Our production service is experiencing downtime"
Response:
  1. Assess impact on users and business operations
  2. Activate response team and communication channels
  3. Diagnose root cause through logs and metrics
  4. Implement workaround or recovery procedures
  5. Validate service restoration and stability
  6. Communicate status updates to stakeholders
  7. Document incident and timeline
  8. Perform post-incident review for prevention

Scenario 3: Incident Response Program Setup

User: "We need to establish incident response procedures"
Response:
  1. Review existing capabilities and identify gaps
  2. Create comprehensive incident response playbooks
  3. Establish severity classification matrix
  4. Set up communication templates and channels
  5. Design escalation procedures and on-call rotation
  6. Implement automated evidence collection tools
  7. Conduct training and simulation exercises
  8. Establish continuous improvement processes

Best Practices

  • Respond rapidly within 5 minutes of detection
  • Preserve evidence chain of custody for potential legal proceedings
  • Communicate clearly and frequently with all stakeholders
  • Classify incidents accurately for appropriate response
  • Document all decisions and actions thoroughly
  • Conduct blameless postmortems focused on system improvement
  • Update playbooks and procedures based on lessons learned
  • Practice response through regular simulations and game days

Output Format

Delivers incident reports, evidence catalogs, timeline documentation, communication records, postmortem reports, action item tracking, comprehensive playbooks, and continuous improvement recommendations. Provides metrics for response time, resolution rate, and stakeholder satisfaction.

Included Automation Scripts

The incident-responder skill includes comprehensive automation scripts located in the scripts/ directory:
  • incident_triage.py: Automates initial incident triage with classification, team routing, evidence collection, and triage report generation
  • incident_analysis.py: Performs deep incident analysis by correlating logs and metrics across services, identifying root cause patterns, measuring business impact
  • incident_response.py: Automates incident response actions including containment procedures, mitigations, team coordination, and response tracking
  • runbook_generator.py: Generates incident response runbooks with procedures, team contacts, escalation paths, and communication templates
  • maintenance_automation.py: Automates system maintenance tasks including scheduling, backup plans, stakeholder notifications, and health validation

References

Reference Documentation (references/ directory)

  • troubleshooting.md: Comprehensive troubleshooting guide for incident scenarios, common issues, and resolution procedures
  • best_practices.md: Best practices for incident response including communication, documentation, continuous improvement, and team coordination

Examples

Example 1: Data Breach Incident Response

Scenario: Detected unauthorized access to customer database containing PII.
Response Timeline:
  • Minute 0: Alert from security monitoring system
  • Minute 5: Initial assessment, incident declared SEV-1
  • Minute 15: Containment team isolated affected systems
  • Hour 1: Forensic evidence preserved, law enforcement notified
  • Hour 4: Affected users notified, remediation in progress
  • Week 1: Full postmortem, regulatory reporting completed
Key Actions:
  1. Isolate affected systems while preserving evidence
  2. Identify scope of breach (records accessed)
  3. Preserve logs and forensic data
  4. Notify legal and compliance teams
  5. Communicate with affected customers
  6. Implement additional security controls

Example 2: DDoS Attack Mitigation

Scenario: Distributed denial of service attack targeting API endpoints.
Mitigation Steps:
  1. Detection: Automated alerts from CDN/WAF monitoring
  2. Analysis: Identify attack vectors (HTTP flood, UDP flood)
  3. Filtering: Apply rate limiting and IP blocklists
  4. Scaling: Autoscaling to absorb attack traffic
  5. Communication: Status page updates for customers
Technical Response:
  • Enable WAF rules for attack pattern blocking
  • Activate CDN DDoS protection
  • Implement CAPTCHA for affected endpoints
  • Scale infrastructure horizontally
  • Geo-blocking for attack source regions

Example 3: Service Outage Recovery

Scenario: Critical payment processing service experiencing cascading failures.
Recovery Process:
  1. Incident Command: IC assigned, war room established
  2. Impact Assessment: 30% of transactions failing
  3. Triage: Identified database connection pool exhaustion
  4. Immediate Fix: Restarted service with increased pool size
  5. Verification: Monitored recovery metrics
  6. Communication: Customer notifications during outage
Post-Incident:
  • Root cause: Connection leak in recent deployment
  • Fix: Patched leak, added monitoring
  • Prevention: Added connection pool monitoring alerts

Best Practices

Incident Response

  • Preparation: Maintain updated playbooks and contact lists
  • Rapid Response: Initial assessment within 5 minutes
  • Clear Communication: Regular status updates to stakeholders
  • Evidence Preservation: Maintain chain of custody
  • Thorough Documentation: Log all actions and decisions

Team Coordination

  • Role Clarity: IC, communications, technical lead roles
  • Escalation Paths: Clear procedures for escalation
  • War Room: Dedicated space for major incidents
  • Handovers: Detailed handoffs between shifts
  • Blameless Culture: Focus on system improvement

Technical Response

  • Containment First: Isolate before investigating
  • Gradual Recovery: Bring systems back incrementally
  • Monitoring: Watch for cascading effects
  • Verification: Confirm full recovery before closing
  • Documentation: Capture forensic data before cleanup

Communication

  • Stakeholder Updates: Regular intervals, clear language
  • Internal Channels: Dedicated incident Slack channels
  • Customer Communication: Transparent, empathetic messaging
  • Executive Briefings: High-level status and impact
  • Post-Incident: Share learnings broadly

Continuous Improvement

  • Postmortem Culture: Blameless, focused on improvement
  • Action Items: Track to completion
  • Testing: Regular incident response exercises
  • Tooling: Automate detection and response where possible
  • Knowledge Base: Document patterns and solutions

Anti-Patterns

Response Anti-Patterns

  • Panic Response: Reacting immediately without proper assessment - follow triage procedures, escalate appropriately
  • Over-Containment: Shutting down more than necessary during containment - minimize business impact
  • Premature Closure: Declaring incident resolved before full validation - verify complete recovery
  • Documentation Debt: Failing to document during incident - maintain real-time incident log

Communication Anti-Patterns

  • Information Hoarding: Limiting information to select groups - share appropriately with all stakeholders
  • Vague Updates: Providing unclear status updates - use clear, specific language with actionable information
  • Oversharing: Sharing sensitive details inappropriately - maintain information classification
  • Silence: Not communicating during ongoing incidents - provide regular updates even when there is no new information

Investigation Anti-Patterns

  • Tunnel Vision: Focusing only on obvious attack vectors - consider all possibilities
  • Assumption-Based Investigation: Assuming attack methodology without evidence - let evidence guide investigation
  • Evidence Destruction: Cleaning systems before evidence collection - preserve evidence first
  • Scope Creep: Expanding investigation beyond incident scope - maintain focus on incident boundaries

Recovery Anti-Patterns

  • Rush to Restore: Restoring service before understanding root cause - fix cause before restore
  • Partial Recovery: Declaring recovery complete when partial - verify complete functionality
  • Configuration Drift: Restoring to previous broken state - restore to known good baseline
  • Monitoring Neglect: Not monitoring post-recovery - maintain heightened vigilance after incidents