incident-response-smart-fix

Intelligent Issue Resolution with Multi-Agent Orchestration

[Extended thinking: This workflow implements a debugging and resolution pipeline that uses AI-assisted debugging tools and observability platforms to systematically diagnose and resolve production issues. The strategy combines automated root cause analysis with human expertise, drawing on 2024/2025 practices: AI code assistants (GitHub Copilot, Claude Code), observability platforms (Sentry, Datadog, OpenTelemetry), git bisect automation for regression tracking, and production-safe debugging techniques such as distributed tracing and structured logging.

The process follows four phases:

  (1) Issue Analysis Phase - error-detective and debugger agents analyze error traces, logs, reproduction steps, and observability data to understand the full context of the failure, including upstream and downstream impacts.
  (2) Root Cause Investigation Phase - debugger and code-reviewer agents perform deep code analysis, run an automated git bisect to identify the introducing commit, check dependency compatibility, and inspect state to isolate the exact failure mechanism.
  (3) Fix Implementation Phase - domain-specific agents (python-pro, typescript-pro, rust-expert, etc.) implement minimal fixes with unit, integration, and edge case tests while following production-safe practices.
  (4) Verification Phase - test-automator and performance-engineer agents run regression suites, performance benchmarks, and security scans, and verify that no new issues are introduced.

Complex issues spanning multiple systems require orchestrated coordination between specialist agents (database-optimizer → performance-engineer → devops-troubleshooter) with explicit context passing and state sharing.

The workflow emphasizes understanding root causes over treating symptoms, implementing lasting architectural improvements, automating detection through enhanced monitoring and alerting, and preventing recurrence through type system enhancements, static analysis rules, and improved error handling patterns. Success is measured not just by issue resolution but by reduced mean time to recovery (MTTR), prevention of similar issues, and improved system resilience.]
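
The four phases and the explicit context passing described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `IncidentContext`, `run_pipeline`, and the four phase functions are hypothetical names, not part of this skill or of any agent framework, and each phase is a plain function standing in for the agents named in the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class IncidentContext:
    """Hypothetical shared state handed from one phase to the next."""
    issue: str
    findings: dict[str, str] = field(default_factory=dict)
    completed_phases: list[str] = field(default_factory=list)


# A phase is any callable that reads the shared context and returns a finding.
Phase = Callable[[IncidentContext], str]


def analyze_issue(ctx: IncidentContext) -> str:
    # Stand-in for the error-detective and debugger agents.
    return f"collected traces and logs for: {ctx.issue}"


def investigate_root_cause(ctx: IncidentContext) -> str:
    # Stand-in for the debugger and code-reviewer agents; reads phase 1 output.
    return f"root cause derived from [{ctx.findings['issue_analysis']}]"


def implement_fix(ctx: IncidentContext) -> str:
    # Stand-in for a domain-specific agent such as python-pro.
    return f"minimal fix addressing [{ctx.findings['root_cause_investigation']}]"


def verify_fix(ctx: IncidentContext) -> str:
    # Stand-in for the test-automator and performance-engineer agents.
    return f"regression checks requested for [{ctx.findings['fix_implementation']}]"


# Phase order mirrors the four phases in the text.
PHASES: list[tuple[str, Phase]] = [
    ("issue_analysis", analyze_issue),
    ("root_cause_investigation", investigate_root_cause),
    ("fix_implementation", implement_fix),
    ("verification", verify_fix),
]


def run_pipeline(issue: str) -> IncidentContext:
    """Run each phase in order, recording its finding in the shared context."""
    ctx = IncidentContext(issue=issue)
    for name, phase in PHASES:
        ctx.findings[name] = phase(ctx)
        ctx.completed_phases.append(name)
    return ctx


if __name__ == "__main__":
    result = run_pipeline("checkout returns HTTP 500")
    for name in result.completed_phases:
        print(f"{name}: {result.findings[name]}")
```

Keeping all findings in one context object means a later phase can read what every earlier phase produced, which is one simple way to realize the explicit context passing the workflow calls for. A real orchestration would replace each function with an agent invocation.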

Use this skill when

  • Working on tasks or workflows that involve intelligent issue resolution with multi-agent orchestration
  • Needing guidance, best practices, or checklists for intelligent issue resolution with multi-agent orchestration

Do not use this skill when

  • The task is unrelated to intelligent issue resolution with multi-agent orchestration
  • You need a different domain or tool outside this scope

Instructions

  • Clarify goals, constraints, and required inputs.
  • Apply relevant best practices and validate outcomes.
  • Provide actionable steps and verification.
  • If detailed examples are required, open resources/implementation-playbook.md.

Resources

  • resources/implementation-playbook.md for detailed patterns and examples.