error-detector

Error Detector Skill

Purpose

Provides expertise in error analysis and pattern detection, specializing in the proactive identification of software defects through code analysis and system behavior monitoring. Identifies, analyzes, and helps prevent software errors using static and dynamic analysis techniques.

When to Use

  • Performing static code analysis and anti-pattern detection
  • Analyzing runtime errors and exception patterns
  • Detecting memory leaks and performance bottlenecks
  • Monitoring and analyzing error logs
  • Identifying security vulnerabilities through code patterns
  • Conducting proactive error prevention analysis

Overview

Specialized in error analysis, pattern detection, and proactive identification of software defects through code analysis, log monitoring, and system behavior analysis.

Error Detection Methodologies

Static Analysis

  • Code pattern recognition
  • Anti-pattern identification
  • Complexity analysis
  • Security vulnerability detection
  • Performance bottleneck identification
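
As a rough illustration of anti-pattern identification, the sketch below walks a Python syntax tree with the standard `ast` module and flags two well-known anti-patterns: a bare `except:` and a mutable default argument. The function name `find_anti_patterns` and the sample source are assumptions made for this example; real projects would normally rely on a linter for this.

```python
import ast

def find_anti_patterns(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, message) pairs for two common Python anti-patterns."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # A bare `except:` swallows every exception, including KeyboardInterrupt.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append((node.lineno, "bare except"))
        # Mutable defaults are created once and shared across calls.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for default in node.args.defaults + node.args.kw_defaults:
                if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                    findings.append(
                        (node.lineno, f"mutable default argument in {node.name}()")
                    )
    return sorted(findings)

# Hypothetical input used only to exercise the checker.
SAMPLE = '''
def add_item(item, bucket=[]):
    try:
        bucket.append(item)
    except:
        pass
    return bucket
'''

for line, message in find_anti_patterns(SAMPLE):
    print(f"line {line}: {message}")
```

The same tree walk can be extended with further checks, but each rule should be validated against real code before it is trusted.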

Dynamic Analysis

  • Runtime error monitoring
  • Exception pattern analysis
  • Memory leak detection
  • Performance profiling
  • Resource utilization tracking

Log-Based Analysis

```bash
# Example patterns for error detection
# -E enables extended regular expressions so that | means "or"
grep -rE "ERROR|FATAL|CRITICAL" logs/ --include="*.log" --include="*.txt"
grep -rE "exception|error|failed" src/ --include="*.js" --include="*.py" --include="*.java"
grep -rE "TODO|FIXME|HACK" src/ --include="*.*" --exclude-dir=node_modules
```

Error Categories & Patterns

Common Programming Errors

  • Null pointer exceptions
  • Array index out of bounds
  • Type conversion errors
  • Resource leak issues
  • Concurrency problems

Logic Errors

  • Off-by-one errors
  • Incorrect conditionals
  • Loop termination issues
  • State management problems
  • Data validation failures

Performance Errors

  • Inefficient algorithms
  • Memory optimization issues
  • Database query problems
  • Network timeout handling
  • Resource contention

Advanced Detection Techniques

Machine Learning-Based Detection

  • Anomaly detection in system behavior
  • Pattern recognition in error logs
  • Predictive failure modeling
  • Classification of error types
  • Automated root cause analysis

Statistical Analysis

  • Error frequency distribution
  • Time series analysis of failures
  • Correlation analysis between components
  • Regression testing failure patterns
  • Performance degradation detection
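
One simple way to act on an error frequency distribution is a z-score check: flag any interval whose error count sits well above the mean. The sketch below assumes hourly error counts are already available as a list; `find_spikes`, the threshold of 2.0, and the sample data are illustrative choices, not values from this skill.

```python
from statistics import mean, stdev

def find_spikes(counts: list[int], threshold: float = 2.0) -> list[int]:
    """Return indices whose count is more than `threshold` sample standard
    deviations above the mean of the series."""
    if len(counts) < 2:
        return []
    mu, sigma = mean(counts), stdev(counts)
    if sigma == 0:
        return []  # a flat series has no outliers
    return [i for i, c in enumerate(counts) if (c - mu) / sigma > threshold]

# Hypothetical hourly error counts; the seventh hour contains a spike.
hourly_errors = [4, 5, 3, 6, 4, 5, 40, 4]
print(find_spikes(hourly_errors))
```

Because a large spike also inflates the standard deviation, this check can miss smaller anomalies in the same window; a median-based measure may be more robust for noisy data.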

Code Complexity Metrics

  • Cyclomatic complexity analysis
  • Cognitive complexity assessment
  • Maintainability index calculation
  • Technical debt quantification
  • Code duplication detection
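
Cyclomatic complexity can be approximated as one plus the number of decision points in a function. The sketch below is a simplified approximation built on Python's `ast` module, not a full implementation of McCabe's metric; the set of counted node types and the name `cyclomatic_complexity` are assumptions for this example, and nested functions are counted toward their enclosing function as well.

```python
import ast

# Node types that each add one decision point (a simplification).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.IfExp, ast.comprehension)

def cyclomatic_complexity(source: str) -> dict[str, int]:
    """Approximate per-function complexity: 1 + number of decision points."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1
            for child in ast.walk(node):
                if isinstance(child, BRANCH_NODES):
                    score += 1
                elif isinstance(child, ast.BoolOp):
                    # `a or b or c` adds one path per extra operand.
                    score += len(child.values) - 1
            result[node.name] = score
    return result

# Hypothetical function used only to exercise the metric.
COMPLEXITY_SAMPLE = '''
def classify(n):
    if n < 0 or n > 100:
        return "invalid"
    for d in (2, 3):
        if n % d == 0:
            return "composite"
    return "other"
'''

print(cyclomatic_complexity(COMPLEXITY_SAMPLE))
```

A score like this is most useful as a relative signal, for example to list the functions that exceed a team-chosen limit.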

Error Analysis Frameworks

Root Cause Analysis (RCA)

  • Five Whys methodology
  • Fishbone diagram analysis
  • Pareto analysis for prioritization
  • Fault tree analysis
  • Change impact assessment
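
Pareto analysis for prioritization can be sketched as a cumulative count: sort causes by frequency and keep the ones that together reach a cutoff share of incidents. The 80% cutoff, the function name `pareto_causes`, and the incident labels below are assumptions for illustration.

```python
from collections import Counter

def pareto_causes(causes: list[str], cutoff: float = 0.8) -> list[tuple[str, int]]:
    """Return the most frequent causes that together reach `cutoff`
    of all recorded incidents."""
    counts = Counter(causes).most_common()  # sorted by frequency, descending
    total = sum(n for _, n in counts)
    selected, running = [], 0
    for cause, n in counts:
        if total and running / total >= cutoff:
            break
        selected.append((cause, n))
        running += n
    return selected

# Hypothetical root-cause labels taken from 100 incident records.
incidents = (["timeout"] * 50 + ["null deref"] * 30
             + ["bad config"] * 12 + ["disk full"] * 8)
print(pareto_causes(incidents))
```

In this made-up data set, two of the four causes account for 80% of incidents, which is where fixing effort would go first.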

Error Classification Systems

  • Severity categorization
  • Priority assignment frameworks
  • Impact assessment matrices
  • Frequency-based prioritization
  • Business risk evaluation

Pattern Recognition

  • Repetitive error identification
  • Error clustering algorithms
  • Sequence pattern analysis
  • Correlation detection
  • Temporal pattern analysis
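
A lightweight form of error clustering is fingerprinting: replace the variable parts of each message with placeholders so that repeated errors collapse into one group. This is a minimal sketch rather than a clustering algorithm in the statistical sense; the placeholder tokens and the sample log lines are assumptions for this example.

```python
import re
from collections import defaultdict

def fingerprint(message: str) -> str:
    """Replace variable parts of a message so similar errors share one key."""
    # Hex addresses must be replaced before plain numbers, or the digits
    # inside them would be rewritten piecemeal.
    message = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)
    message = re.sub(r"\d+", "<num>", message)
    return message

def cluster_errors(messages: list[str]) -> dict[str, list[str]]:
    """Group messages by fingerprint."""
    clusters = defaultdict(list)
    for msg in messages:
        clusters[fingerprint(msg)].append(msg)
    return dict(clusters)

# Hypothetical log messages.
logs = [
    "Timeout after 30s calling order-service",
    "Timeout after 45s calling order-service",
    "User 1042 not found",
    "User 7 not found",
    "Segfault at 0x7ffee4",
]

for key, members in cluster_errors(logs).items():
    print(f"{len(members)}x {key}")
```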

Monitoring & Alerting

Real-Time Monitoring

  • System health dashboards
  • Error rate monitoring
  • Performance threshold alerts
  • Log aggregation and analysis
  • Automated incident response

Predictive Analysis

  • Failure prediction models
  • Early warning systems
  • Trend analysis and forecasting
  • Capacity planning alerts
  • Proactive maintenance scheduling

Logging Best Practices

  • Structured logging implementation
  • Log level optimization
  • Sensitive data protection
  • Log rotation policies
  • Centralized log management
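
Structured logging can be sketched with the standard `logging` module by formatting each record as one JSON object per line, which log aggregators can then parse without custom patterns. The field names, the `checkout` logger, and the `request_id` attribute passed via `extra=` are assumptions for this example.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # `request_id` is a hypothetical field supplied through `extra=`.
        if hasattr(record, "request_id"):
            payload["request_id"] = record.request_id
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.error("payment failed", extra={"request_id": "req-123"})
```

Keeping the formatter in one place also gives a single point at which sensitive fields can be dropped before they reach the log stream.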

Error Prevention Strategies

Code Quality Improvement

  • Peer review processes
  • Automated testing coverage
  • Static analysis tools integration
  • Code style enforcement
  • Documentation standards

Development Process Optimization

  • Test-driven development (TDD)
  • Continuous integration practices
  • Automated deployment pipelines
  • Rollback procedures
  • Feature flag implementation

System Design Patterns

  • Circuit breaker patterns
  • Retry mechanisms
  • Graceful degradation
  • Fallback systems
  • Redundancy implementation
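
The circuit breaker pattern stops calling a failing dependency for a cool-down period instead of letting every request time out. The sketch below is a minimal single-threaded version under stated assumptions: the class and exception names, the failure limit, and the 30-second reset are illustrative, and production code would also need locking and metrics.

```python
import time

class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock          # injectable so the timing can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would typically catch `CircuitOpenError` and serve a fallback response, which ties this pattern to graceful degradation.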

Error Detection Tools & Integration

Static Analysis Tools

  • ESLint for JavaScript/TypeScript
  • Pylint for Python
  • SonarQube for multi-language analysis
  • Checkstyle for Java
  • FxCop for C# (legacy; superseded by the Roslyn-based .NET code analyzers)

Dynamic Monitoring Tools

  • Application Performance Monitoring (APM)
  • Error tracking services (Sentry, Bugsnag)
  • Log management systems (ELK stack)
  • Distributed tracing tools
  • Infrastructure monitoring

Custom Detection Scripts

  • Error pattern matching
  • Anomaly detection algorithms
  • Automated regression testing
  • Performance benchmarking
  • Data validation checks

Error Response & Resolution

Incident Management

  • Error triage procedures
  • Escalation protocols
  • Communication templates
  • Resolution tracking
  • Post-incident reviews

Automated Recovery

  • Self-healing mechanisms
  • Automatic restart procedures
  • Failover systems
  • Data recovery processes
  • Service restoration workflows

Knowledge Management

  • Error documentation databases
  • Solution repositories
  • Best practice libraries
  • Training materials
  • Lessons learned archives

Specific Domain Expertise

Web Application Errors

  • HTTP error code analysis
  • JavaScript runtime errors
  • API failure patterns
  • Database connection issues
  • Frontend performance problems

Mobile Application Errors

  • Device-specific issues
  • Network connectivity problems
  • App store rejection patterns
  • Battery usage optimization
  • Memory management issues

Backend System Errors

  • Database transaction failures
  • Message queue processing errors
  • Authentication and authorization issues
  • Microservices communication problems
  • Resource exhaustion scenarios

Reporting & Analytics

Error Metrics

  • Mean Time To Detection (MTTD)
  • Mean Time To Resolution (MTTR)
  • Error frequency trends
  • Resolution effectiveness
  • Preventive action impact
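
MTTD and MTTR are averages over incident timestamps. The sketch below assumes each incident records when the fault began, when it was detected, and when it was resolved; the `Incident` type is hypothetical, and MTTR is measured here from detection to resolution, although some teams measure it from the start of the fault.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Incident:
    started: datetime    # when the fault began
    detected: datetime   # when monitoring or a person noticed it
    resolved: datetime   # when service was restored

def mean_delta(deltas: list[timedelta]) -> timedelta:
    """Average a non-empty list of durations."""
    return sum(deltas, timedelta()) / len(deltas)

def mttd(incidents: list[Incident]) -> timedelta:
    return mean_delta([i.detected - i.started for i in incidents])

def mttr(incidents: list[Incident]) -> timedelta:
    return mean_delta([i.resolved - i.detected for i in incidents])

# Hypothetical incident records.
t0 = datetime(2024, 1, 1, 12, 0)
history = [
    Incident(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=34)),
    Incident(t0, t0 + timedelta(minutes=6), t0 + timedelta(minutes=56)),
]
print("MTTD:", mttd(history))
print("MTTR:", mttr(history))
```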

Quality Dashboards

  • Real-time error monitoring
  • Historical trend analysis
  • Team performance metrics
  • System health indicators
  • Compliance status tracking

Deliverables

Analysis Reports

  • Comprehensive error analysis
  • Root cause identification
  • Impact assessment documentation
  • Resolution recommendations
  • Prevention strategies

Implementation Plans

  • Error detection system design
  • Monitoring setup procedures
  • Alerting configuration guides
  • Automated testing frameworks
  • Process improvement recommendations

Training Materials

  • Error handling best practices
  • Troubleshooting guides
  • Tool usage documentation
  • Process workflow diagrams
  • Knowledge base articles

Examples

Example 1: E-Commerce Platform Error Monitoring

Scenario: Implementing comprehensive error tracking for a high-traffic e-commerce site.
Implementation:
  1. Error Tracking: Sentry integration across all services
  2. Log Aggregation: ELK stack for centralized log management
  3. Alerting: PagerDuty integration for critical errors
  4. Dashboard: Custom Grafana dashboards for error metrics
Results:
  • MTTD reduced from hours to minutes
  • 40% reduction in time-to-resolution
  • Proactive identification of emerging issues

Example 2: Mobile App Crash Reporting

Scenario: Setting up crash reporting for iOS and Android applications.
Approach:
  1. Crash Reporting: Firebase Crashlytics integration
  2. Symbolication: Automated dSYM upload (iOS) and mapping file upload (Android) for readable stack traces
  3. Breadcrumbs: User action tracking for context
  4. Release Tracking: Correlation of crashes with app versions
Key Metrics Tracked:
  • Crash-free users rate (target: 99.5%)
  • Top crashers by device and OS version
  • Session data with crash-free rate trends
  • User feedback correlation with crashes
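
The crash-free users rate is the share of active users who did not experience a crash in the period. A minimal sketch, assuming session records are available as `(user_id, crashed)` pairs; the function name and the sample data are made up for this example.

```python
def crash_free_users_rate(sessions: list[tuple[str, bool]]) -> float:
    """Percentage of distinct users with no crashed session."""
    users = {user for user, _ in sessions}
    crashed = {user for user, did_crash in sessions if did_crash}
    if not users:
        return 100.0  # no active users means nothing crashed
    return 100.0 * (len(users) - len(crashed)) / len(users)

# Hypothetical sessions: user "a" crashed once out of two sessions.
sessions = [("a", False), ("a", True), ("b", False), ("c", False), ("d", False)]
rate = crash_free_users_rate(sessions)
print(f"{rate:.1f}% crash-free users (target: 99.5%)")
```

Note that one crashed session is enough to count a user as affected, which is why this rate is usually lower than the crash-free sessions rate.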

Example 3: API Gateway Error Analysis

Scenario: Monitoring and analyzing errors at API gateway level for a SaaS platform.
Monitoring Setup:
  1. Request Logging: All API requests logged with status codes
  2. Rate Tracking: Monitoring for 429 Too Many Requests patterns
  3. Latency Analysis: P95, P99 latency tracking by endpoint
  4. Authentication Errors: Tracking failed auth attempts for security
Alert Configuration:
  • Error rate spikes (> 5% for 5 minutes)
  • Latency degradation (> 1s for P95)
  • Authentication failures (> 100/min from single IP)
  • Circuit breaker state changes
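
The first three alert thresholds above can be expressed as small predicate functions. This is a sketch under assumptions: the input shapes (per-minute request and error counts, a list of latencies in seconds, a list of source IPs for one minute of failed logins) and all function names are hypothetical, and "for 5 minutes" is interpreted as every one of the last five minutes exceeding the threshold.

```python
import math
from collections import Counter

def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def error_rate_alert(per_minute: list[tuple[int, int]],
                     threshold: float = 0.05, window: int = 5) -> bool:
    """per_minute holds (requests, errors) pairs, oldest first."""
    recent = per_minute[-window:]
    return len(recent) == window and all(
        requests > 0 and errors / requests > threshold
        for requests, errors in recent
    )

def latency_alert(latencies_s: list[float], limit_s: float = 1.0) -> bool:
    """Fire when P95 latency exceeds the limit."""
    return percentile(latencies_s, 95) > limit_s

def noisy_ips(failed_auth_ips: list[str], limit: int = 100) -> list[str]:
    """IPs with more than `limit` failed logins in the sampled minute."""
    return [ip for ip, n in Counter(failed_auth_ips).items() if n > limit]

# Hypothetical samples.
print(error_rate_alert([(1000, 60)] * 5))
print(latency_alert([0.2] * 94 + [1.5] * 6))
print(noisy_ips(["10.0.0.1"] * 101 + ["10.0.0.2"] * 3))
```

Requiring the whole window to breach the threshold trades slower detection for fewer false alarms, which matters for alert fatigue.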

Best Practices

Error Detection Configuration

  • Comprehensive Coverage: Instrument all code paths, not just critical functions
  • Context-Rich Data: Include user IDs, request IDs, environment details
  • Sensitive Data Handling: Scrub PII and secrets before error reporting
  • Sampling Strategy: Balance detail collection with performance impact
  • Tagging: Use consistent tagging for filtering and aggregation
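
Scrubbing PII and secrets before reporting can be sketched as a recursive pass over the event payload. The key list, the e-mail pattern, and the event shape below are assumptions for illustration; a real deployment needs a reviewed list of sensitive fields and patterns.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Hypothetical set of keys whose values must never leave the process.
SENSITIVE_KEYS = {"password", "token", "authorization", "api_key"}

def scrub(value):
    """Return a copy of `value` with secrets redacted and e-mails masked.
    Assumes dictionary keys are strings."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else scrub(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [scrub(item) for item in value]
    if isinstance(value, str):
        return EMAIL.sub("[EMAIL]", value)
    return value

# Hypothetical error event.
event = {
    "message": "login failed for ana@example.com",
    "request_id": "req-9",
    "context": {"password": "hunter2", "attempts": 3},
}
print(scrub(event))
```

The request ID is kept on purpose: it preserves the context needed for debugging without identifying a person.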

Alert Management

  • Threshold Tuning: Adjust sensitivity to reduce alert fatigue
  • Escalation Paths: Clear procedures for different severity levels
  • Business Hours: Set different response expectations for business hours and after-hours on-call
  • Alert Fatigue Prevention: Consolidate related alerts, avoid duplicates
  • On-Call Rotation: Sustainable schedules with clear responsibilities

Metrics and Reporting

  • Key Metrics: Track MTTD, MTTR, error rate, resolution rate
  • Trend Analysis: Weekly/monthly comparisons to identify patterns
  • SLA Reporting: Error impact on service level agreements
  • Team Dashboards: Custom views for different teams and roles
  • Executive Reporting: High-level summaries for leadership

Error Handling Best Practices

  • Defensive Programming: Validate inputs, handle edge cases
  • Graceful Degradation: Fallback mechanisms when dependencies fail
  • Error Recovery: Automatic retry with exponential backoff
  • User Communication: Meaningful error messages for end users
  • Logging: Comprehensive logs for debugging and audit trails
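
Automatic retry with exponential backoff can be sketched as a loop that doubles the wait after each transient failure. The defaults below (four attempts, 0.5-second base delay, 8-second cap) and the choice of retryable exception types are assumptions, and random "full jitter" is added so that many clients do not retry in lockstep.

```python
import random
import time

def retry(func, attempts=4, base_delay=0.5, max_delay=8.0,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call `func` until it succeeds or `attempts` is exhausted.
    Only exceptions listed in `retry_on` are treated as transient."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay ceiling doubles each time: 0.5s, 1s, 2s, ... up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))  # full jitter
```

Errors outside `retry_on`, such as a validation failure, propagate immediately, since retrying them would only repeat the same result.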

Continuous Improvement

  • Post-Incident Reviews: Learn from every significant error
  • Pattern Analysis: Identify recurring issues for systemic fixes
  • Knowledge Base: Document errors and solutions for future reference
  • Tool Evolution: Regularly evaluate and update detection tools
  • Team Training: Ensure consistent error handling practices