fault-tree-analysis
Fault Tree Analysis (FTA)
Conduct systematic Fault Tree Analysis using a structured, Q&A-based approach with Boolean logic gates, minimal cut set identification, and optional probability calculations.
Overview
Fault Tree Analysis is a top-down, deductive failure analysis method that maps how combinations of lower-level events (basic events) lead to an undesired system-level event (top event). It uses Boolean logic gates (AND, OR) to represent the relationships between events.
Key Principle: One fault tree analyzes one specific undesired event. Start at the top (what failed?) and work down (what caused it?).
Analysis Types:
- Qualitative: Identify failure pathways, minimal cut sets, single points of failure
- Quantitative: Calculate failure probabilities using component failure data
Workflow
Phase 1: System Definition & Scope
Collect from user:
- What system or process is being analyzed?
- What are the system boundaries (what's in scope vs. out of scope)?
- What are the operating conditions and assumptions?
- What documentation exists (schematics, P&IDs, operating procedures)?
- What is the purpose of this analysis (design review, incident investigation, safety case)?
Outputs:
- System description with boundaries
- Operating mode(s) under analysis
- List of assumptions and exclusions
Phase 2: Top Event Definition
Collect from user:
- What is the single undesired outcome to analyze?
- How is this event defined (what state constitutes "failure")?
- What is the severity/criticality of this event?
- What is the mission time or exposure period?
Quality Gate - Top Event Must Be:
- Single, specific, unambiguous event
- Clearly defined failure state (not vague)
- At appropriate system level (not too high or too low)
- Observable or detectable
Good Example: "Pump fails to deliver required flow rate (>100 GPM) during normal operation"
Poor Example: "System doesn't work" (too vague)
Phase 3: Fault Tree Construction
Build the tree iteratively from top to bottom:
For each event (starting with top event):
- Identify immediate causes: "What events could directly cause this?"
- Determine gate type:
  - OR gate: ANY one cause is sufficient (independent causes)
  - AND gate: ALL causes required simultaneously (redundancy/barriers)
- Classify event type:
  - Intermediate event (rectangle): Requires further development
  - Basic event (circle): Component failure, terminal point
  - Undeveloped event (diamond): Insufficient data or out of scope
  - House event (house symbol): Normal occurrence, switch on/off
  - External event (house symbol): Environmental or expected condition
- Continue developing until all branches terminate in basic/undeveloped events
Stopping Criteria for Branch Development:
- Component-level failure reached (basic event)
- Out of scope (undeveloped event)
- Normal expected condition (house event)
- Insufficient information available
Critical Rules:
- Each event must have clear, unambiguous description
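- No redundant events: a failure that appears in more than one branch is modeled as one repeated event with a single identifier, not as separate duplicates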
- No "miracles" (events that cannot physically occur)
- Consistent naming conventions throughout
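The construction steps above can be captured in a small data structure. The following is a minimal sketch, not the format used by the scripts in this skill: the class names, field names, and the example pump tree are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class GateType(Enum):
    OR = "OR"    # any one input is sufficient
    AND = "AND"  # all inputs are required


class EventType(Enum):
    INTERMEDIATE = "intermediate"  # rectangle, developed through a gate
    BASIC = "basic"                # circle, terminal component failure
    UNDEVELOPED = "undeveloped"    # diamond, out of scope or no data
    HOUSE = "house"                # house symbol, normal/expected condition


@dataclass
class Event:
    event_id: str
    description: str
    event_type: EventType
    gate: Optional[GateType] = None              # only for intermediate events
    children: List["Event"] = field(default_factory=list)


def undeveloped_branches(event: Event) -> List[str]:
    """Return IDs of intermediate events that still lack a gate or inputs."""
    problems = []
    if event.event_type is EventType.INTERMEDIATE:
        if event.gate is None or not event.children:
            problems.append(event.event_id)
        for child in event.children:
            problems.extend(undeveloped_branches(child))
    return problems


# Hypothetical example tree (event names are illustrative only).
top = Event("TOP", "Pump fails to deliver required flow rate",
            EventType.INTERMEDIATE, GateType.OR, [
    Event("E1", "Pump motor fails", EventType.BASIC),
    Event("G1", "Loss of electrical power", EventType.INTERMEDIATE, GateType.AND, [
        Event("E2", "Main supply fails", EventType.BASIC),
        Event("E3", "Backup supply fails", EventType.BASIC),
    ]),
    Event("G2", "Suction line blocked", EventType.INTERMEDIATE),  # not yet developed
])

print(undeveloped_branches(top))  # ['G2']
```

A check like `undeveloped_branches` is one way to apply the stopping criteria mechanically: every branch it reports still needs a gate and inputs, or reclassification as a basic, undeveloped, or house event.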
Phase 4: Qualitative Analysis
Identify Minimal Cut Sets (MCS):
Minimal cut sets are the smallest combinations of basic events that cause the top event.
- Order 1 MCS (single events): Most critical - single points of failure
- Order 2 MCS (pairs): Critical for redundant systems
- Higher order MCS: Less critical, require multiple failures
Analysis Tasks:
- List all minimal cut sets by order
- Identify single points of failure (Order 1)
- Assess common cause failure potential
- Evaluate effectiveness of redundancy
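To illustrate how minimal cut sets follow from the gate logic, here is a small sketch of a recursive expansion followed by a minimization step. It is an illustration under stated assumptions, not the algorithm used by `scripts/calculate_fta.py`: the dictionary layout, the function names, and the example tree are hypothetical.

```python
from itertools import product

# Hypothetical tree: gate name -> (gate type, inputs).
# Any name that is not a key in the dict is treated as a basic event.
TREE = {
    "TOP": ("OR", ["E1", "G1"]),
    "G1": ("AND", ["E2", "G2"]),
    "G2": ("OR", ["E3", "E1"]),
}


def cut_sets(node, tree):
    """Return the cut sets of `node` as a set of frozensets of basic events."""
    if node not in tree:  # basic event
        return {frozenset([node])}
    gate, inputs = tree[node]
    child_sets = [cut_sets(child, tree) for child in inputs]
    if gate == "OR":
        # OR gate: any child's cut set is a cut set of the output
        return set().union(*child_sets)
    # AND gate: combine one cut set from every child
    return {frozenset().union(*combo) for combo in product(*child_sets)}


def minimal_cut_sets(top, tree):
    """Drop every cut set that strictly contains another cut set."""
    sets = cut_sets(top, tree)
    minimal = [s for s in sets if not any(other < s for other in sets)]
    return sorted(minimal, key=lambda s: (len(s), sorted(s)))


for mcs in minimal_cut_sets("TOP", TREE):
    print(len(mcs), sorted(mcs))
# 1 ['E1']
# 2 ['E2', 'E3']
```

In this example `E1` is an Order 1 cut set, so it is a single point of failure, and the non-minimal set `{E1, E2}` is discarded because it contains `{E1}`. Full expansion grows quickly with tree size, so this approach suits small trees only.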
Run `python scripts/calculate_fta.py --qualitative` for automated MCS extraction.
Phase 5: Quantitative Analysis (Optional)
If failure probability data is available:
Collect failure data for each basic event:
- Failure rate (λ) or probability (P)
- Mission time or exposure period
- Data source (field data, handbook, estimate)
- Confidence level
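When a basic event is given as a failure rate rather than a probability, it has to be converted using the mission time. The sketch below assumes a constant failure rate and a non-repairable component (the exponential model), which may not fit every data source; the function name and the example values are illustrative.

```python
import math


def failure_probability(failure_rate_per_hour: float, mission_time_hours: float) -> float:
    """P = 1 - exp(-lambda * t), assuming a constant failure rate and no repair."""
    return 1.0 - math.exp(-failure_rate_per_hour * mission_time_hours)


# Hypothetical values: lambda = 1e-5 failures/hour over a 1000-hour mission.
p = failure_probability(1e-5, 1000.0)
print(f"{p:.6f}")  # 0.009950
```

For small values of λ × t the result is close to λ × t itself (0.01 here), which is why the product is often used directly as a first estimate.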
Calculations:
- OR gate: P(output) = P(A) + P(B) - P(A)×P(B) for independent events, which is ≈ P(A) + P(B) for small probabilities
- AND gate: P(output) = P(A) × P(B) (for independent events)
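The two gate formulas generalize to any number of independent inputs. A minimal sketch follows, assuming statistical independence between inputs; the function names and probabilities are illustrative.

```python
import math


def or_gate(probabilities):
    """Exact OR for independent inputs: 1 - prod(1 - p_i)."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)


def and_gate(probabilities):
    """AND for independent inputs: prod(p_i)."""
    return math.prod(probabilities)


p_a, p_b = 0.01, 0.02
exact_or = or_gate([p_a, p_b])   # 0.01 + 0.02 - 0.01 * 0.02 = 0.0298
approx_or = p_a + p_b            # 0.03, the small-probability approximation
both = and_gate([p_a, p_b])      # 0.0002

print(f"{exact_or:.4f} {approx_or:.4f} {both:.4f}")  # 0.0298 0.0300 0.0002
```

The approximation is always at least as large as the exact OR value, so using it errs on the conservative side.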
Calculate:
- Probability of each minimal cut set
- Top event probability (sum of MCS probabilities with adjustments for overlapping events)
- Importance measures (Fussell-Vesely, Birnbaum)
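The three calculations listed above can be sketched directly from a list of minimal cut sets. This is an illustration under stated assumptions, not the output of `scripts/calculate_fta.py`: basic events are assumed independent, the probabilities and cut sets are hypothetical, and the Birnbaum measure is evaluated with the min cut upper bound, which is exact only when cut sets share no events.

```python
import math

# Hypothetical basic-event probabilities and minimal cut sets.
P = {"E1": 0.001, "E2": 0.01, "E3": 0.02}
MCS = [{"E1"}, {"E2", "E3"}]


def mcs_probability(cut_set, p):
    """Probability that every event in the cut set occurs (independent events)."""
    return math.prod(p[e] for e in cut_set)


def top_rare_event(mcs_list, p):
    """Rare-event approximation: sum of the MCS probabilities."""
    return sum(mcs_probability(c, p) for c in mcs_list)


def top_upper_bound(mcs_list, p):
    """Min cut upper bound: 1 - prod(1 - P(MCS_i))."""
    return 1.0 - math.prod(1.0 - mcs_probability(c, p) for c in mcs_list)


def fussell_vesely(event, mcs_list, p):
    """Share of the top event probability from cut sets containing `event`."""
    containing = [c for c in mcs_list if event in c]
    return top_rare_event(containing, p) / top_rare_event(mcs_list, p)


def birnbaum(event, mcs_list, p):
    """P(top | event occurs) - P(top | event does not occur)."""
    return (top_upper_bound(mcs_list, {**p, event: 1.0})
            - top_upper_bound(mcs_list, {**p, event: 0.0}))


print(f"{top_rare_event(MCS, P):.6f}")         # 0.001200
print(f"{fussell_vesely('E1', MCS, P):.4f}")   # 0.8333
print(f"{birnbaum('E2', MCS, P):.5f}")         # 0.01998
```

Here `E1` contributes about 83% of the top event probability, which matches the qualitative finding that an Order 1 cut set dominates. The rare-event sum slightly overestimates the top event probability; the difference from the upper bound is negligible at these magnitudes.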
Run `python scripts/calculate_fta.py --quantitative` with probability data.
Phase 6: Common Cause Failure Analysis
Identify potential common causes across basic events:
- Environmental (temperature, humidity, EMI)
- Manufacturing (batch defects, supplier issues)
- Maintenance (common procedures, same personnel)
- Design (same components, shared software)
- Human error (operator mistakes, procedure gaps)
For AND gates (redundant systems):
Common cause failures can defeat redundancy. Apply beta-factor model if quantifying:
- P(CCF) = β × P(component failure), where β is the fraction of a component's failure probability attributed to common causes
- Typical β values: 1-10% depending on diversity measures
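A short worked example shows why common cause failures matter for redundancy. The sketch assumes the usual form of the beta-factor model, in which β is the fraction of each component's total failure probability that is due to common causes, and it applies to a hypothetical pair of identical redundant components under an AND gate. The function name and the numbers are illustrative.

```python
def redundant_pair_failure(p_component: float, beta: float) -> float:
    """Approximate failure probability of two identical redundant components.

    The pair fails if both components fail independently, or if a single
    common cause takes out both at once (rare-event sum of the two terms).
    """
    p_ccf = beta * p_component                  # common cause part
    p_independent = (1.0 - beta) * p_component  # independent part, per component
    return p_independent ** 2 + p_ccf


p = 0.01  # hypothetical failure probability of one component
without_ccf = redundant_pair_failure(p, beta=0.0)   # 0.0001
with_ccf = redundant_pair_failure(p, beta=0.05)     # 0.00059025

print(f"{without_ccf:.6f} {with_ccf:.6f}")  # 0.000100 0.000590
```

With β at 5%, the common cause term is several times larger than the independent term, so the benefit of the second component is much smaller than the plain AND gate calculation suggests.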
Phase 7: Documentation & Reporting
Generate professional outputs:
- `python scripts/generate_diagram.py` - SVG fault tree diagram
- `python scripts/generate_report.py` - Comprehensive HTML report
Symbols Reference
| Symbol | Name | Description |
|---|---|---|
| Rectangle | Intermediate Event | Fault resulting from combination of inputs; requires gate |
| Circle | Basic Event | Component failure; terminal event with probability data |
| Diamond | Undeveloped Event | Not further developed (out of scope or insufficient data) |
| House | House Event | Expected occurrence; can be set TRUE/FALSE |
| Flat OR gate | OR Gate | Output if ANY input occurs |
| Flat AND gate | AND Gate | Output if ALL inputs occur |
| Triangle | Transfer | Connects to another tree section |
Quality Scoring
Each analysis is scored on six dimensions (see references/quality-rubric.md):
| Dimension | Weight | Description |
|---|---|---|
| System Definition | 15% | Clear boundaries, assumptions, operating conditions |
| Top Event Clarity | 15% | Specific, unambiguous, appropriate level |
| Tree Completeness | 25% | All pathways developed, no gaps, consistent logic |
| Minimal Cut Sets | 20% | Correctly identified, analyzed for SPOFs |
| Quantification | 15% | Accurate calculations, appropriate data sources |
| Actionability | 10% | Identifies design improvements, risk mitigations |
Scoring Scale: Each dimension rated 1-5 (Inadequate to Excellent)
Overall Score: Weighted average × 20 = 20-100 points (the minimum rating of 1 in every dimension gives 20)
Passing Threshold: 70 points minimum
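The scoring arithmetic can be checked by hand or with a few lines of code. The sketch below uses the weights from the table above; the function name, the validation, and the example ratings are illustrative and are not taken from `scripts/score_analysis.py`.

```python
# Weights from the table above, as fractions of 1.
WEIGHTS = {
    "System Definition": 0.15,
    "Top Event Clarity": 0.15,
    "Tree Completeness": 0.25,
    "Minimal Cut Sets": 0.20,
    "Quantification": 0.15,
    "Actionability": 0.10,
}
PASSING_THRESHOLD = 70


def overall_score(ratings):
    """Weighted average of the 1-5 ratings, scaled by 20."""
    for name in WEIGHTS:
        if not 1 <= ratings[name] <= 5:
            raise ValueError(f"{name} rating must be between 1 and 5")
    weighted_average = sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)
    return weighted_average * 20


# Hypothetical ratings for one analysis.
ratings = {
    "System Definition": 4,
    "Top Event Clarity": 5,
    "Tree Completeness": 3,
    "Minimal Cut Sets": 4,
    "Quantification": 3,
    "Actionability": 4,
}
score = overall_score(ratings)
print(round(score, 1), score >= PASSING_THRESHOLD)  # 75.0 True
```

Because Tree Completeness carries the largest weight, raising it by one point moves the overall score by 5 points, more than any other single dimension.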
Run `python scripts/score_analysis.py` to calculate scores.
Common Pitfalls
See references/common-pitfalls.md for:
- Incorrect gate selection (AND vs OR confusion)
- Top event too vague or at wrong level
- Missing common cause failures
- Incomplete branch development
- Ignoring human factors
- Double-counting events
Examples
See references/examples.md for worked examples:
- Pump system failure
- Control system loss of function
- Safety interlock bypass
- Manufacturing equipment hazard
Integration with Other Tools
- FMEA/FMECA: Bottom-up complements top-down FTA; use FMEA to identify basic events
- 5 Whys: Use for detailed investigation of specific failure pathways
- Fishbone Diagram: Brainstorm potential causes before structuring in FTA
- Reliability Block Diagram: Alternative view of system reliability
- Event Tree Analysis: Use FTA for initiating event probabilities
When to Use FTA
Good candidates:
- Safety-critical system design review
- Accident/incident investigation
- Regulatory compliance demonstration
- Redundancy effectiveness evaluation
- System failure probability estimation
Consider alternatives when:
- Need to catalog ALL failure modes (use FMEA)
- Analyzing success paths (use Success Tree/RBD)
- Time-sequential dependencies critical (use Event Tree)