resilience-patterns

Resilience Patterns Skill

Overview

This skill provides guidance on implementing resilience patterns in .NET applications. It covers synchronous resilience (HTTP clients, service calls) with Polly and asynchronous resilience (message handlers) with Brighter, which runs Polly policies inside its handler pipeline.
Key Principle: Design for failure. Systems should handle transient faults gracefully, prevent cascade failures, and provide meaningful fallback behavior.

When to Use This Skill

Keywords: resilience, circuit breaker, retry, polly, brighter, fault tolerance, transient failure, DLQ, dead letter queue, timeout, bulkhead, fallback, http client resilience
Use this skill when:
  • Implementing HTTP client resilience
  • Configuring retry policies for transient failures
  • Setting up circuit breakers to prevent cascade failures
  • Designing message handler error handling
  • Implementing dead letter queue patterns
  • Adding timeout policies to service calls
  • Configuring bulkhead isolation

Resilience Strategy Overview

Synchronous Resilience (Polly)

For HTTP calls and synchronous service communication:
| Pattern | Purpose | When to Use |
|---|---|---|
| Retry | Retry failed operations | Transient failures (network errors, HTTP 503, timeouts) |
| Circuit Breaker | Stop calling a failing service | Repeated failures indicate the service is down |
| Timeout | Bound operation time | Prevent indefinite waits |
| Bulkhead | Isolate failures | Prevent one caller from exhausting shared resources |
| Fallback | Provide an alternative | Graceful degradation |
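Polly v8 has no standalone bulkhead strategy; the concurrency limiter from the Polly.RateLimiting package (`AddConcurrencyLimiter`) takes that role. The underlying idea fits in a few lines of plain .NET: cap concurrent calls with a `SemaphoreSlim` and reject immediately when the compartment is full. This is a minimal BCL sketch written as top-level statements, not Polly's implementation; `TryExecute` and `maxConcurrent` are hypothetical names.

```csharp
using System;
using System.Threading;

// Bulkhead sketch: at most maxConcurrent callers may be inside the compartment at once.
const int maxConcurrent = 2;
var compartment = new SemaphoreSlim(maxConcurrent, maxConcurrent);

// Runs the action if a slot is free; otherwise rejects immediately instead of queueing,
// so a flood of calls to one dependency cannot exhaust threads or connections.
bool TryExecute(Action action)
{
    if (!compartment.Wait(0)) return false;   // compartment full: reject fast
    try { action(); return true; }
    finally { compartment.Release(); }        // always give the slot back
}

// Simulate two long-running calls occupying both slots.
compartment.Wait(0);
compartment.Wait(0);
bool rejected = !TryExecute(() => Console.WriteLine("should not run"));

// One call finishes and frees its slot; the next call is admitted.
compartment.Release();
bool admitted = TryExecute(() => Console.WriteLine("ran inside the bulkhead"));
```

Rejecting with `Wait(0)` rather than waiting keeps the caller's latency bounded; a queue limit (as `AddConcurrencyLimiter` offers) is the middle ground when short bursts should be absorbed.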

Asynchronous Resilience (Brighter)

For message-based and async operations:
| Pattern | Purpose | When to Use |
|---|---|---|
| Retry | Redeliver failed messages | Transient processing failures |
| Dead Letter Queue | Park unprocessable messages | Poison messages, business rule failures |
| Circuit Breaker | Stop processing temporarily | Downstream service unavailable |
| Timeout | Bound handler execution | Prevent a handler from blocking |

Quick Start: Polly v8 with HttpClient

Basic Setup

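```csharp
// Program.cs or Startup.cs
// AddStandardResilienceHandler comes from the Microsoft.Extensions.Http.Resilience package
builder.Services.AddHttpClient<IOrderService, OrderService>()
    .AddStandardResilienceHandler();
```
`AddStandardResilienceHandler()` adds a preconfigured pipeline with the following strategies, listed from outermost to innermost:
  • Rate limiter
  • Total request timeout
  • Retry (exponential backoff)
  • Circuit breaker
  • Attempt timeout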

Custom Configuration

```csharp
using System.Net;                            // HttpStatusCode
using Microsoft.Extensions.Http.Resilience;  // HttpRetryStrategyOptions, HttpCircuitBreakerStrategyOptions
using Polly;                                 // PredicateBuilder, DelayBackoffType

builder.Services.AddHttpClient<IOrderService, OrderService>()
    .AddResilienceHandler("custom-pipeline", pipeline =>
    {
        // Strategies run outer to inner in the order they are added

        // Retry with exponential backoff and jitter
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            Delay = TimeSpan.FromSeconds(1),
            BackoffType = DelayBackoffType.Exponential,
            UseJitter = true,
            ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
                .Handle<HttpRequestException>()
                .HandleResult(r => r.StatusCode == HttpStatusCode.ServiceUnavailable)
        });

        // Circuit breaker
        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.5,
            MinimumThroughput = 10,
            SamplingDuration = TimeSpan.FromSeconds(30),
            BreakDuration = TimeSpan.FromSeconds(30)
        });

        // Timeout per attempt (innermost, so it applies to each try separately)
        pipeline.AddTimeout(TimeSpan.FromSeconds(10));
    });
```
Detailed Polly patterns: See references/polly-patterns.md

Quick Start: Brighter Message Handler

Basic Retry Policy

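```csharp
using Paramore.Brighter;
using Paramore.Brighter.Policies.Attributes;

public class OrderCreatedHandler : RequestHandler<OrderCreated>
{
    // "retry-policy" must match a key in the policy registry
    [UsePolicy("retry-policy", step: 1)]
    public override OrderCreated Handle(OrderCreated command)
    {
        // Process order
        return base.Handle(command);
    }
}
```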

Policy Registry Setup

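```csharp
using System;
using Polly;
using Polly.Registry;

var policyRegistry = new PolicyRegistry
{
    {
        "retry-policy",
        // Handle<Exception>() retries every exception; narrow it to transient
        // exception types so business-rule failures are not retried
        Policy
            .Handle<Exception>()
            .WaitAndRetry(
                retryCount: 3,
                // attempt is 1-based, so the waits are 2s, 4s, 8s
                sleepDurationProvider: attempt =>
                    TimeSpan.FromSeconds(Math.Pow(2, attempt)))
    }
};

services.AddBrighter()
    .UseExternalBus(/* config */)
    .UsePolicyRegistry(policyRegistry);
```
Detailed Brighter patterns: See references/brighter-resilience.md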

Pattern Decision Tree

When to Use Retry

Use retry when:
  • The failure is likely transient (network blip, temporary 503)
  • The operation is idempotent
  • The delay between retries is acceptable
Don't use retry when:
  • The failure comes from business logic (validation error, 400 Bad Request)
  • The operation is not idempotent (unless it carries an idempotency key)
  • An immediate response is required
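The idempotency rule can be made concrete with HTTP methods. RFC 9110 defines GET, HEAD, OPTIONS, TRACE, PUT, and DELETE as idempotent; POST and PATCH are not, so the sketch below only treats them as retryable when the request carries an idempotency key. The `Idempotency-Key` header is an assumed convention agreed with the server (Polly does not check it), and `IsSafeToRetry` is a hypothetical helper.

```csharp
using System;
using System.Net.Http;

// Methods RFC 9110 defines as idempotent: sending them twice has the same effect as once.
bool IsIdempotent(HttpMethod method) =>
    method == HttpMethod.Get || method == HttpMethod.Head ||
    method == HttpMethod.Options || method == HttpMethod.Trace ||
    method == HttpMethod.Put || method == HttpMethod.Delete;

// Hypothetical helper: retry only when a repeat cannot duplicate side effects.
// "Idempotency-Key" is an assumed header the server uses to deduplicate requests.
bool IsSafeToRetry(HttpRequestMessage request) =>
    IsIdempotent(request.Method) || request.Headers.Contains("Idempotency-Key");

var get = new HttpRequestMessage(HttpMethod.Get, "https://example.com/orders/42");
var post = new HttpRequestMessage(HttpMethod.Post, "https://example.com/payments");
var keyedPost = new HttpRequestMessage(HttpMethod.Post, "https://example.com/payments");
keyedPost.Headers.Add("Idempotency-Key", Guid.NewGuid().ToString());

Console.WriteLine($"GET: {IsSafeToRetry(get)}, POST: {IsSafeToRetry(post)}, keyed POST: {IsSafeToRetry(keyedPost)}");
```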

When to Use Circuit Breaker

Use circuit breaker when:
  • Calling external services that might be down
  • Need to fail fast instead of waiting
  • Want to prevent cascade failures
  • Service recovery needs time
Configuration guidance: See references/circuit-breaker-config.md
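To see how a breaker lets callers fail fast, here is a deliberately small state machine in plain C#. It trips after a fixed number of consecutive failures, a simplification of Polly's ratio-based breaker (FailureRatio over a SamplingDuration). The threshold, break duration, string states, and the `Call` helper are hypothetical, and time is passed in explicitly so the run is deterministic.

```csharp
using System;

// Illustrative settings, not Polly defaults.
const int threshold = 3;
var breakDuration = TimeSpan.FromSeconds(30);

var state = "Closed";
int consecutiveFailures = 0;
DateTime openedAt = DateTime.MinValue;

string Call(Func<bool> operation, DateTime now)
{
    if (state == "Open")
    {
        // While open, reject without touching the failing service.
        if (now - openedAt < breakDuration) return "rejected (fail fast)";
        state = "HalfOpen";                  // break elapsed: allow one trial call
    }

    if (operation())
    {
        consecutiveFailures = 0;
        state = "Closed";                    // trial or normal call succeeded
        return "success";
    }

    consecutiveFailures++;
    if (state == "HalfOpen" || consecutiveFailures >= threshold)
    {
        state = "Open";                      // stop calling the service for breakDuration
        openedAt = now;
    }
    return "failure";
}

var t0 = new DateTime(2025, 1, 1, 12, 0, 0);
for (int i = 0; i < 3; i++) Call(() => false, t0);        // three failures trip the breaker
string whileOpen = Call(() => true, t0.AddSeconds(10));   // rejected without calling
string afterBreak = Call(() => true, t0.AddSeconds(31));  // half-open trial succeeds

Console.WriteLine($"{whileOpen}; {afterBreak}; state = {state}");
```

The half-open step is what lets the breaker close on its own once the service recovers; a failed trial reopens it for another full break duration.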

When to Use DLQ

Use DLQ when:
  • A message still fails after the maximum number of retries
  • A business rule prevents processing
  • Manual intervention is needed
  • An audit trail of failures is required
DLQ patterns: See references/dlq-patterns.md

Retry Strategy Patterns

Immediate Retry

For very short-lived transient failures:
```csharp
.AddRetry(new RetryStrategyOptions
{
    MaxRetryAttempts = 2,
    Delay = TimeSpan.Zero  // Immediate retry (RetryStrategyOptions defaults to a 2-second delay)
});
```

Exponential Backoff

For transient failures that need time to clear:
```csharp
.AddRetry(new RetryStrategyOptions
{
    MaxRetryAttempts = 4,
    Delay = TimeSpan.FromSeconds(1),
    BackoffType = DelayBackoffType.Exponential,
    UseJitter = true  // Randomizes delays so clients don't retry in lockstep (thundering herd)
});
```
Base delays: 1s → 2s → 4s → 8s; with jitter enabled, the actual delays are randomized around these values.
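The base sequence follows Delay × 2^(n−1) for retry n. With `UseJitter = true`, Polly v8 uses a decorrelated-jitter algorithm, so real delays spread around this sequence rather than matching it. The sketch below reproduces the base sequence and, as a deliberately simpler stand-in, applies "full jitter" (a uniform random value between zero and the base delay); it is not Polly's formula, and `ExponentialDelays` and `WithFullJitter` are hypothetical helpers.

```csharp
using System;
using System.Linq;

// Base exponential sequence: baseDelay * 2^(n - 1) for retries n = 1..maxRetries.
TimeSpan[] ExponentialDelays(TimeSpan baseDelay, int maxRetries) =>
    Enumerable.Range(1, maxRetries)
        .Select(n => TimeSpan.FromTicks(baseDelay.Ticks * (1L << (n - 1))))
        .ToArray();

// Full jitter: pick a random delay between zero and the base delay. Simpler than
// Polly's decorrelated jitter, but shows the same idea: clients that failed
// together do not all retry at the same instant.
TimeSpan WithFullJitter(TimeSpan delay, Random random) =>
    TimeSpan.FromTicks((long)(delay.Ticks * random.NextDouble()));

var delays = ExponentialDelays(TimeSpan.FromSeconds(1), maxRetries: 4);
Console.WriteLine(string.Join(" -> ", delays.Select(d => $"{d.TotalSeconds}s")));

var rng = new Random(42);   // fixed seed only to make the sketch repeatable
var jittered = delays.Select(d => WithFullJitter(d, rng)).ToArray();
Console.WriteLine(string.Join(" -> ", jittered.Select(d => $"{d.TotalSeconds:F2}s")));
```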

Linear Backoff

For rate-limited services:
```csharp
.AddRetry(new RetryStrategyOptions
{
    MaxRetryAttempts = 3,
    Delay = TimeSpan.FromSeconds(2),
    BackoffType = DelayBackoffType.Linear
});
```
Delays: 2s → 4s → 6s
Full retry strategies: See references/retry-strategies.md
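A quick way to compare the retry shapes in this section is the total time a caller spends waiting if every retry fails. The sketch sums the base delays (jitter ignored) using the formulas that produce the sequences shown: constant, Delay × n, and Delay × 2^(n−1) for retry n. `DelaySeconds` and `TotalWait` are hypothetical helpers, and the string names simply mirror `DelayBackoffType` values.

```csharp
using System;
using System.Linq;

// Base delay before retry n (1-based), ignoring jitter.
double DelaySeconds(string backoff, double baseSeconds, int n) => backoff switch
{
    "Constant" => baseSeconds,
    "Linear" => baseSeconds * n,                        // 2s, 4s, 6s for a 2s base
    "Exponential" => baseSeconds * Math.Pow(2, n - 1),  // 1s, 2s, 4s, 8s for a 1s base
    _ => throw new ArgumentOutOfRangeException(nameof(backoff))
};

// Total waiting time between attempts when every retry fails.
double TotalWait(string backoff, double baseSeconds, int maxRetries) =>
    Enumerable.Range(1, maxRetries).Sum(n => DelaySeconds(backoff, baseSeconds, n));

Console.WriteLine($"Immediate (0s, 2 retries):   {TotalWait("Constant", 0, 2)}s");
Console.WriteLine($"Linear (2s, 3 retries):      {TotalWait("Linear", 2, 3)}s");
Console.WriteLine($"Exponential (1s, 4 retries): {TotalWait("Exponential", 1, 4)}s");
```

Whatever the shape, this total plus the time spent in the attempts themselves has to fit inside any outer timeout wrapped around the retry.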

Circuit Breaker Configuration

Conservative (Sensitive Service)

```csharp
.AddCircuitBreaker(new CircuitBreakerStrategyOptions
{
    FailureRatio = 0.25,        // Open when at least 25% of sampled calls fail
    MinimumThroughput = 5,      // Evaluate only after at least 5 calls in the window
    SamplingDuration = TimeSpan.FromSeconds(10),  // Window used to compute the ratio
    BreakDuration = TimeSpan.FromSeconds(60)      // Stay open 60s before a trial call
});
```

Aggressive (High Availability)

```csharp
.AddCircuitBreaker(new CircuitBreakerStrategyOptions
{
    FailureRatio = 0.5,          // Open when at least 50% of sampled calls fail
    MinimumThroughput = 20,      // Evaluate only after at least 20 calls in the window
    SamplingDuration = TimeSpan.FromSeconds(30),
    BreakDuration = TimeSpan.FromSeconds(15)  // Short break: quick recovery attempt
});
```
Detailed configuration: See references/circuit-breaker-config.md
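Both profiles above apply the same rule: within the sampling window, do nothing until at least MinimumThroughput calls have been seen, then open once the failure ratio reaches FailureRatio. The sketch applies that rule to sample counts for each profile; it collapses Polly's rolling window into a single fixed count (a simplifying assumption), so treat it as arithmetic rather than Polly's implementation. `ShouldOpen` is a hypothetical helper.

```csharp
using System;

// Would the breaker open for this window? Mirrors the MinimumThroughput and
// FailureRatio thresholds of CircuitBreakerStrategyOptions.
bool ShouldOpen(int calls, int failures, int minimumThroughput, double failureRatio) =>
    calls >= minimumThroughput && (double)failures / calls >= failureRatio;

// Conservative profile: FailureRatio 0.25, MinimumThroughput 5.
bool conservativeQuiet = ShouldOpen(calls: 4, failures: 4, minimumThroughput: 5, failureRatio: 0.25); // too few calls
bool conservativeTrips = ShouldOpen(calls: 8, failures: 2, minimumThroughput: 5, failureRatio: 0.25); // 25% >= 25%

// Aggressive profile: FailureRatio 0.5, MinimumThroughput 20.
bool aggressiveHolds = ShouldOpen(calls: 20, failures: 8, minimumThroughput: 20, failureRatio: 0.5);  // 40% < 50%
bool aggressiveTrips = ShouldOpen(calls: 20, failures: 10, minimumThroughput: 20, failureRatio: 0.5); // 50% >= 50%

Console.WriteLine($"{conservativeQuiet} {conservativeTrips} {aggressiveHolds} {aggressiveTrips}");
```

The 2-in-8 failure rate that trips the conservative profile would not even be evaluated by the aggressive one, which waits for 20 calls and a 50% ratio before breaking.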

Dead Letter Queue Pattern

When Message Processing Fails

```text
1. Message received
2. Handler attempts processing
3. Failure occurs
4. Retry policy applied (1...N attempts)
5. All retries exhausted
6. Message moved to DLQ
7. Alert/monitoring triggered
8. Manual investigation
```
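The flow above can be sketched in plain C# with an in-memory list standing in for the dead letter queue. It illustrates steps 2 to 7 under simplifying assumptions (synchronous handler, no broker, a console line as the alert); `Process` and `deadLetters` are hypothetical names, not Brighter APIs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the dead letter queue.
var deadLetters = new List<(string Message, string Reason)>();

// Steps 2-7: run the handler, retry up to maxAttempts times, then park the message.
bool Process(string message, Action<string> handler, int maxAttempts)
{
    string reason = "";
    for (int attempt = 1; attempt <= maxAttempts; attempt++)
    {
        try
        {
            handler(message);
            return true;                                   // processed successfully
        }
        catch (Exception ex)
        {
            reason = ex.Message;                           // step 3: failure; step 4: try again
        }
    }
    deadLetters.Add((message, reason));                    // steps 5-6: retries exhausted, move to DLQ
    Console.WriteLine($"ALERT: {message} moved to DLQ ({reason})");  // step 7
    return false;
}

int calls = 0;
// Transient failure: times out once, then succeeds on the second attempt.
bool flakyOk = Process("order-1", _ => { if (++calls < 2) throw new TimeoutException("timeout"); }, maxAttempts: 3);
// Poison message: fails on every attempt and ends up in the DLQ.
bool poisonOk = Process("order-2", _ => throw new FormatException("bad payload"), maxAttempts: 3);
```

A non-transient failure such as a malformed payload gains nothing from retries; a real handler could send it to the DLQ on the first failure.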

Brighter DLQ Setup

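```csharp
// Requeue limits are set on the subscription (consumer side), not on a publication.
// Sketch for the RabbitMQ transport with Brighter v9-style named arguments; check the
// parameter names against your Brighter version. Names below are illustrative.
var subscription = new RmqSubscription<OrderCreated>(
    name: new SubscriptionName("order-created"),
    channelName: new ChannelName("order.created"),
    routingKey: new RoutingKey("order.created"),
    requeueCount: 3,                    // after 3 requeues the message is rejected
    requeueDelayInMilliseconds: 500,    // wait before each redelivery
    deadLetterChannelName: new ChannelName("order.created.dlq"));  // rejected messages land here
```
Full DLQ patterns: See references/dlq-patterns.md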

Combined Patterns

HTTP Client with Full Resilience

```csharp
builder.Services.AddHttpClient<IPaymentGateway, PaymentGateway>()
    .AddResilienceHandler("payment-gateway", pipeline =>
    {
        // Order matters: strategies added first are outermost

        // 1. Total timeout (outer boundary for all attempts and delays)
        pipeline.AddTimeout(TimeSpan.FromSeconds(30));

        // 2. Retry (the circuit breaker sits inside, so every attempt passes through it)
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            Delay = TimeSpan.FromMilliseconds(500),
            BackoffType = DelayBackoffType.Exponential,
            UseJitter = true
        });

        // 3. Circuit breaker (SamplingDuration keeps its 30-second default)
        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.5,
            MinimumThroughput = 10,
            BreakDuration = TimeSpan.FromSeconds(30)
        });

        // 4. Per-attempt timeout (innermost)
        pipeline.AddTimeout(TimeSpan.FromSeconds(5));
    });
```
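Because the 30-second total timeout wraps every attempt and every backoff delay, it is worth checking that the inner settings fit inside it. A rough worst-case budget for the payment-gateway pipeline above, ignoring jitter (which shifts individual delays) and the fact that an open circuit would reject attempts faster:

```csharp
using System;
using System.Linq;

// Values from the payment-gateway pipeline above.
var attemptTimeout = TimeSpan.FromSeconds(5);
var baseDelay = TimeSpan.FromMilliseconds(500);
const int maxRetries = 3;
var totalTimeout = TimeSpan.FromSeconds(30);

// Worst case: the initial attempt and all 3 retries each run until the 5s attempt timeout...
var attemptsTime = attemptTimeout * (maxRetries + 1);

// ...plus exponential backoff before each retry: 0.5s, 1s, 2s.
var backoffTime = Enumerable.Range(0, maxRetries)
    .Aggregate(TimeSpan.Zero, (sum, i) => sum + baseDelay * Math.Pow(2, i));

var worstCase = attemptsTime + backoffTime;
bool fitsBudget = worstCase < totalTimeout;

Console.WriteLine($"Worst case {worstCase.TotalSeconds}s vs total timeout {totalTimeout.TotalSeconds}s");
```

At 23.5 seconds against a 30-second ceiling there is headroom; raising MaxRetryAttempts to 4 would add another 5-second attempt and a 4-second delay, pushing the worst case past the total timeout.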

Message Handler with Fallback

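```csharp
using Paramore.Brighter;
using Paramore.Brighter.Policies.Attributes;

public class ProcessPaymentHandler : RequestHandler<ProcessPayment>
{
    private readonly IPaymentService _paymentService; // hypothetical payment dependency
    public ProcessPaymentHandler(IPaymentService paymentService) => _paymentService = paymentService;

    // Lower steps run first (outermost): the fallback wraps the breaker, which wraps retry,
    // so Fallback runs only after retries are exhausted or the circuit is open
    [FallbackPolicy(backstop: true, circuitBreaker: true, step: 1)]
    [UsePolicy("circuit-breaker", step: 2)]
    [UsePolicy("retry", step: 3)]
    public override ProcessPayment Handle(ProcessPayment command)
    {
        _paymentService.Process(command);
        return base.Handle(command);
    }

    // Degrade gracefully here, e.g. record the payment as pending for later processing
    public override ProcessPayment Fallback(ProcessPayment command) => base.Fallback(command);
}
```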

Observability

Polly Telemetry

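```csharp
// With Polly.Extensions, pipelines registered through AddResiliencePipeline report
// telemetry automatically through the app's ILoggerFactory and metrics
services.AddResiliencePipeline("my-pipeline", pipeline =>
{
    pipeline.AddRetry(new RetryStrategyOptions());
});

// Pipelines built outside DI opt in explicitly
var standalone = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions())
    .ConfigureTelemetry(LoggerFactory.Create(b => b.AddConsole()))
    .Build();
```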

Key Metrics to Monitor

| Metric | Purpose | Alert Threshold |
|---|---|---|
| Retry count | Track transient failures | > 3 per minute |
| Circuit state | Track service health | State = Open |
| DLQ depth | Track processing failures | > 0 |
| Timeout rate | Track slow services | > 5% |

Anti-Patterns

Over-Retrying

Problem: Retrying too many times, too quickly.
```csharp
// BAD: 10 immediate retries (Delay must be zeroed explicitly; the default is 2 seconds)
.AddRetry(new RetryStrategyOptions { MaxRetryAttempts = 10, Delay = TimeSpan.Zero });
```
Fix: Use exponential backoff and limit the number of retries:
```csharp
// GOOD: 3 retries with backoff
.AddRetry(new RetryStrategyOptions
{
    MaxRetryAttempts = 3,
    Delay = TimeSpan.FromSeconds(1),
    BackoffType = DelayBackoffType.Exponential
});
```

Retrying Non-Transient Failures

Problem: Retrying business logic failures.
```csharp
// BAD: retries every non-success response, including 400 Bad Request
ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
    .HandleResult(r => !r.IsSuccessStatusCode)
```
Fix: Only retry transient failures:
```csharp
// GOOD: Only retry network errors and transient HTTP status codes
ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
    .Handle<HttpRequestException>()
    .HandleResult(r => r.StatusCode is
        HttpStatusCode.ServiceUnavailable or
        HttpStatusCode.GatewayTimeout or
        HttpStatusCode.RequestTimeout)
```
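Keeping the classification in a plain function makes the rule easy to unit test before wiring it into `ShouldHandle`. This sketch mirrors the GOOD predicate above; `IsTransient` is a hypothetical helper, not a Polly API.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Same rule as the GOOD predicate: network-level failures plus three transient status codes.
bool IsTransient(Exception? exception, HttpStatusCode? status) =>
    exception is HttpRequestException ||
    status is HttpStatusCode.ServiceUnavailable
           or HttpStatusCode.GatewayTimeout
           or HttpStatusCode.RequestTimeout;

foreach (var code in new[] { HttpStatusCode.BadRequest, HttpStatusCode.NotFound,
                             HttpStatusCode.ServiceUnavailable, HttpStatusCode.GatewayTimeout })
{
    Console.WriteLine($"{(int)code} {code}: retry = {IsTransient(null, code)}");
}
```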

Missing Circuit Breaker

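Problem: Retrying every call against a service that is down, which adds load to the struggling dependency and latency for every caller.
Fix: Always pair retry with a circuit breaker for external calls, so that once the failure threshold is crossed, calls fail fast instead of being retried.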

DLQ as Black Hole

Problem: Messages go to DLQ and are never processed.
Fix:
  • Monitor DLQ depth
  • Set up alerts
  • Implement a replay mechanism
  • Document investigation procedures
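A replay mechanism can be as simple as moving dead-lettered messages back onto the main queue once the cause is fixed, while capping how often any one message is replayed so a true poison message cannot loop forever. This is a minimal in-memory sketch; the queues, the `replayCount` dictionary, the cap of 2, and `Replay` are hypothetical, and a real implementation would use the broker's own APIs.

```csharp
using System;
using System.Collections.Generic;

var mainQueue = new Queue<string>();
var deadLetterQueue = new Queue<string>(new[] { "order-7", "order-9" });
var replayCount = new Dictionary<string, int>();
const int maxReplays = 2;

// Move each dead-lettered message back to the main queue unless it has already
// been replayed maxReplays times; those stay parked for manual investigation.
int Replay()
{
    int moved = 0;
    var parked = new Queue<string>();
    while (deadLetterQueue.Count > 0)
    {
        var message = deadLetterQueue.Dequeue();
        int count = replayCount.GetValueOrDefault(message);
        if (count >= maxReplays) { parked.Enqueue(message); continue; }
        replayCount[message] = count + 1;
        mainQueue.Enqueue(message);
        moved++;
    }
    foreach (var m in parked) deadLetterQueue.Enqueue(m);   // keep parked messages in the DLQ
    return moved;
}

int firstReplay = Replay();          // both messages return to the main queue

// Suppose order-9 keeps failing and has already been replayed maxReplays times.
replayCount["order-9"] = maxReplays;
deadLetterQueue.Enqueue("order-9");
int secondReplay = Replay();         // order-9 stays parked

Console.WriteLine($"first: {firstReplay}, second: {secondReplay}, still in DLQ: {deadLetterQueue.Count}");
```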

References

  • references/polly-patterns.md: Comprehensive Polly v8 patterns
  • references/circuit-breaker-config.md: Circuit breaker configuration guide
  • references/retry-strategies.md: Retry strategy patterns
  • references/brighter-resilience.md: Brighter message handler resilience
  • references/dlq-patterns.md: Dead letter queue patterns

Related Skills

  • fitness-functions: Test resilience with performance fitness functions
  • modular-architecture: Isolate resilience concerns by module
  • adr-management: Document resilience decisions

Last Updated: 2025-12-22

Version History

  • v1.0.0 (2025-12-26): Initial release