spring-boot-saga-pattern

Spring Boot Saga Pattern

When to Use

Implement this skill when:
  • Building distributed transactions across multiple microservices
  • Needing to replace two-phase commit (2PC) with a more scalable solution
  • Handling transaction rollback when a service fails in multi-service workflows
  • Ensuring eventual consistency in microservices architecture
  • Implementing compensating transactions for failed operations
  • Coordinating complex business processes spanning multiple services
  • Choosing between choreography-based and orchestration-based saga approaches
Trigger phrases: distributed transactions, saga pattern, compensating transactions, microservices transaction, eventual consistency, rollback across services, orchestration pattern, choreography pattern

Overview

The Saga Pattern is an architectural pattern for managing distributed transactions in microservices. Instead of using a single ACID transaction across multiple databases, a saga breaks the transaction into a sequence of local transactions. Each local transaction updates its database and publishes an event or message to trigger the next step. If a step fails, the saga executes compensating transactions to undo the changes made by previous steps.
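The mechanism described above can be sketched in plain Java with no framework. This is a minimal illustration, not a production executor: `SagaStep` and `SimpleSaga` are hypothetical names, the steps run in-process and synchronously, and saga state is not persisted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** One saga step: a local transaction plus the action that undoes it. */
record SagaStep(String name, Runnable action, Runnable compensation) {}

/** Runs steps in order; on failure, compensates the completed steps in reverse order. */
final class SimpleSaga {
    private final List<SagaStep> steps = new ArrayList<>();

    SimpleSaga step(String name, Runnable action, Runnable compensation) {
        steps.add(new SagaStep(name, action, compensation));
        return this;
    }

    /** Returns true if every step committed, false if the saga was rolled back. */
    boolean execute() {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action().run();      // local transaction in one service
                completed.push(step);     // remember it for a possible rollback
            } catch (RuntimeException e) {
                // Undo the already-committed steps, most recent first.
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return false;
            }
        }
        return true;
    }
}
```

The failing step itself is not compensated, because its local transaction never committed; only the steps before it are undone.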

Key Architectural Decisions

When implementing a saga, make these decisions:
  1. Approach Selection: Choose between choreography-based (event-driven, decoupled) or orchestration-based (centralized control, easier to track)
  2. Messaging Platform: Select a broker such as Kafka or RabbitMQ, optionally accessed through the Spring Cloud Stream abstraction
  3. Framework: Use Axon Framework, Eventuate Tram, Camunda, or Apache Camel
  4. State Persistence: Store saga state in a database for recovery and debugging
  5. Idempotency: Ensure all operations (especially compensations) are idempotent and retryable

Two Approaches to Implement Saga

Choreography-Based Saga

Each microservice publishes events and listens to events from other services. There is no central coordinator.
Best for: Greenfield microservice applications with few participants
Advantages:
  • Simple for a small number of services
  • Loose coupling between services
  • No single point of failure
Disadvantages:
  • Difficult to track workflow state
  • Hard to troubleshoot and maintain
  • Complexity grows with number of services
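To make the choreography idea concrete, here is a small sketch in which an in-memory event bus stands in for Kafka or RabbitMQ. The class names and event names (`OrderCreated`, `PaymentFailed`, and so on) are assumptions chosen for illustration; a real system would deliver events asynchronously through the broker.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Tiny synchronous stand-in for a message broker with one topic per event type. */
final class InMemoryEventBus {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();
    final List<String> published = new ArrayList<>();   // event log, for inspection

    void subscribe(String eventType, Consumer<String> handler) {
        subscribers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
    }

    void publish(String eventType, String orderId) {
        published.add(eventType);
        for (Consumer<String> handler : List.copyOf(subscribers.getOrDefault(eventType, List.of()))) {
            handler.accept(orderId);
        }
    }
}

final class ChoreographyDemo {
    /** Wires three hypothetical services; payment fails when paymentSucceeds is false. */
    static InMemoryEventBus wire(boolean paymentSucceeds) {
        InMemoryEventBus bus = new InMemoryEventBus();
        // Payment service reacts to OrderCreated.
        bus.subscribe("OrderCreated", id ->
                bus.publish(paymentSucceeds ? "PaymentCompleted" : "PaymentFailed", id));
        // Inventory service reacts to PaymentCompleted.
        bus.subscribe("PaymentCompleted", id -> bus.publish("InventoryReserved", id));
        // Order service compensates its own local transaction when payment fails.
        bus.subscribe("PaymentFailed", id -> bus.publish("OrderCancelled", id));
        return bus;
    }
}
```

No component holds the whole workflow: the flow only exists as the sum of the subscriptions, which is why tracking state gets harder as services are added.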

Orchestration-Based Saga

A central orchestrator manages the entire transaction flow and tells services what to do.
Best for: Brownfield applications, complex workflows, or when centralized control is needed
Advantages:
  • Centralized visibility and monitoring
  • Easier to troubleshoot and maintain
  • Clear transaction flow
  • Simplified error handling
  • Better for complex workflows
Disadvantages:
  • Orchestrator can become a single point of failure
  • Additional infrastructure component
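An orchestrator can be modelled as a state machine that consumes replies and decides which command to send next. The sketch below assumes a three-participant order flow; the state names, command names, and reply names are hypothetical, and a list stands in for the command channel.

```java
import java.util.ArrayList;
import java.util.List;

enum SagaState { PAYMENT_PENDING, INVENTORY_PENDING, COMPLETED, COMPENSATING, CANCELLED }

/** Central coordinator: consumes replies and decides which command to send next. */
final class OrderSagaOrchestrator {
    private SagaState state = SagaState.PAYMENT_PENDING;
    final List<String> sentCommands = new ArrayList<>();   // stands in for a command channel

    OrderSagaOrchestrator() {
        sentCommands.add("ProcessPayment");                // first command of the saga
    }

    SagaState state() { return state; }

    /** Handles a reply from a participant and advances or compensates the saga. */
    void onReply(String reply) {
        switch (state) {
            case PAYMENT_PENDING -> {
                if (reply.equals("PaymentCompleted")) {
                    send("ReserveInventory", SagaState.INVENTORY_PENDING);
                } else {
                    send("CancelOrder", SagaState.CANCELLED);   // nothing to refund yet
                }
            }
            case INVENTORY_PENDING -> {
                if (reply.equals("InventoryReserved")) {
                    state = SagaState.COMPLETED;
                } else {
                    send("RefundPayment", SagaState.COMPENSATING);
                }
            }
            case COMPENSATING -> {
                if (reply.equals("PaymentRefunded")) {
                    send("CancelOrder", SagaState.CANCELLED);
                }
            }
            default -> throw new IllegalStateException("Saga already finished: " + state);
        }
    }

    private void send(String command, SagaState next) {
        sentCommands.add(command);
        state = next;
    }
}
```

Because the current state lives in one place, it can be persisted and inspected, which is the visibility advantage listed above.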

Implementation Steps

Step 1: Define Transaction Flow

Identify the sequence of operations and corresponding compensating transactions:
Order  →  Payment  →  Inventory  →  Shipment  →  Notification
  ↓          ↓            ↓            ↓              ↓
Cancel    Refund       Release       Cancel         Cancel
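One way to capture this flow as data is an enum that pairs each forward step with its compensation. The sketch below assumes expanded compensation names (`CancelOrder`, `RefundPayment`, and so on) that do not appear in the diagram; it derives which compensations to run, newest first, when a given step fails.

```java
import java.util.ArrayList;
import java.util.List;

/** Forward steps of the order flow, each paired with its compensation. */
enum OrderFlowStep {
    ORDER("CancelOrder"),
    PAYMENT("RefundPayment"),
    INVENTORY("ReleaseInventory"),
    SHIPMENT("CancelShipment"),
    NOTIFICATION("CancelNotification");

    final String compensation;

    OrderFlowStep(String compensation) {
        this.compensation = compensation;
    }

    /** Compensations to run, newest first, when the given step fails. */
    static List<String> compensationsFor(OrderFlowStep failed) {
        List<String> plan = new ArrayList<>();
        // Only steps that completed before the failure need to be undone.
        for (int i = failed.ordinal() - 1; i >= 0; i--) {
            plan.add(values()[i].compensation);
        }
        return plan;
    }
}
```

For example, a failure at `SHIPMENT` yields release inventory, then refund payment, then cancel order.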

Step 2: Choose Implementation Approach

  • Choreography: Spring Cloud Stream with Kafka or RabbitMQ
  • Orchestration: Axon Framework, Eventuate Tram, Camunda, or Apache Camel

Step 3: Implement Services with Local Transactions

Each service handles its local ACID transaction and publishes events or responds to commands.

Step 4: Implement Compensating Transactions

Every forward transaction must have a corresponding compensating transaction. Ensure idempotency and retryability.
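As an illustration of an idempotent compensation, the sketch below records which payments have already been refunded and ignores redeliveries. `IdempotentRefundHandler` is a hypothetical class; the in-memory set is an assumption that keeps the example small, where a real service would typically use a table with a unique constraint updated in the same local transaction as the refund.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Refund handler that ignores redelivered requests for the same payment. */
final class IdempotentRefundHandler {
    // Assumption: in-memory bookkeeping instead of a database table.
    private final Set<String> refundedPaymentIds = ConcurrentHashMap.newKeySet();
    private final AtomicLong refundedCents = new AtomicLong();

    /** Returns true if the refund was applied, false if it had already been applied. */
    boolean refund(String paymentId, long amountCents) {
        if (!refundedPaymentIds.add(paymentId)) {
            return false;                   // duplicate delivery: do nothing
        }
        refundedCents.addAndGet(amountCents);
        return true;
    }

    long totalRefundedCents() {
        return refundedCents.get();
    }
}
```

Calling `refund` twice with the same payment id moves money only once, so the message broker or orchestrator can retry freely.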

Step 5: Handle Failure Scenarios

Implement retry logic, timeouts, and dead-letter queues for failed messages.
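The retry and dead-letter behaviour can be sketched without a broker. `RetryingConsumer` below is a hypothetical helper: it retries a handler a bounded number of times and parks the message in a list that stands in for a dead-letter queue. Backoff delays and timeouts are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Delivers a message with bounded retries; exhausted messages go to a dead-letter list. */
final class RetryingConsumer {
    private final int maxAttempts;
    final List<String> deadLetters = new ArrayList<>();

    RetryingConsumer(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    /** Returns the number of attempts made. */
    int deliver(String message, Consumer<String> handler) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(message);
                return attempt;             // handled successfully
            } catch (RuntimeException e) {
                // A real consumer would back off here before the next attempt.
            }
        }
        deadLetters.add(message);           // park for manual inspection or replay
        return maxAttempts;
    }
}
```

Retries only make sense when the handler is idempotent, since a handler may have partially succeeded before throwing.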

Best Practices

Design Principles

  1. Idempotency: Ensure compensating transactions execute safely multiple times
  2. Retryability: Design operations to handle retries without side effects
  3. Atomicity: Each local transaction must be atomic within its service
  4. Isolation: Handle concurrent saga executions properly
  5. Eventual Consistency: Accept that data becomes consistent over time

Service Design

  • Use constructor injection exclusively (never field injection)
  • Implement services as stateless components
  • Store saga state in a persistent store (database or event store)
  • Use immutable DTOs (Java records preferred)
  • Separate domain logic from infrastructure concerns
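A compact sketch of these guidelines in plain Java follows. The type names (`ReserveCreditCommand`, `CreditReservedEvent`, `EventPublisher`, `CreditService`) are hypothetical, and Spring annotations are left out so the example compiles on its own; under Spring, a bean with a single constructor is autowired through that constructor.

```java
import java.math.BigDecimal;

/** Immutable command and event DTOs expressed as records. */
record ReserveCreditCommand(String orderId, BigDecimal amount) {}
record CreditReservedEvent(String orderId) {}

/** Infrastructure port; a Kafka- or RabbitMQ-backed adapter would implement it. */
interface EventPublisher {
    void publish(Object event);
}

/** Stateless domain service: collaborators arrive through the constructor and never change. */
final class CreditService {
    private final EventPublisher publisher;

    CreditService(EventPublisher publisher) {
        this.publisher = publisher;
    }

    void handle(ReserveCreditCommand command) {
        // The local transaction (for example a JPA update) would go here.
        publisher.publish(new CreditReservedEvent(command.orderId()));
    }
}
```

Because the publisher is an interface passed in at construction, a test can supply a list-backed fake instead of a broker.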

Error Handling

  • Implement circuit breakers for service calls
  • Use dead-letter queues for failed messages
  • Log all saga events for debugging and monitoring
  • Implement timeout mechanisms for long-running sagas
  • Design semantic locks to prevent concurrent updates
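A semantic lock is usually an application-level flag rather than a database lock. In the sketch below, an order stays in a pending status while a saga owns it, and other requests check that status before changing the order. `OrderStore` and the status names are assumptions for illustration, with a map standing in for the order table.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

enum OrderStatus { APPROVAL_PENDING, APPROVED, REJECTED }

/** Keeps an order in a pending state while a saga owns it, so other updates are refused. */
final class OrderStore {
    private final Map<String, OrderStatus> orders = new ConcurrentHashMap<>();

    /** First saga step: create the order already "locked". */
    void create(String orderId) {
        orders.put(orderId, OrderStatus.APPROVAL_PENDING);
    }

    /** Final saga step (or compensation): release the lock by leaving the pending state. */
    void finish(String orderId, boolean approved) {
        orders.replace(orderId, OrderStatus.APPROVAL_PENDING,
                approved ? OrderStatus.APPROVED : OrderStatus.REJECTED);
    }

    /** A concurrent request (for example "change delivery address") checks the lock first. */
    boolean tryModify(String orderId) {
        return orders.get(orderId) == OrderStatus.APPROVED;
    }
}
```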

Testing

  • Test happy path scenarios
  • Test each failure scenario and its compensation
  • Test concurrent saga executions
  • Test idempotency of compensating transactions
  • Use Testcontainers for integration testing

Monitoring and Observability

  • Track saga execution status and duration
  • Monitor compensation transaction execution
  • Alert on stuck or failed sagas
  • Use distributed tracing (Micrometer Tracing with Zipkin on Spring Boot 3.x; Spring Cloud Sleuth applies only to Spring Boot 2.x)
  • Implement health checks for saga coordinators
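Alerting on stuck sagas comes down to comparing each in-progress saga's age against a timeout. The following sketch assumes a hypothetical `SagaInstance` record loaded from the saga state store and an `IN_PROGRESS` status value; the current time is passed in so the check is easy to test.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

/** Snapshot of one persisted saga; field names are assumptions. */
record SagaInstance(String id, String status, Instant startedAt) {}

final class StuckSagaDetector {
    /** Returns the ids of sagas that are still in progress after the given timeout. */
    static List<String> findStuck(List<SagaInstance> sagas, Duration timeout, Instant now) {
        return sagas.stream()
                .filter(s -> s.status().equals("IN_PROGRESS"))
                .filter(s -> Duration.between(s.startedAt(), now).compareTo(timeout) > 0)
                .map(SagaInstance::id)
                .toList();
    }
}
```

A scheduled job could run this check periodically and raise an alert or a metric for each returned id.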

Technology Stack

Spring Boot 3.x with dependencies:
Messaging: Spring Cloud Stream, Apache Kafka, RabbitMQ, Spring AMQP
Saga Frameworks: Axon Framework (4.9.0), Eventuate Tram Sagas, Camunda, Apache Camel
Persistence: Spring Data JPA, Event Sourcing (optional), Transactional Outbox Pattern
Monitoring: Spring Boot Actuator, Micrometer, Distributed Tracing (Micrometer Tracing + Zipkin; Spring Cloud Sleuth does not support Spring Boot 3.x)
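The Transactional Outbox Pattern listed under Persistence can be sketched as follows. This is an in-memory illustration under stated assumptions: `OrderDatabase` is a hypothetical class, two lists stand in for the business table and the outbox table, and `synchronized` stands in for a local database transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** In-memory stand-in for a service database holding a business table and an outbox table. */
final class OrderDatabase {
    final List<String> orders = new ArrayList<>();
    final List<String> outbox = new ArrayList<>();

    /** Both writes belong to one local transaction, so they commit or roll back together. */
    synchronized void createOrder(String orderId) {
        orders.add(orderId);
        outbox.add("OrderCreated:" + orderId);
    }

    /** Relay: publish pending outbox rows, deleting each only after the broker accepted it. */
    synchronized int relay(Consumer<String> broker) {
        int sent = 0;
        while (!outbox.isEmpty()) {
            broker.accept(outbox.get(0));   // may throw; the row then stays for the next run
            outbox.remove(0);
            sent++;
        }
        return sent;
    }
}
```

If the relay crashes after publishing but before deleting a row, the event is sent again on the next run, so delivery is at-least-once and consumers need to be idempotent.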

Anti-Patterns to Avoid

  ❌ Tight Coupling: Services directly calling each other instead of using events
  ❌ Missing Compensations: Not implementing compensating transactions for every step
  ❌ Non-Idempotent Operations: Compensations that cannot be safely retried
  ❌ Synchronous Sagas: Waiting synchronously for each step (defeats the purpose)
  ❌ Lost Messages: Not handling message delivery failures
  ❌ No Monitoring: Running sagas without visibility into their status
  ❌ Shared Database: Using same database across multiple services
  ❌ Ignoring Network Failures: Not handling partial failures gracefully

When NOT to Use Saga Pattern

Do not implement this pattern when:
  • The transaction lives in a single service (use a local ACID transaction instead)
  • Strong consistency is required (consider a monolith or a shared database)
  • The operations are simple CRUD with no cross-service dependencies
  • Transaction volume is low and the flows are simple
  • The team lacks experience with distributed systems

References

For detailed information, consult the following resources:
  • Saga Pattern Definition
  • Choreography-Based Implementation
  • Orchestration-Based Implementation
  • Event-Driven Architecture
  • Compensating Transactions
  • State Management
  • Error Handling and Retry
  • Testing Strategies
  • Common Pitfalls and Solutions
See also examples.md for complete implementation examples:
  • E-Commerce Order Processing (orchestration with Axon Framework)
  • Food Delivery Application (choreography with Kafka and Spring Cloud Stream)
  • Travel Booking System (complex orchestration with multiple compensations)
  • Banking Transfer System
  • Real-world microservices patterns