spring-boot-saga-pattern
Spring Boot Saga Pattern
When to Use
Implement this skill when:
- Building distributed transactions across multiple microservices
- Needing to replace two-phase commit (2PC) with a more scalable solution
- Handling transaction rollback when a service fails in multi-service workflows
- Ensuring eventual consistency in microservices architecture
- Implementing compensating transactions for failed operations
- Coordinating complex business processes spanning multiple services
- Choosing between choreography-based and orchestration-based saga approaches
Trigger phrases: distributed transactions, saga pattern, compensating transactions, microservices transaction, eventual consistency, rollback across services, orchestration pattern, choreography pattern
Overview
The Saga Pattern is an architectural pattern for managing distributed transactions in microservices. Instead of using a single ACID transaction across multiple databases, a saga breaks the transaction into a sequence of local transactions. Each local transaction updates its database and publishes an event or message to trigger the next step. If a step fails, the saga executes compensating transactions to undo the changes made by previous steps.
Key Architectural Decisions
When implementing a saga, make these decisions:
- Approach Selection: Choose between choreography-based (event-driven, decoupled) or orchestration-based (centralized control, easier to track)
- Messaging Platform: Select a broker such as Kafka or RabbitMQ, optionally accessed through the Spring Cloud Stream abstraction
- Framework: Use Axon Framework, Eventuate Tram, Camunda, or Apache Camel
- State Persistence: Store saga state in database for recovery and debugging
- Idempotency: Ensure all operations (especially compensations) are idempotent and retryable
Two Approaches to Implement Saga
Choreography-Based Saga
Each microservice publishes events and listens to events from other services. No central coordinator.
Best for: Greenfield microservice applications with few participants
Advantages:
- Simple for small number of services
- Loose coupling between services
- No single point of failure
Disadvantages:
- Difficult to track workflow state
- Hard to troubleshoot and maintain
- Complexity grows with number of services
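The choreography style can be illustrated with a minimal in-memory sketch. It assumes a hypothetical `EventBus` class standing in for Kafka or RabbitMQ topics; the event names (`OrderCreated`, `PaymentCompleted`, `InventoryReserved`, `InventoryFailed`) and the three services are illustrative only and do not come from any framework.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for a message broker (one "topic" per event type).
final class EventBus {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    void subscribe(String eventType, Consumer<String> handler) {
        subscribers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
    }

    void publish(String eventType, String orderId) {
        // Copy the handler list so handlers can publish follow-up events safely.
        List<Consumer<String>> handlers =
                new ArrayList<>(subscribers.getOrDefault(eventType, List.of()));
        for (Consumer<String> handler : handlers) {
            handler.accept(orderId);
        }
    }
}

final class ChoreographyDemo {
    // Wires three illustrative services together and returns the observed action log.
    static List<String> run(boolean inventoryAvailable) {
        EventBus bus = new EventBus();
        List<String> log = new ArrayList<>();

        // Payment service: charges on OrderCreated, compensates on InventoryFailed.
        bus.subscribe("OrderCreated", id -> {
            log.add("payment charged");
            bus.publish("PaymentCompleted", id);
        });
        bus.subscribe("InventoryFailed", id -> log.add("payment refunded"));

        // Inventory service: reserves stock on PaymentCompleted or reports failure.
        bus.subscribe("PaymentCompleted", id -> {
            if (inventoryAvailable) {
                log.add("inventory reserved");
                bus.publish("InventoryReserved", id);
            } else {
                log.add("inventory unavailable");
                bus.publish("InventoryFailed", id);
            }
        });

        // Order service: reacts to the final outcome.
        bus.subscribe("InventoryReserved", id -> log.add("order confirmed"));
        bus.subscribe("InventoryFailed", id -> log.add("order cancelled"));

        bus.publish("OrderCreated", "order-1");
        return log;
    }
}
```

Calling `ChoreographyDemo.run(false)` shows the compensation path: no component owns the overall flow, and each service reacts only to the events it subscribes to. This is also why the workflow state is hard to see in one place.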
Orchestration-Based Saga
A central orchestrator manages the entire transaction flow and tells services what to do.
Best for: Brownfield applications, complex workflows, or when centralized control is needed
Advantages:
- Centralized visibility and monitoring
- Easier to troubleshoot and maintain
- Clear transaction flow
- Simplified error handling
- Better for complex workflows
Disadvantages:
- Orchestrator can become single point of failure
- Additional infrastructure component
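The orchestration style might be sketched as follows. This is a simplified, synchronous, in-process illustration: `SagaStep` and `SagaOrchestrator` are hypothetical names, and a real orchestrator (for example one built on Axon Framework or Camunda) would send commands over messaging and persist its progress rather than call `Runnable`s directly.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// One saga step: a forward action plus the compensation that undoes it.
record SagaStep(String name, Runnable action, Runnable compensation) {}

final class SagaOrchestrator {
    private final List<SagaStep> steps;

    SagaOrchestrator(List<SagaStep> steps) {
        this.steps = List.copyOf(steps);
    }

    // Runs the steps in order. If a step throws, the already completed steps
    // are compensated in reverse order and the method returns false.
    boolean execute() {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action().run();
                completed.push(step);
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return false;
            }
        }
        return true;
    }
}
```

The failed step itself is not compensated in this sketch because its local transaction is assumed to have rolled back on its own; only the steps that committed are undone.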
Implementation Steps
Step 1: Define Transaction Flow
Identify the sequence of operations and corresponding compensating transactions:
Order → Payment → Inventory → Shipment → Notification
  ↓        ↓          ↓           ↓            ↓
Cancel   Refund    Release     Cancel       Cancel
Step 2: Choose Implementation Approach
- Choreography: Spring Cloud Stream with Kafka or RabbitMQ
- Orchestration: Axon Framework, Eventuate Tram, Camunda, or Apache Camel
Step 3: Implement Services with Local Transactions
Each service handles its local ACID transaction and publishes events or responds to commands.
Step 4: Implement Compensating Transactions
Every forward transaction must have a corresponding compensating transaction. Ensure idempotency and retryability.
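One common way to make a compensation idempotent is to remember which requests have already been processed. The sketch below assumes a hypothetical `RefundService` that tracks refunded payment IDs in memory; in a real service that record would normally be written in the same local database transaction as the refund itself.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical payment service whose refund compensation can be retried safely.
final class RefundService {
    private final Set<String> refundedPayments = ConcurrentHashMap.newKeySet();
    private final AtomicLong refundedCents = new AtomicLong();

    // Returns true only the first time a payment is refunded; repeats are no-ops.
    boolean refund(String paymentId, long amountCents) {
        // add() is atomic on this set, so a duplicate delivery cannot refund twice.
        if (!refundedPayments.add(paymentId)) {
            return false;
        }
        refundedCents.addAndGet(amountCents);
        return true;
    }

    long totalRefundedCents() {
        return refundedCents.get();
    }
}
```

With this shape, a broker redelivering the same refund command leaves the total unchanged, which is what makes the compensation retryable.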
Step 5: Handle Failure Scenarios
Implement retry logic, timeouts, and dead-letter queues for failed messages.
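Retry with backoff and a dead-letter destination could look roughly like this. `RetryingConsumer` is a hypothetical class, the dead-letter "queue" is just an in-memory list, and timeouts are left out; with Kafka or RabbitMQ the broker or the Spring integration would usually provide retry and dead-letter routing through configuration instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical consumer wrapper: retries a handler, then dead-letters the message.
final class RetryingConsumer {
    private final int maxAttempts;
    private final long initialBackoffMillis;
    private final List<String> deadLetters = new ArrayList<>();

    RetryingConsumer(int maxAttempts, long initialBackoffMillis) {
        this.maxAttempts = maxAttempts;
        this.initialBackoffMillis = initialBackoffMillis;
    }

    // Tries the handler up to maxAttempts times, doubling the wait between attempts.
    // Returns true on success; otherwise parks the message in the dead-letter list.
    boolean handle(String message, Consumer<String> handler) {
        long backoff = initialBackoffMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(message);
                return true;
            } catch (RuntimeException e) {
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoff);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt(); // preserve the interrupt flag
                        break;                              // stop retrying and dead-letter
                    }
                    backoff *= 2;
                }
            }
        }
        deadLetters.add(message);
        return false;
    }

    List<String> deadLetters() {
        return List.copyOf(deadLetters);
    }
}
```

Because a message may be handled more than once before it succeeds, this only works safely when the handler is idempotent.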
Best Practices
Design Principles
- Idempotency: Ensure compensating transactions execute safely multiple times
- Retryability: Design operations to handle retries without side effects
- Atomicity: Each local transaction must be atomic within its service
- Isolation: Handle concurrent saga executions properly
- Eventual Consistency: Accept that data becomes consistent over time
Service Design
- Use constructor injection exclusively (never field injection)
- Implement services as stateless components
- Store saga state in persistent store (database or event store)
- Use immutable DTOs (Java records preferred)
- Separate domain logic from infrastructure concerns
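The service design points above can be combined in a small sketch. All names here (`SagaState`, `SagaStateStore`, `InMemorySagaStateStore`, `OrderSagaService`) are hypothetical; in a Spring Boot application the store would more likely be a Spring Data repository and the service a Spring-managed bean, with the same constructor-injection shape.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

enum SagaStatus { STARTED, COMPLETED, COMPENSATING, COMPENSATED }

// Immutable saga snapshot: a change produces a new record instead of mutating.
record SagaState(String sagaId, SagaStatus status, String lastCompletedStep) {
    SagaState withStep(String step) {
        return new SagaState(sagaId, status, step);
    }
}

// Persistence port, kept separate from the domain logic that uses it.
interface SagaStateStore {
    void save(SagaState state);
    Optional<SagaState> findById(String sagaId);
}

// In-memory stand-in for a database-backed store (assumption for this sketch).
final class InMemorySagaStateStore implements SagaStateStore {
    private final Map<String, SagaState> states = new ConcurrentHashMap<>();

    @Override
    public void save(SagaState state) {
        states.put(state.sagaId(), state);
    }

    @Override
    public Optional<SagaState> findById(String sagaId) {
        return Optional.ofNullable(states.get(sagaId));
    }
}

// Stateless service: its only field is the injected store, so all saga
// state lives in the persistent store and survives a restart of the service.
final class OrderSagaService {
    private final SagaStateStore store;

    OrderSagaService(SagaStateStore store) { // constructor injection
        this.store = store;
    }

    SagaState start(String sagaId) {
        SagaState state = new SagaState(sagaId, SagaStatus.STARTED, "");
        store.save(state);
        return state;
    }

    SagaState recordStep(String sagaId, String step) {
        SagaState updated = store.findById(sagaId)
                .orElseThrow(() -> new IllegalArgumentException("Unknown saga: " + sagaId))
                .withStep(step);
        store.save(updated);
        return updated;
    }
}
```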
Error Handling
- Implement circuit breakers for service calls
- Use dead-letter queues for failed messages
- Log all saga events for debugging and monitoring
- Implement timeout mechanisms for long-running sagas
- Design semantic locks to prevent concurrent updates
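A semantic lock is an application-level marker (often a `PENDING`-style status on the record) showing that a saga is still working on a piece of data. The sketch below is one possible in-memory shape, using a hypothetical `SemanticLockRegistry`; in practice the marker would usually be a column on the business record, updated inside the service's local transaction.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry mapping a record ID to the saga currently working on it.
final class SemanticLockRegistry {
    private final Map<String, String> lockedBy = new ConcurrentHashMap<>();

    // Atomically marks the record as held by sagaId.
    // Returns false if a different saga already holds it.
    boolean tryLock(String recordId, String sagaId) {
        String owner = lockedBy.putIfAbsent(recordId, sagaId);
        return owner == null || owner.equals(sagaId);
    }

    // Clears the marker only if this saga owns it, so a stale or
    // duplicate release from another saga has no effect.
    boolean release(String recordId, String sagaId) {
        return lockedBy.remove(recordId, sagaId);
    }
}
```

A saga would call `tryLock` in its first step and `release` in its final step or in its last compensation; a second saga that fails to acquire the marker can retry later or fail fast.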
Testing
- Test happy path scenarios
- Test each failure scenario and its compensation
- Test concurrent saga executions
- Test idempotency of compensating transactions
- Use Testcontainers for integration testing
Monitoring and Observability
- Track saga execution status and duration
- Monitor compensation transaction execution
- Alert on stuck or failed sagas
- Use distributed tracing (Micrometer Tracing with Zipkin on Spring Boot 3.x; Spring Cloud Sleuth applies only to Spring Boot 2.x)
- Implement health checks for saga coordinators
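Alerting on stuck sagas can start from a simple age check over the in-flight sagas. The sketch below assumes a hypothetical `StuckSagaDetector` that receives a map of saga ID to start time (for example loaded from the saga state table); how the result is turned into a metric or an alert is left open.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;

final class StuckSagaDetector {
    // Returns the IDs of in-flight sagas that started more than `timeout`
    // before `now`, sorted alphabetically for stable output.
    static List<String> findStuck(Map<String, Instant> inFlightStartedAt,
                                  Instant now,
                                  Duration timeout) {
        return inFlightStartedAt.entrySet().stream()
                .filter(e -> Duration.between(e.getValue(), now).compareTo(timeout) > 0)
                .map(Map.Entry::getKey)
                .sorted()
                .toList();
    }
}
```

Passing `now` as a parameter instead of calling `Instant.now()` inside the method keeps the check deterministic and easy to test.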
Technology Stack
Spring Boot 3.x with dependencies:
- Messaging: Spring Cloud Stream, Apache Kafka, RabbitMQ, Spring AMQP
- Saga Frameworks: Axon Framework (4.9.0), Eventuate Tram Sagas, Camunda, Apache Camel
- Persistence: Spring Data JPA, Event Sourcing (optional), Transactional Outbox Pattern
- Monitoring: Spring Boot Actuator, Micrometer, Distributed Tracing (Micrometer Tracing + Zipkin; Sleuth only on Spring Boot 2.x)
Anti-Patterns to Avoid
❌ Tight Coupling: Services directly calling each other instead of using events
❌ Missing Compensations: Not implementing compensating transactions for every step
❌ Non-Idempotent Operations: Compensations that cannot be safely retried
❌ Synchronous Sagas: Waiting synchronously for each step (defeats the purpose)
❌ Lost Messages: Not handling message delivery failures
❌ No Monitoring: Running sagas without visibility into their status
❌ Shared Database: Using same database across multiple services
❌ Ignoring Network Failures: Not handling partial failures gracefully
When NOT to Use Saga Pattern
Do not implement this pattern when:
- The transaction stays within a single service (use a local ACID transaction instead)
- Strong consistency is required (consider a monolith or a single shared database)
- The operations are simple CRUD with no cross-service dependencies
- Transaction volume is low and the flows are simple
- The team lacks experience with distributed systems
References
For detailed information, consult the following resources:
- Saga Pattern Definition
- Choreography-Based Implementation
- Orchestration-Based Implementation
- Event-Driven Architecture
- Compensating Transactions
- State Management
- Error Handling and Retry
- Testing Strategies
- Common Pitfalls and Solutions
See also examples.md for complete implementation examples:
- E-Commerce Order Processing (orchestration with Axon Framework)
- Food Delivery Application (choreography with Kafka and Spring Cloud Stream)
- Travel Booking System (complex orchestration with multiple compensations)
- Banking Transfer System
- Real-world microservices patterns