Saga Design Patterns: Choreography and Orchestration

The Saga pattern manages business transactions that span multiple services in a microservices architecture and keeps their data consistent without a single distributed transaction. It decomposes a long-running transaction into a series of smaller local transactions, each handled by a different service. There are two primary ways to implement the pattern: Choreography (decentralized) and Orchestration (centralized).

1. What Is a Saga?

Saga: A sequence of local transactions. Each service performs its own local transaction, and the next step is triggered either by an event that the service publishes (choreography) or by a command from a coordinator (orchestration). If any step fails, the saga runs compensating (undo) actions for the steps that already completed, so that data stays consistent.
Use case: Useful in e-commerce, booking systems, or any domain where multiple related services must coordinate a business workflow without a distributed transaction protocol such as two-phase commit.

2. Saga Choreography

2.1 Overview

Decentralized approach: Each service listens for certain events and performs its step in response. Once completed, it triggers the next step by publishing an event.
No central coordinator: Services react independently based on events.

2.2 How Choreography Works

Order Service creates an order and emits an OrderCreated event.
Payment Service listens for OrderCreated, processes the payment, and emits a PaymentProcessed event.
Inventory Service listens for PaymentProcessed, updates inventory, and so on.
If any step fails, the failing service emits a failure event, and the upstream services (those whose steps already completed) react with compensating transactions.
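
The happy path above can be sketched with a tiny in-memory event bus. This is only an illustration, not a production design: EventBus and wire_services are hypothetical names, the bus is synchronous and stands in for a real message broker, and InventoryUpdated is an assumed event name for the "updates inventory" step.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Tiny synchronous in-memory stand-in for a message broker (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.log: list[str] = []  # event names in publication order

    def subscribe(self, event: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self.log.append(event)
        for handler in list(self._handlers[event]):
            handler(payload)


def wire_services(bus: EventBus) -> None:
    # Payment Service: reacts to OrderCreated, then announces its own result.
    def on_order_created(order: dict) -> None:
        bus.publish("PaymentProcessed", order)

    # Inventory Service: reacts to PaymentProcessed.
    # "InventoryUpdated" is an assumed event name, not taken from the text.
    def on_payment_processed(order: dict) -> None:
        bus.publish("InventoryUpdated", order)

    bus.subscribe("OrderCreated", on_order_created)
    bus.subscribe("PaymentProcessed", on_payment_processed)


bus = EventBus()
wire_services(bus)
# Order Service creates the order and emits the first event.
bus.publish("OrderCreated", {"order_id": 1})
print(bus.log)  # ['OrderCreated', 'PaymentProcessed', 'InventoryUpdated']
```

Note that no service calls another directly; each one only knows the event names it subscribes to and publishes.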

2.3 Failure Handling in Choreography

In Choreography, each participating service implements its own failure detection and compensation logic:

When a service cannot complete its action (e.g., Payment fails due to insufficient funds), it emits a failure event (e.g., PaymentFailed).
Upstream services (earlier in the saga) listen for these failure events and execute their compensating transactions accordingly.
Compensation is often implemented as new local transactions that logically reverse the effects of the original transaction (e.g., refunding payment, restocking inventory, restoring customer credit).

Example Flow

Order Service emits OrderCreated.
Customer Service attempts to reserve credit; if the customer has insufficient credit, it emits CreditFailed:
- Order Service listens for CreditFailed and updates order status to Cancelled.
Inventory Service attempts reservation, but finds insufficient stock and emits InventoryFailed:
- Customer Service listens for InventoryFailed and triggers credit release.
- Order Service listens and cancels the order.
Payment Service fails to process payment (e.g., network issue, card declined), emits PaymentFailed:
- Inventory Service listens and releases reserved stock.
- Customer Service releases reserved credit.
- Order Service cancels the order.
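
The subscriptions in this example flow can be sketched as follows. It is a minimal illustration under stated assumptions: subscribe, publish, and the three handler functions are hypothetical names, the handlers only record what they would do, and delivery is synchronous and in subscription order, which a real broker does not guarantee across independent subscribers.

```python
from collections import defaultdict

subscribers = defaultdict(list)  # event name -> list of handlers
actions: list[str] = []          # compensations performed, in order


def subscribe(event: str, handler) -> None:
    subscribers[event].append(handler)


def publish(event: str, order_id: int) -> None:
    for handler in list(subscribers[event]):
        handler(order_id)


# Hypothetical compensation handlers, one per upstream service.
def release_stock(order_id: int) -> None:
    actions.append(f"Inventory Service: release stock for order {order_id}")


def release_credit(order_id: int) -> None:
    actions.append(f"Customer Service: release credit for order {order_id}")


def cancel_order(order_id: int) -> None:
    actions.append(f"Order Service: cancel order {order_id}")


# Each service subscribes to the failure events of the steps that run after it.
subscribe("PaymentFailed", release_stock)
for event in ("InventoryFailed", "PaymentFailed"):
    subscribe(event, release_credit)
for event in ("CreditFailed", "InventoryFailed", "PaymentFailed"):
    subscribe(event, cancel_order)

publish("PaymentFailed", 42)
for line in actions:
    print(line)
```

The later a step fails, the more services have to compensate: CreditFailed reaches only the Order Service, while PaymentFailed reaches all three upstream services.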

Additional Approaches

Services may implement timeouts: if the next event isn’t received in a certain time, the service assumes failure and emits a failure event.
Compensating transactions must be idempotent so that duplicate events are harmless, and handlers should also tolerate events that arrive out of order.
Event replay mechanisms may be in place to recover from message delivery failures.
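
One common way to make a compensation idempotent is to remember which event IDs have already been applied. The sketch below assumes every event carries a unique ID; InventoryCompensator and its method are hypothetical names, and the in-memory set stands in for storage that a real service would update in the same local transaction as the stock change. It covers duplicate delivery only, not out-of-order events.

```python
class InventoryCompensator:
    """Releases reserved stock at most once per failure event (hypothetical)."""

    def __init__(self, stock: int) -> None:
        self.stock = stock
        self._seen: set[str] = set()  # IDs of events already applied

    def on_payment_failed(self, event_id: str, quantity: int) -> bool:
        """Apply the compensation; return False if the event was a duplicate."""
        if event_id in self._seen:
            return False  # duplicate delivery: change nothing
        self._seen.add(event_id)
        self.stock += quantity
        return True


inv = InventoryCompensator(stock=8)
inv.on_payment_failed("evt-1", 2)
inv.on_payment_failed("evt-1", 2)  # same event redelivered by the broker
print(inv.stock)  # 10
```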

2.4 Benefits of Choreography

No single point of failure: Each service operates independently.
Simple for few services: Easy to implement for basic workflows.
Loose coupling: Services only know about events, not other services’ internals.

2.5 Drawbacks

Complexity grows with services: As more services participate, it’s harder to manage and trace workflow.
Harder to implement global policies: Timeouts, retries, and compensations must be handled by each service.
Difficult to track dependencies: Services that subscribe to each other’s events can end up with cyclic dependencies.

3. Saga Orchestration

3.1 Overview

Centralized approach: A single Saga Orchestrator directs each saga’s steps by invoking services in the required order.
Central coordinator: The orchestrator manages the flow, keeping the logic centralized.

3.2 How Orchestration Works

Saga Orchestrator initiates the workflow (e.g., ordering an item).
The orchestrator tells the Order Service to create an order.
Upon success, the orchestrator tells the Payment Service to process payment.
The orchestrator then instructs the Inventory Service to reserve stock.
If a failure happens at any step, the orchestrator executes compensating transactions in reverse order to maintain consistency.
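
A minimal sketch of this control flow, assuming each step is an in-process function paired with its compensation (in practice the orchestrator would send commands to remote services). Step, run_saga, and the step names are hypothetical, and the inventory step is forced to fail to show the rollback.

```python
from typing import Callable, NamedTuple


class Step(NamedTuple):
    name: str
    action: Callable[[], None]
    compensation: Callable[[], None]


def run_saga(steps: list[Step]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    completed: list[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            for done in reversed(completed):
                done.compensation()
            return False
        completed.append(step)
    return True


trace: list[str] = []


def fail_reserve_stock() -> None:
    raise RuntimeError("out of stock")  # simulated failure


steps = [
    Step("order", lambda: trace.append("create order"),
         lambda: trace.append("cancel order")),
    Step("payment", lambda: trace.append("process payment"),
         lambda: trace.append("refund payment")),
    Step("inventory", fail_reserve_stock,
         lambda: trace.append("release stock")),
]

ok = run_saga(steps)
print(ok)     # False
print(trace)  # ['create order', 'process payment', 'refund payment', 'cancel order']
```

The failed step itself is not compensated, because it never completed; only the steps before it are undone.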

3.3 Failure Handling in Orchestration

Mechanics

In Orchestration, the orchestrator is the single point of coordination for compensations:

The orchestrator tracks the state of the saga and explicitly triggers compensating transactions in response to failures.
Compensation steps are defined in the orchestrator and executed in the reverse order of successful steps, ensuring only the necessary subset of actions is rolled back.

Example Flow

The orchestrator tells Order Service to create an order.
After the order is created, the orchestrator instructs Customer Service to reserve credit. If Customer Service fails:
- Orchestrator immediately calls Order Service to cancel the order (compensation).
After credit is reserved, it calls Inventory Service to reserve items. If this fails:
- Orchestrator calls Customer Service to release credit.
- Orchestrator calls Order Service to cancel the order.
After inventory reservation, it calls Payment Service to process payment. If this fails:
- Orchestrator calls Inventory Service to release stock.
- Orchestrator calls Customer Service to release credit.
- Orchestrator calls Order Service to cancel the order.

Additional Approaches

The orchestrator typically persists saga state in a database, so ongoing or partially completed transactions can be resumed or compensated after a crash.
Compensation steps are often registered as part of the saga definition, with each step paired with its undo action; e.g., “if a later step fails after inventory was reserved, call this step to release it.”
Retry logic is usually managed centrally by the orchestrator, allowing for consistent error handling policies.
Timeouts and global transaction deadlines can be enforced by the orchestrator.
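
To illustrate the first point, the sketch below persists the index of the next step after every completed step, so a restarted orchestrator can pick up where it left off. It is a simplified illustration: the table layout, STEPS, open_store, and run are hypothetical, the crash is simulated with a crash_before argument, and the actual service call is reduced to a comment.

```python
import sqlite3
from typing import Optional

STEPS = ["create_order", "reserve_credit", "reserve_inventory", "process_payment"]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS saga ("
        "saga_id TEXT PRIMARY KEY, next_step INTEGER NOT NULL)"
    )
    return db


def run(db: sqlite3.Connection, saga_id: str,
        crash_before: Optional[str] = None) -> list[str]:
    """Run or resume a saga, committing progress after every step."""
    row = db.execute(
        "SELECT next_step FROM saga WHERE saga_id = ?", (saga_id,)
    ).fetchone()
    start = row[0] if row else 0
    ran: list[str] = []
    for i in range(start, len(STEPS)):
        if STEPS[i] == crash_before:
            break  # simulated orchestrator crash
        # A real orchestrator would send the command to the service here.
        ran.append(STEPS[i])
        db.execute("INSERT OR REPLACE INTO saga VALUES (?, ?)", (saga_id, i + 1))
        db.commit()
    return ran


db = open_store()
first = run(db, "saga-1", crash_before="reserve_inventory")
second = run(db, "saga-1")  # restart: resumes from the persisted position
print(first)   # ['create_order', 'reserve_credit']
print(second)  # ['reserve_inventory', 'process_payment']
```

If the orchestrator crashes after a service has executed a step but before the progress is committed, that step is sent again on restart, which is one reason the services' operations should be idempotent.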

3.4 Benefits of Orchestration

Centralized workflow management: Easy to track and modify saga definitions.
Suited for complex flows: Better when many steps or conditions are involved.
Clear separation of concerns: Business logic is concentrated in the orchestrator, allowing services to focus on individual tasks.

3.5 Drawbacks

Single point of failure: Orchestrator becomes a critical dependency.
Added design complexity: The orchestrator must manage saga state and compensations.
Tighter coupling: The orchestrator must know about all service APIs.