Context
Some operations need to write to two or more transactional systems, and require the end result to be a complete success or complete failure. The individual steps, or time between them, might take a while.
Prerequisites
Eventual consistency is acceptable.
It is worth introducing extra complexity.
Example
A travel booking system needs to book a hotel and a flight. It must end up either booking both, or neither.
Problem
How do we perform multiple operations such that they all (eventually) either succeed or get rolled back?
Solution
Use a saga. A saga is a sequence of transactions; it adds the idea of backwards recovery to resumable operations and recovery points.
Model your operation as a set of states and transitions between those states. There will be a set of acceptable terminal states (including the initial state) and a set of intermediate states. It must be possible to transition from any intermediate state to a terminal state. If forward progress is not possible, it must be possible to transition backwards to the initial state.
These backwards transitions are known as compensating actions, and are effectively a rollback mechanism.
For the travel booking example, we might have three states:
- Nothing booked (initial state, terminal state) - ∅
- Flight booked (intermediate state) - FB
- Flight and hotel booked (terminal state) - FB HB
And two forward operations: book flight and book hotel. If we end up in a state where we’ve booked the flight but cannot book the hotel (e.g. because it is full), then we need the backward operation: cancel flight.
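The travel booking example can be sketched as a small state machine. This is a minimal illustration, not a definitive implementation: the names State, BookingError and run_booking_saga are hypothetical, and the three callables stand in for whatever clients the real flight and hotel systems expose.

```python
from enum import Enum


class State(Enum):
    NOTHING_BOOKED = "nothing booked"                    # initial and terminal
    FLIGHT_BOOKED = "flight booked"                      # intermediate
    FLIGHT_AND_HOTEL_BOOKED = "flight and hotel booked"  # terminal


class BookingError(Exception):
    """Hypothetical error raised when a booking step cannot succeed."""


def run_booking_saga(book_flight, book_hotel, cancel_flight):
    """Run the saga and return the terminal state it ends in.

    The three arguments are assumed to be zero-argument callables that
    raise BookingError on failure.
    """
    try:
        book_flight()  # forward operation: nothing booked -> flight booked
    except BookingError:
        # Nothing was booked, so we are still in the initial (terminal) state.
        return State.NOTHING_BOOKED

    try:
        book_hotel()  # forward operation: flight booked -> both booked
    except BookingError:
        # Forward progress is not possible, so run the compensating action.
        cancel_flight()  # backward operation: flight booked -> nothing booked
        return State.NOTHING_BOOKED

    return State.FLIGHT_AND_HOTEL_BOOKED
```

This sketch keeps its state only in memory and assumes that cancel flight always succeeds. A real saga would typically record its progress durably and retry the compensating action until it completes.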
You will want some way to drive progress (either forwards or backwards). This can be centralised, for example using a completer (known as orchestration), or distributed, for example using transactional outboxes (known as choreography).
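As a rough sketch of the orchestration approach, the driver below advances a saga record one transition at a time, so that a completer could pick up a half-finished record and carry it to a terminal state. Everything here is an assumption made for illustration: SagaRecord, StepFailed, the FORWARD and BACKWARD tables and the state names are hypothetical, and the record lives in memory where a real system would persist it after every transition.

```python
from dataclasses import dataclass


class StepFailed(Exception):
    """Hypothetical error meaning a step has failed permanently."""


@dataclass
class SagaRecord:
    state: str = "nothing_booked"
    direction: str = "forward"  # flips to "backward" once a forward step fails


# state -> (action to run, state reached when the action succeeds)
FORWARD = {
    "nothing_booked": ("book_flight", "flight_booked"),
    "flight_booked": ("book_hotel", "flight_and_hotel_booked"),
}
BACKWARD = {
    "flight_booked": ("cancel_flight", "nothing_booked"),
}


def is_finished(record):
    """A saga is finished when no transition remains in its direction."""
    table = FORWARD if record.direction == "forward" else BACKWARD
    return record.state not in table


def advance(record, actions):
    """Attempt one transition, updating the record in place."""
    table = FORWARD if record.direction == "forward" else BACKWARD
    action_name, next_state = table[record.state]
    try:
        actions[action_name]()
    except StepFailed:
        if record.direction == "backward":
            raise  # leave the record unchanged so a later attempt can retry
        record.direction = "backward"  # give up going forwards, start rollback
        return
    record.state = next_state  # a real system would persist the record here


def complete(record, actions):
    """Drive a (possibly half-finished) saga to a terminal state."""
    while not is_finished(record):
        advance(record, actions)
    return record.state
```

For example, calling complete(SagaRecord(state="flight_booked"), actions) resumes a saga that stopped after the flight was booked. Because a crash between running an action and saving the record would cause the action to run again, this design assumes each action is safe to repeat.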
See also
- Microservice patterns: Saga
- [Video] What is a Saga in Microservices?
- Saga distributed transactions pattern
- Patterns for distributed transactions within a microservices architecture
Related
- ACID transaction – Perform multiple writes, such that either all of them or none of them succeed
- Completer – Complete unfinished operations, even if clients give up retrying
- Distributed transaction – Write to multiple systems transactionally
- Recovery point – Record current progress to allow recovery with minimal rework
- Resumable operation – Allow operations to continue from where the previous attempt failed
- Transactional outbox – Transactionally write a description of work to be performed asynchronously