Thom's Blog

Saga

Failure pattern – Perform a series of transactions with backwards recovery

Context

Some operations need to write to two or more transactional systems, and require the end result to be a complete success or complete failure. The individual steps, or time between them, might take a while.

Prerequisites

Eventual consistency is acceptable.

It is worth introducing extra complexity.

Example

A travel booking system needs to book a hotel and a flight. It must end up either booking both, or neither.

Problem

How do we perform multiple operations such that they all (eventually) either succeed or get rolled back?

Solution

Use a saga. A saga is a sequence of transactions that extends resumable operations and recovery points with the idea of backwards recovery.

Model your operation as a set of states and transitions between those states. There will be a set of acceptable terminal states (including the initial state), and intermediate states. It must be possible to transition from any intermediate state to a terminal state. If forward progress is not possible, it should be possible to transition backwards to the initial state.

These backwards transitions are known as compensating actions, and are effectively a rollback mechanism.
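The requirement that every intermediate state can reach a terminal state can be checked mechanically. The following is a minimal sketch, assuming the saga is described as a plain dictionary of states and allowed transitions; the state names and the helper functions are hypothetical and only serve as illustration.

```python
from collections import deque

# Hypothetical saga description: each state maps to the states it can move to.
TERMINAL = {"initial", "done"}
TRANSITIONS = {
    "initial": {"step1_done"},          # forward transition
    "step1_done": {"done", "initial"},  # forward, or compensate back to initial
    "done": set(),
}


def can_reach_terminal(state, transitions, terminal):
    """Breadth-first search from `state` to any terminal state."""
    seen, queue = {state}, deque([state])
    while queue:
        current = queue.popleft()
        if current in terminal:
            return True
        for nxt in transitions.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def stuck_states(transitions, terminal):
    """Return the intermediate states that cannot reach a terminal state."""
    return sorted(
        state
        for state in transitions
        if state not in terminal
        and not can_reach_terminal(state, transitions, terminal)
    )
```

An empty result from `stuck_states(TRANSITIONS, TERMINAL)` means the model has no dead ends. A non-empty result names the intermediate states that still need a forward transition or a compensating action.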

For the travel booking example, we might have three states:

  1. Nothing booked (initial state, terminal state) -
  2. Flight booked (intermediate state) - FB
  3. Flight and hotel booked (terminal state) - FB HB

And two forward operations: book flight and book hotel. If we end up in a state where we’ve booked the flight but cannot book the hotel (e.g. because it is full), then we need the backward operation: cancel flight.

Figure: State diagram for the travel booking example. A saga with three states, two forward transitions and a backwards transition (compensating action). Acceptable terminal states in green.
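The two forward operations and the compensating action for the travel booking example can be sketched as follows. This is an in-memory illustration only: `HotelFull`, `run_booking_saga` and the three callables are hypothetical names, and a real saga would also persist the current state after each step so that it can be resumed.

```python
class HotelFull(Exception):
    """Raised by the (hypothetical) hotel service when no rooms are left."""


def run_booking_saga(book_flight, book_hotel, cancel_flight):
    """Run the saga and return the terminal state it reached.

    If book_flight itself raises, the exception propagates and the saga
    is still in its initial state, which is an acceptable terminal state.
    """
    flight = book_flight()       # nothing booked -> FB
    try:
        book_hotel()             # FB -> FB HB
    except HotelFull:
        cancel_flight(flight)    # compensating action: FB -> nothing booked
        return "nothing booked"
    return "FB HB"
```

The caller passes in whatever functions talk to the flight and hotel systems. Either both bookings exist when the function returns "FB HB", or the flight has been cancelled and nothing is booked.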

You will want some way to drive progress (either forwards or backwards). This can be centralised, for example using a completer (known as orchestration), or distributed, for example using transactional outboxes (known as choreography).
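As a rough illustration of the orchestration option, the sketch below shows a single pass of a completer-style loop over saga records. It assumes each record is a dictionary standing in for a database row, and that a hypothetical `PermanentFailure` exception signals that forward progress is impossible; any other error is treated as transient and left for the next pass.

```python
class PermanentFailure(Exception):
    """Hypothetical signal that forward progress is impossible."""


def complete_unfinished(sagas, steps):
    """Make one pass over saga records, advancing any that are unfinished.

    `steps` maps each intermediate state to a (forward, compensate) pair.
    Each callable receives the record, performs the work and returns the
    next state.
    """
    for saga in sagas:
        if saga["state"] not in steps:
            continue  # already in a terminal state
        forward, compensate = steps[saga["state"]]
        try:
            saga["state"] = forward(saga)
        except PermanentFailure:
            # Cannot go forwards, so run the compensating action instead.
            saga["state"] = compensate(saga)
        except Exception:
            # Assumed transient: leave the state unchanged and retry later.
            pass
```

Running this pass on a schedule means a saga left in an intermediate state is eventually pushed to a terminal one, even if the client that started it has given up retrying.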

See also

  • ACID transaction – Perform multiple writes, such that either all of them or none of them succeed
  • Completer – Complete unfinished operations, even if clients give up retrying
  • Distributed transaction – Write to multiple systems transactionally
  • Recovery point – Record current progress to allow recovery with minimal rework
  • Resumable operation – Allow operations to continue from where the previous attempt failed
  • Transactional outbox – Transactionally write a description of work to be performed asynchronously