Thom Wright

Designing for failure: Patterns

Reusable building blocks to help design reliable systems in the presence of failures.

See the introductory post.

API design

These patterns describe the API as seen by clients, rather than internal implementation details.

Idempotency key

Identify identical requests

Reject non-identical retries

Detect changes in request content between retries
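As a rough illustration of how these three ideas fit together, here is a minimal in-memory sketch. The names `IdempotencyStore`, `IdempotencyConflict` and `handle` are hypothetical and not taken from the linked posts, and a real service would persist the records rather than hold them in a dictionary.

```python
import hashlib
import json


class IdempotencyConflict(Exception):
    """Raised when a key is reused with different request content."""


class IdempotencyStore:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self):
        # Maps idempotency key -> (request fingerprint, stored response).
        self._seen = {}

    @staticmethod
    def _fingerprint(body):
        # Canonical JSON, so that key order does not change the hash.
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def handle(self, key, body, perform):
        fingerprint = self._fingerprint(body)
        if key in self._seen:
            stored_fingerprint, response = self._seen[key]
            if stored_fingerprint != fingerprint:
                # Same key, different content: reject the retry.
                raise IdempotencyConflict(key)
            # Identical retry: return the earlier result, no new side effect.
            return response
        response = perform(body)
        self._seen[key] = (fingerprint, response)
        return response


charges = []


def charge(body):
    # Stand-in for the real side effect.
    charges.append(body)
    return {"status": "charged", "amount": body["amount"]}


store = IdempotencyStore()
first = store.handle("key-1", {"amount": 10}, charge)
retry = store.handle("key-1", {"amount": 10}, charge)  # served from the store
```

This sketch ignores concurrent retries and crashes between performing the work and recording it; those cases are what several of the later patterns address.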


Inform clients about the results of asynchronous operations

Writing to a single system

Patterns for writing to a single system. Most patterns assume this system is an ACID database. This is the simplest topology, and the easiest to work with. It’s worth trying to design systems like this where possible, to avoid the complexity that arises from trying to maintain consistency between multiple systems.

ACID transaction

Perform multiple writes, such that either all of them or none of them succeed
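A small sketch of the all-or-nothing behaviour, using Python's built-in `sqlite3` module as a stand-in for an ACID database. The `accounts` table and the `open_db`, `transfer` and `balances` helpers are assumptions made up for this example.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
    conn.commit()
    return conn


def transfer(conn, source, target, amount):
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so the credit and the debit either both happen or neither does.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, target),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, source),
        )


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts"))
```

If the debit violates the `CHECK` constraint, the earlier credit is rolled back with it, so no money appears from nowhere.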

Atomic read-then-write

Concurrently write data based on current state
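One common way to do this (an assumption here, not necessarily the approach the linked post takes) is to fold the read into the write as a conditional `UPDATE`, then check how many rows changed. The `stock` table and the `reserve` helper below are hypothetical.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stock (item TEXT PRIMARY KEY, quantity INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO stock VALUES ('widget', 1)")
    conn.commit()
    return conn


def reserve(conn, item, count):
    """Reserve `count` units only if enough remain; return True on success."""
    with conn:
        cursor = conn.execute(
            # The condition and the write are a single statement, so no other
            # writer can change the quantity between the check and the update.
            "UPDATE stock SET quantity = quantity - ? "
            "WHERE item = ? AND quantity >= ?",
            (count, item, count),
        )
        return cursor.rowcount == 1
```

A separate `SELECT` followed by an unconditional `UPDATE` would let two concurrent callers both see one widget in stock and both reserve it.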

Idempotency key (external)

Send a request to an external system at-least-once with only a single side effect

Change record

Record that a change has been made so it doesn’t happen again

Response record

Return the same response for every retry

Writing to multiple systems

When writing to a single ACID database, we get atomicity and consistency built in. Things get more complicated when writing to multiple systems where we don’t have these guarantees: we might not be able to perform all writes atomically, and so can end up in an inconsistent state.

Transactional outbox

Transactionally write a description of work to be performed asynchronously
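A minimal sketch of the idea, again with `sqlite3` standing in for the database. The `orders` and `outbox` tables, the `place_order` and `relay` helpers, and the `publish` callback are all hypothetical names chosen for illustration.

```python
import json
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY,
            payload TEXT NOT NULL,
            sent INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def place_order(conn, item):
    # The business write and the description of follow-up work share one
    # transaction, so an order can never exist without its outbox entry.
    with conn:
        order_id = conn.execute(
            "INSERT INTO orders (item) VALUES (?)", (item,)
        ).lastrowid
        message = {"type": "order_placed", "order_id": order_id}
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)", (json.dumps(message),)
        )
    return order_id


def relay(conn, publish):
    """Background step: publish pending messages, then mark them as sent."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        # A crash after publish but before the update means the message is
        # published again later, so delivery is at-least-once.
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because the relay can deliver a message more than once, the consumer on the other side needs to tolerate duplicates, for example by using an idempotency key.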


Saga

Perform a series of transactions with backwards recovery

Distributed transaction

Write to multiple systems transactionally

Resumable operation

Allow operations to continue from where the previous attempt failed

Recovery point

Record current progress to allow recovery with minimal rework
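To make the idea concrete, here is a small sketch in which an operation is a list of steps and the number of completed steps is stored after each one. The `progress` table and the `run_operation` helper are hypothetical, and `sqlite3` again stands in for whatever durable store is actually used.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE progress ("
        "operation_id TEXT PRIMARY KEY, "
        "completed_steps INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def run_operation(conn, operation_id, steps):
    """Run `steps` in order, skipping any that a previous attempt finished."""
    row = conn.execute(
        "SELECT completed_steps FROM progress WHERE operation_id = ?",
        (operation_id,),
    ).fetchone()
    done = row[0] if row else 0
    for index in range(done, len(steps)):
        steps[index]()
        # Record the recovery point once the step has finished.
        with conn:
            conn.execute(
                "INSERT OR REPLACE INTO progress VALUES (?, ?)",
                (operation_id, index + 1),
            )
```

A step and its recovery point are not written atomically in this sketch, so a crash between the two causes that one step to run again on retry. Each step therefore still needs to be safe to repeat.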

At-most-once guard

Write to a system at most once

Idempotency key lock

Protect against concurrent retries


Prevent dangling references

Background processes

Sometimes inconsistency is unavoidable, whether by design, or simply because of a buggy implementation. Background processes can identify these inconsistencies and handle them in various ways.


Completer

Complete unfinished operations, even if clients give up retrying

Garbage collection

Find and delete unused data


Detect and resolve inconsistencies


Some patterns should be avoided. They may appear to offer benefits, but either fail to deliver them or come with serious drawbacks.

I/O inside transaction

Wrap a transaction around non-database I/O


When consistency is important, you will generally need to choose (at least) one of the patterns in the table below.

| Pattern | Number of systems | Synchronicity | Atomicity | Consistency | Complexity |
| --- | --- | --- | --- | --- | --- |
| ACID transaction | One | Sync | Atomic | Consistent* | Simple |
| Distributed transaction | Many | Sync + Async** | Atomic | Eventual | Complex |
| Completer | Many | Sync + Async** | Non-atomic | Eventual | Moderate |
| Transactional outbox | Many | Async | Non-atomic | Eventual | Moderate |
| Saga | Many | Async | Non-atomic | Eventual | Complex |

* Depends on the isolation level used.

** Attempts to do all work synchronously, but will continue asynchronously in the case of failure.

More patterns