Thom's Blog

Designing for failure: Patterns

Reusable building blocks to help design reliable systems in the presence of failures.

See the introductory post.

API design

These patterns describe the API as seen by clients, rather than its internal implementation details.

Idempotency key

Identify identical requests

Reject non-identical retries

Detect changes in request content between retries

Callback

Inform clients about the results of asynchronous operations
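The idempotency key patterns above can be sketched roughly as follows. This is a minimal in-memory illustration, not a production design: the class name `IdempotentHandler`, the exception `IdempotencyConflict`, and the choice of a SHA-256 hash over canonical JSON as the request fingerprint are all assumptions made for the example. A real service would keep these records in durable storage.

```python
import hashlib
import json


class IdempotencyConflict(Exception):
    """Raised when a key is reused with different request content."""


class IdempotentHandler:
    """Hypothetical wrapper that runs `handler` at most once per key."""

    def __init__(self, handler):
        self._handler = handler  # the real operation behind the API
        self._records = {}       # key -> (request fingerprint, response)

    @staticmethod
    def _fingerprint(payload):
        # Canonical JSON, so that dict key order does not change the hash.
        canonical = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()

    def handle(self, idempotency_key, payload):
        fingerprint = self._fingerprint(payload)
        record = self._records.get(idempotency_key)
        if record is not None:
            stored_fingerprint, stored_response = record
            if stored_fingerprint != fingerprint:
                # Same key, different content: reject the non-identical retry.
                raise IdempotencyConflict(idempotency_key)
            # Identical retry: return the stored response, no new side effect.
            return stored_response
        response = self._handler(payload)
        self._records[idempotency_key] = (fingerprint, response)
        return response
```

A retry with the same key and the same payload gets the stored response back, while a retry that reuses the key with a changed payload raises `IdempotencyConflict` instead of silently performing a different operation.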

Writing to a single system

Patterns for writing to a single system. Most patterns assume this system is an ACID database. This is the simplest topology, and the easiest to work with. It’s worth trying to design systems like this where possible, to avoid the complexity that arises from trying to maintain consistency between multiple systems.

ACID transaction

Perform multiple writes, such that either all of them or none of them succeed

Atomic read-then-write

Concurrently write data based on current state

Idempotency key (external)

Send a request to an external system at least once, with only a single side effect

Change record

Record that a change has been made so it doesn’t happen again

Response record

Return the same response for every retry
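As a rough illustration of how several of these patterns combine, the sketch below uses SQLite from Python's standard library. The table names (`accounts`, `changes`) and the `transfer` function are hypothetical. The change record and both balance updates share one transaction, so either all of them are committed or none are, and the balance is adjusted inside the `UPDATE` statement rather than being read into the application first.

```python
import sqlite3


def open_db():
    # In-memory database with hypothetical tables, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE accounts (
            id TEXT PRIMARY KEY,
            balance INTEGER NOT NULL CHECK (balance >= 0)
        );
        CREATE TABLE changes (idempotency_key TEXT PRIMARY KEY);
    """)
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn, idempotency_key, source, target, amount):
    """Return True if applied now, False if this key was applied before."""
    with conn:  # commits on success, rolls back if an exception escapes
        # Change record: a retry finds the key already present and stops.
        inserted = conn.execute(
            "INSERT OR IGNORE INTO changes VALUES (?)", (idempotency_key,)
        ).rowcount
        if inserted == 0:
            return False
        # The arithmetic happens inside the database, so no stale value
        # read by the application can be written back.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, source))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, target))
    return True
```

If the `CHECK` constraint rejects an overdraft, `sqlite3.IntegrityError` propagates and the change record is rolled back along with the balance updates, so the same key can be retried later.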

Writing to multiple systems

When writing to a single ACID database, we get atomicity and consistency built in. Things get more complicated when writing to multiple systems where we don’t have these guarantees: we might not be able to perform all writes atomically, and so can end up in an inconsistent state.

Transactional outbox

Transactionally write a description of work to be performed asynchronously

Saga

Perform a series of transactions with backwards recovery

Distributed transaction

Write to multiple systems transactionally

Resumable operation

Allow operations to continue from where the previous attempt failed

Recovery point

Record current progress to allow recovery with minimal rework

Reliable retries

Reliably keep retrying until success

At-most-once guard

Write to a system at most once

Idempotency key lock

Protect against concurrent retries

Store-then-reference

Prevent dangling references
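The transactional outbox listed above can be sketched as follows, again using SQLite as a stand-in for the ACID database. The `orders` and `outbox` tables, the message shape, and the `publish` callback (standing in for a message broker or another system) are assumptions for the example.

```python
import json
import sqlite3


def open_db():
    # Hypothetical schema: business data plus an outbox in the same database.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id TEXT PRIMARY KEY, item TEXT NOT NULL);
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            payload TEXT NOT NULL,
            sent INTEGER NOT NULL DEFAULT 0
        );
    """)
    return conn


def place_order(conn, order_id, item):
    # The order and the description of follow-up work commit together.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, item))
        message = {"type": "order_placed", "order_id": order_id}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)",
                     (json.dumps(message),))


def relay_once(conn, publish):
    """Publish unsent outbox rows in order; return how many were delivered."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    delivered = 0
    for row_id, payload in rows:
        publish(json.loads(payload))  # if this raises, the row stays unsent
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
        delivered += 1
    return delivered
```

In this sketch, a crash between `publish` and marking the row as sent causes the message to be published again on the next run. Delivery is therefore at-least-once, and the receiving system needs its own protection against duplicates, such as an idempotency key.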

Background processes

Sometimes inconsistency is unavoidable, whether by design, or simply because of a buggy implementation. Background processes can identify these inconsistencies and handle them in various ways.

Completer

Complete unfinished operations, even if clients give up retrying

Garbage collection

Find and delete unused data

Reconciliation

Detect and resolve inconsistencies
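A reconciliation process might look something like the sketch below. It assumes that one system is treated as the source of truth and that both systems can be read as simple key-value mappings; neither assumption comes from the pattern itself, and the function names are hypothetical.

```python
def reconcile(source_of_truth, replica):
    """Return the actions needed to make `replica` match `source_of_truth`."""
    actions = []
    for key, value in sorted(source_of_truth.items()):
        if key not in replica:
            actions.append(("create", key, value))
        elif replica[key] != value:
            actions.append(("update", key, value))
    # Anything only the replica knows about is treated as stale here.
    for key in sorted(replica.keys() - source_of_truth.keys()):
        actions.append(("delete", key, None))
    return actions


def apply_actions(replica, actions):
    """Resolve the detected inconsistencies in place."""
    for kind, key, value in actions:
        if kind == "delete":
            replica.pop(key, None)
        else:
            replica[key] = value
```

Separating detection (`reconcile`) from resolution (`apply_actions`) lets the same comparison drive an alert or a report when automatic repair would be too risky.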

Other

Other patterns for handling failure or edge cases.

Handling out-of-order messages

Reliably process dependent messages in any order
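One possible way to process dependent messages regardless of arrival order is to park a message until the message it depends on has been processed. The sketch below is an in-memory illustration under that assumption. The `DependencyBuffer` class and the idea that each message names at most one dependency are hypothetical, and a real implementation would need durable state.

```python
from collections import defaultdict


class DependencyBuffer:
    """Hypothetical buffer that releases messages once their dependency is done."""

    def __init__(self, process):
        self._process = process            # callback: process(message_id, body)
        self._done = set()                 # ids of processed messages
        self._waiting = defaultdict(list)  # dependency id -> parked messages

    def receive(self, message_id, body, depends_on=None):
        if message_id in self._done:
            return  # duplicate delivery of an already processed message
        if depends_on is not None and depends_on not in self._done:
            # Dependency not seen yet: park the message instead of failing.
            self._waiting[depends_on].append((message_id, body))
            return
        self._run(message_id, body)

    def _run(self, message_id, body):
        ready = [(message_id, body)]
        while ready:
            current_id, current_body = ready.pop(0)
            if current_id in self._done:
                continue  # the same message was parked more than once
            self._process(current_id, current_body)
            self._done.add(current_id)
            # Release everything that was waiting on this message.
            ready.extend(self._waiting.pop(current_id, []))
```

With this approach, a message that arrives before its dependency is neither rejected nor processed early. It is released as soon as the dependency has been handled.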

Antipatterns

Some patterns should be avoided. They appear to offer benefits, but either fail to deliver them or have other serious drawbacks.

I/O inside transaction

Wrap a transaction around non-database I/O

Reject duplicate requests

Return an error when a duplicate request is detected

Comparisons

When consistency is important, you will generally need to choose (at least) one of the patterns in the table below.

| Pattern | Number of systems | Synchronicity | Atomicity | Consistency | Complexity |
|---|---|---|---|---|---|
| ACID transaction | One | Sync | Atomic | Strong | Simple |
| Transactional outbox | Many | Async | Non-atomic [1] | Eventual | Moderate |
| Reliable retries | Many | Async | Non-atomic [1] | Eventual | Moderate |
| Completer | Many | Success: sync; error: async [2] | Non-atomic [1] | Eventual | Moderate |
| Distributed transaction | Many | Success: sync; error: async [2] | Atomic | Eventual | Complex |
| Saga | Many | Async | Atomic | Eventual | Complex |

More patterns

  1. Non-atomic because some writes might fail with no guarantee that successful writes will be rolled back.

  2. Attempts to do all work synchronously, but will continue asynchronously in the case of failure.