Designing for failure: Patterns
Reusable building blocks to help design reliable systems in the presence of failures.
See the introductory post.
API design
These patterns describe the API as seen by clients, rather than its internal details.
- Idempotency key: Identify identical requests
- Reject non-identical retries: Detect changes in request content between retries
- Callback: Inform clients about the results of asynchronous operations
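As a rough illustration of how the first two patterns fit together, here is a minimal in-memory sketch. The class name `IdempotentApi`, the `create_charge` operation, the SHA-256 fingerprint, and the 201/422 status codes are all assumptions made for this example, not part of the patterns themselves.

```python
import hashlib
import json


class IdempotentApi:
    """Hypothetical API handler keyed by a client-supplied idempotency key."""

    def __init__(self):
        # idempotency key -> (request fingerprint, stored response)
        self._seen = {}
        self.charges = []

    @staticmethod
    def _fingerprint(body):
        # Canonical JSON so that key order does not change the fingerprint.
        canonical = json.dumps(body, sort_keys=True).encode()
        return hashlib.sha256(canonical).hexdigest()

    def create_charge(self, idempotency_key, body):
        fingerprint = self._fingerprint(body)
        if idempotency_key in self._seen:
            stored_fingerprint, response = self._seen[idempotency_key]
            if stored_fingerprint != fingerprint:
                # Same key, different content: reject the non-identical retry.
                return {"status": 422, "error": "key reused with different content"}
            # Identical retry: return the original response, no new side effect.
            return response
        self.charges.append(body)
        response = {"status": 201, "charge_id": len(self.charges)}
        self._seen[idempotency_key] = (fingerprint, response)
        return response
```

An identical retry with the same key gets the original response back and creates no second charge, while a retry that changes the body is rejected. A real service would keep the key table in durable storage rather than in memory.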
Writing to a single system
Patterns for writing to a single system. Most patterns assume this system is an ACID database. This is the simplest topology, and the easiest to work with. It’s worth trying to design systems like this where possible, to avoid the complexity that arises from trying to maintain consistency between multiple systems.
- ACID transaction: Perform multiple writes, such that either all of them or none of them succeed
- Atomic read-then-write: Concurrently write data based on current state
- Idempotency key (external): Send a request to an external system at-least-once with only a single side effect
- Change record: Record that a change has been made so it doesn’t happen again
- Response record: Return the same response for every retry
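The first two patterns can be sketched with Python's built-in `sqlite3` module. The `accounts` table and the `transfer` function are hypothetical, chosen only to show two writes that commit or roll back together, with the balance check folded into the `UPDATE` so the read and the write cannot be separated by a concurrent writer.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both writes commit or neither does."""
    with conn:  # commits on success, rolls back if an exception is raised
        # Atomic read-then-write: the balance check is part of the UPDATE,
        # so there is no gap between reading the balance and changing it.
        debit = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, src, amount),
        )
        if debit.rowcount != 1:
            raise ValueError("insufficient funds or unknown source account")
        credit = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, dst),
        )
        if credit.rowcount != 1:
            # Raising here rolls back the debit as well.
            raise ValueError("unknown destination account")
```

If the credit fails, the debit is rolled back with it, so callers never observe money leaving one account without arriving in the other.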
Writing to multiple systems
When writing to a single ACID database, we get atomicity and consistency built in. Things get more complicated when writing to multiple systems where we don’t have these guarantees: we might not be able to perform all writes atomically, and so can end up in an inconsistent state.
- Transactional outbox: Transactionally write a description of work to be performed asynchronously
- Saga: Perform a series of transactions with backwards recovery
- Distributed transaction: Write to multiple systems transactionally
- Resumable operation: Allow operations to continue from where the previous attempt failed
- Recovery point: Record current progress to allow recovery with minimal rework
- Reliable retries: Reliably keep retrying until success
- At-most-once guard: Write to a system at most once
- Idempotency key lock: Protect against concurrent retries
- Store-then-reference: Prevent dangling references
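To make the transactional outbox concrete, here is a minimal sketch using `sqlite3`. The `orders` and `outbox` tables, the `place_order` and `relay_outbox` functions, and the `publish` callback are assumptions for illustration; a production relay would also need to handle concurrency and batching.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY,
    payload TEXT NOT NULL,
    sent INTEGER NOT NULL DEFAULT 0
);
"""


def place_order(conn, item):
    """Write the order and a description of the follow-up work in one transaction."""
    with conn:
        cur = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        order_id = cur.lastrowid
        payload = json.dumps({"order_id": order_id, "item": item})
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))
    return order_id


def relay_outbox(conn, publish):
    """Publish unsent outbox rows in order; returns how many rows were found."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        # If publish raises, the row stays unsent and is picked up next time.
        publish(json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because a row is marked as sent only after `publish` returns, a crash between the two steps leads to the message being published again on the next run. Delivery is therefore at-least-once, and the receiving system should tolerate duplicates, for example by using an idempotency key.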
Background processes
Sometimes inconsistency is unavoidable, whether by design, or simply because of a buggy implementation. Background processes can identify these inconsistencies and handle them in various ways.
- Completer: Complete unfinished operations, even if clients give up retrying
- Garbage collection: Find and delete unused data
- Reconciliation: Detect and resolve inconsistencies
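The detection half of reconciliation can be as simple as comparing two views of the same data. The sketch below assumes each system can be read as a mapping from record ID to value, and that one of them is treated as the source of truth; the function name `reconcile` and the report keys are hypothetical.

```python
def reconcile(source_of_truth, replica):
    """Compare two id -> value mappings and report how the replica differs."""
    # Records the replica never received.
    missing = {
        key: value
        for key, value in source_of_truth.items()
        if key not in replica
    }
    # Records the replica holds that the source of truth does not know about.
    orphaned = sorted(key for key in replica if key not in source_of_truth)
    # Records present in both, but with different values: (expected, actual).
    mismatched = {
        key: (source_of_truth[key], replica[key])
        for key in source_of_truth
        if key in replica and replica[key] != source_of_truth[key]
    }
    return {"missing": missing, "orphaned": orphaned, "mismatched": mismatched}
```

How to resolve each category (re-send, delete, overwrite, or alert a human) depends on the systems involved, so the sketch stops at producing the report.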
Other
Other patterns for handling failure or edge cases.
- Handling out of order messages: Reliably process dependent messages in any order
Antipatterns
Some patterns should be avoided. They may seem to offer benefits, but either fail to deliver them or have other serious drawbacks.
- I/O inside transaction: Wrap a transaction around non-database I/O
- Reject duplicate requests: Return an error when a duplicate request is detected
Comparisons
When consistency is important, you will generally need to choose (at least) one of the patterns in the table below.
Pattern | Number of systems | Synchronicity | Atomicity | Consistency | Complexity |
---|---|---|---|---|---|
ACID transaction | One | Sync | Atomic | Strong | Simple |
Transactional outbox | Many | Async | Non-atomic¹ | Eventual | Moderate |
Reliable retries | Many | Async | Non-atomic¹ | Eventual | Moderate |
Completer | Many | Success: sync; error: async² | Non-atomic¹ | Eventual | Moderate |
Distributed transaction | Many | Success: sync; error: async² | Atomic | Eventual | Complex |
Saga | Many | Async | Atomic | Eventual | Complex |