Thom's Blog


Failure pattern – Complete unfinished operations, even if clients give up retrying


Operations can fail partway through. Clients can retry, but they might give up before driving the operation to completion.


The operation is resumable and recovery points are used.

It is acceptable for the operation to happen asynchronously.


Consider making a payment on an e-commerce system. At a high level, the operation might look like this:

  1. Save order details.
  2. Take payment.
  3. Start fulfilment process.

Taking a payment but never starting the fulfilment process would result in some unhappy customers.
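
As a rough illustration, the three steps could be written as a resumable operation that records a recovery point after each step. This is only a sketch: `Step`, `RecoveryPoint` and `place_order` are hypothetical names, the recovery point lives in memory rather than in a database, and each step just appends to a log instead of doing real work.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Step(Enum):
    """Hypothetical recovery-point states for the order operation."""
    STARTED = 0
    ORDER_SAVED = 1
    PAYMENT_TAKEN = 2
    COMPLETE = 3


@dataclass
class RecoveryPoint:
    order_id: str
    step: Step = Step.STARTED


def place_order(point: RecoveryPoint, log: List[str]) -> None:
    """Run the operation, or resume it from the recorded recovery point.

    Each step runs only if the recovery point says it has not happened yet,
    so calling this again after a failure skips the work already done.
    """
    if point.step == Step.STARTED:
        log.append("save order")        # stand-in for saving order details
        point.step = Step.ORDER_SAVED
    if point.step == Step.ORDER_SAVED:
        log.append("take payment")      # stand-in for charging the customer
        point.step = Step.PAYMENT_TAKEN
    if point.step == Step.PAYMENT_TAKEN:
        log.append("start fulfilment")  # stand-in for kicking off fulfilment
        point.step = Step.COMPLETE


# An order that failed after payment was taken resumes at fulfilment only.
log: List[str] = []
stuck = RecoveryPoint("order-1", Step.PAYMENT_TAKEN)
place_order(stuck, log)
```

In a real system the recovery point would be persisted, and a crash between performing a step and recording it would still have to be handled, for example by making each step idempotent.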


How do we ensure that important multi-step operations are always completed?


Run a background completer process. It should:

  1. Find recovery points that are incomplete and have not been updated recently.
  2. Resume the operation, either by running the operation itself or by requesting the application process to do it.
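
The two steps above could be sketched roughly as follows. This is a minimal, in-memory sketch under assumptions of my own: `RecoveryPoint`, `find_stale`, `run_completer_once`, the `resume` callback and the five-minute threshold are all hypothetical, and a real completer would query a database and run on a schedule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List


@dataclass
class RecoveryPoint:
    """Hypothetical record of how far an operation has progressed."""
    operation_id: str
    complete: bool
    updated_at: datetime


def find_stale(points: Iterable[RecoveryPoint], now: datetime,
               stale_after: timedelta) -> List[RecoveryPoint]:
    """Step 1: incomplete recovery points that have not been updated recently."""
    return [p for p in points
            if not p.complete and now - p.updated_at >= stale_after]


def run_completer_once(points: Iterable[RecoveryPoint],
                       resume: Callable[[RecoveryPoint], None],
                       now: datetime,
                       stale_after: timedelta = timedelta(minutes=5)) -> List[str]:
    """Step 2: resume each stale operation; return the ids that were resumed.

    `resume` either runs the operation directly or asks the application
    process to do it. A failure is swallowed here so that one bad operation
    does not block the rest; it stays stale and is picked up by a later sweep.
    """
    resumed = []
    for point in find_stale(points, now, stale_after):
        try:
            resume(point)
        except Exception:
            continue
        resumed.append(point.operation_id)
    return resumed
```

The staleness threshold matters: it should be longer than the time a client normally spends retrying, so that the completer does not race with an operation that is still in progress.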
