Thom's Blog

At-most-once guard

Failure pattern – Write to a system at most once

Context

Some operations involve calling an external system which produces side effects. Ideally the external system supports idempotency keys, allowing the call to be safely retried. When it does not, we need to decide what to do when the outcome of a call is unknown – for example due to a network timeout or a crash.

Sometimes the outcome can be checked by reading from the external system (e.g. with a GET request), but a suitable read API might not exist, or the external system might be eventually consistent, in which case a read can miss a write that did in fact succeed.

Without a reliable way to determine the outcome, retrying the call risks producing the side effect again.

Prerequisites

Producing the side effect zero times is acceptable, but more than once is not. It is not necessary to know whether the operation succeeded.

Example

A company uses a payment system that does not implement idempotency keys. The company must not initiate the same payment more than once, because doing so might double-charge the customer.

Problem

How do we ensure that a side effect happens at most once?

Solution

Write a guard record to a database before performing the operation. If the record already exists, do not perform the operation. Whether the operation then succeeds or fails, subsequent retries will see the guard record and will not attempt it again.

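The existence check and the write must happen as a single atomic operation; otherwise two concurrent attempts could both find no record and both perform the operation. In the following example, written in PostgreSQL-style SQL and relying on a unique constraint on idempotency_key, the query only returns a row if the record did not already exist.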

INSERT INTO guards (idempotency_key)
  VALUES ('some-key')
  ON CONFLICT (idempotency_key) DO NOTHING
  RETURNING idempotency_key;

Figure: Sequence diagram for writing a guard record
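As a rough illustration, the guard can be wrapped in a small helper. The sketch below is a minimal Python version that uses the standard-library sqlite3 module in place of whatever database the query above targets. The names create_guard_table, acquire_guard and run_at_most_once are made up for this example, and it checks cursor.rowcount instead of using RETURNING so that it also runs on older SQLite builds.

```python
import sqlite3
from typing import Callable


def create_guard_table(conn: sqlite3.Connection) -> None:
    # The primary key provides the uniqueness that ON CONFLICT relies on.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS guards (idempotency_key TEXT PRIMARY KEY)"
    )
    conn.commit()


def acquire_guard(conn: sqlite3.Connection, key: str) -> bool:
    """Try to write the guard record; True means this caller wrote it."""
    cur = conn.execute(
        "INSERT INTO guards (idempotency_key) VALUES (?) "
        "ON CONFLICT (idempotency_key) DO NOTHING",
        (key,),
    )
    # Commit before the side effect so the guard survives a crash.
    conn.commit()
    # rowcount is 1 if the row was inserted, 0 if it already existed.
    return cur.rowcount == 1


def run_at_most_once(
    conn: sqlite3.Connection, key: str, operation: Callable[[], None]
) -> bool:
    """Run operation only if no guard exists for key.

    Returns True if the operation was attempted by this call, False if a
    guard was already present. Exceptions from operation propagate, and
    the guard stays in place, so the operation is never retried.
    """
    if not acquire_guard(conn, key):
        return False
    operation()
    return True
```

The guard is committed before the operation is called, so a crash between the two leaves the guard in place and the operation never runs. That is the zero-times case the prerequisites allow.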

This pattern trades liveness for safety: it guarantees the operation won’t happen more than once, but if the operation fails, it will never be retried.

This can lead to uncertainty. If the guard record exists but the outcome is unknown – for example because of a network timeout – did the operation succeed or fail? If knowing the outcome is important, consider recording it alongside the guard record when available, but be aware that if the failure occurs before the outcome can be recorded, the uncertainty remains.
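One way to record the outcome next to the guard is sketched below, again in Python with the standard-library sqlite3 module. It assumes a nullable outcome column on the guards table and a hypothetical OperationRejected exception standing in for "the external system definitely refused the call"; neither comes from a real API. Any other failure, such as a timeout, leaves the outcome empty, which is reported as "unknown".

```python
import sqlite3
from typing import Callable


class OperationRejected(Exception):
    """Hypothetical: the external system definitely refused the call."""


def create_guard_table(conn: sqlite3.Connection) -> None:
    # outcome stays NULL until a definite result has been recorded.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS guards ("
        "idempotency_key TEXT PRIMARY KEY, outcome TEXT)"
    )
    conn.commit()


def lookup_outcome(conn: sqlite3.Connection, key: str) -> str:
    row = conn.execute(
        "SELECT outcome FROM guards WHERE idempotency_key = ?", (key,)
    ).fetchone()
    if row is None:
        return "not attempted"
    # A guard with no recorded outcome means we cannot tell what happened.
    return row[0] if row[0] is not None else "unknown"


def run_and_record(
    conn: sqlite3.Connection, key: str, operation: Callable[[], None]
) -> str:
    cur = conn.execute(
        "INSERT INTO guards (idempotency_key) VALUES (?) "
        "ON CONFLICT (idempotency_key) DO NOTHING",
        (key,),
    )
    conn.commit()  # guard must be durable before the side effect
    if cur.rowcount == 0:
        # Guard already exists: report what is known, never retry.
        return lookup_outcome(conn, key)
    try:
        operation()
        outcome = "succeeded"
    except OperationRejected:
        outcome = "failed"
    # Timeouts and other exceptions propagate past this point, so the
    # outcome column stays NULL and the uncertainty remains.
    conn.execute(
        "UPDATE guards SET outcome = ? WHERE idempotency_key = ?",
        (outcome, key),
    )
    conn.commit()
    return outcome
```

A crash after the operation returns but before the UPDATE commits also leaves the outcome empty, so "unknown" has to be treated as a real state by whoever reads the table.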