Thom's Blog

Transactional outbox

Failure pattern – Transactionally write a description of work to be performed asynchronously


Some requests need to write to several independent systems. Write operations on these systems might fail independently, or the application might crash in between writes, leaving the system in an inconsistent state.


Eventual consistency is acceptable. Performing the operation (or part of the operation) asynchronously is acceptable.


A new user registers with a social network. As part of registration, a database record needs to be written with the login details, and an event needs to be published so other systems can respond, e.g. by sending a welcome email or finding contact recommendations. Imagine either of the following scenarios:

  1. Publishing fails - The application writes the new user record to the database. It then tries to publish the event, but fails with a transient error.
  2. The application crashes - The application writes the new user record to the database. It then crashes before publishing the event.

Either of these will result in no welcome email being sent, and no other system being aware of the newly registered user.


How do we ensure (eventual) consistency when writing to multiple systems?


Write one or more messages to a database in an Atomic transaction describing what needs to be done. A background process reads these messages and performs the work on the other systems, retrying until it succeeds.

Essentially, instead of trying to synchronously write to many systems, we instead write to just one system atomically, allowing other writes to happen asynchronously.

A common approach is for the background process to publish the messages to a dedicated messaging system, e.g. RabbitMQ or AWS SNS/SQS. This could be an external process or internal to the application.

Outbox with an external publisher process Outbox with an internal publisher
An external publisher process and an internal publisher.

There are several ways to trigger the publisher:

  1. On a schedule, by running as a cron job or a process polling the database. This introduces some latency.
  2. By explicitly notifying it when a new message is ready. Simple if the publisher runs in the same process as the application itself.
  3. By tailing the transaction log. Arguable more complex.

Can be paired with Callbacks to notify clients of the result (e.g. success/failure) on completion.

See also

  • ACID transaction – Perform multiple writes, such that either all of them or none of them succeed
  • Callback – Inform clients about the results of asynchronous operations
  • Reliable retries – Reliably keep retrying until success