Thom's Blog

[WIP] Handling out of order messages

Failure pattern – Reliably process dependent messages in any order

Context

Message are usually not guaranteed to arrive in order. However, sometimes messages have logical ordering. When there is a hard dependency between messages, it might not be possible to process later messages (in the logical ordering) without information from earlier messages.

In many cases the messages arrive in the expected order almost all of the time, so if a system is unable to handle out of order messages it might go unnoticed for a long time.

Example

A payment goes through the following lifecycle:

created -> succeeded
       `-> failed

With the following associated events: PaymentCreated, PaymentSucceeded and PaymentFailed.

It might be expected that PaymentCreated will always be received first, but this is not guaranteed. The PaymentCreated event might also contain vital information not included in the later events, so e.g. PaymentSucceeded cannot be processed independently.

Problem

How do we gracefully handle out of order messages, even when there are hard dependencies between them?

Solution

If possible, design the messages such that they can be processed in any order. For example, the PaymentSucceeded event above could include all the information from PaymentCreated so it can be processed independently.

If that is not possible, then consider a design such as the following.

Given two events with a natural ordering A -> B, where we are unable to process B until A has arrived:

Handler for A

Ensure that if B arrives first, we process it as soon as A arrives.

  1. Insert event A into an ordered log
  2. Check for presence of associated event B in the log, prior to A
    • If exists, process A and then B
    • Else, process A as usual
Handler for B

Ensure that if B arrives first, it is stored for later.

  1. Insert event B into an ordered log
  2. Check for presence of associated event A in the log, prior to B
    • If exists, process B
    • Else, no-op

This solution relies on an ordered event log, where it is guaranteed that if an event with ordinal N exists in the log, all events with ordinal < N exist in the log and are visible.

An event log might not be necessary, but is a nice general-purpose solution to this problem.

Alternatives

Delay and replay

It can be tempted to simply delay the processing of out of order messages, in the hope that the required preceding messages will arrive soon.

This has the drawback of creating unnecessary delays. If we delay message B for one minute, but the required message A arrives after one second, then we will process B 59 seconds later than necessary. Ideally, we want to process messages as soon as possible.

Also, this approach is in some sense a “hack”. The system is not able to properly handle the out of order messages, so we delay them until they are in order. Arguably a better approach is to design the system such that it can handle messages in any order.