Thom's Blog

Garbage collection

Failure pattern – Find and delete unused data


Some of these patterns can produce a lot of garbage: data that will never be used again.


Idempotency keys expire, recovery points become redundant once their operations complete, and unreferenced data builds up as failures accumulate.


How do we limit the amount of garbage data stored?


Run a periodic garbage collection process. The process can be scheduled to run as a cron job. It will need to:

  1. Identify which records are no longer needed.
  2. Delete these records.

Consider deleting in fixed-size batches so that the operation doesn’t overload the database, and run the process often enough that garbage doesn’t accumulate faster than it is collected.
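The two steps and the batching advice can be sketched roughly as follows. This is only an illustration under assumptions: the `idempotency_keys` table, its `expires_at` column, and the `collect_garbage` function are hypothetical names, and SQLite stands in for whatever database is actually in use.

```python
import sqlite3


def collect_garbage(conn, now, batch_size=1000, max_batches=100):
    """Delete expired rows in fixed-size batches and return how many were removed.

    Assumes a hypothetical table `idempotency_keys(id, expires_at)`.
    """
    total_deleted = 0
    for _ in range(max_batches):
        # Identify at most `batch_size` expired rows and delete only those,
        # so a single statement never touches an unbounded number of rows.
        cursor = conn.execute(
            "DELETE FROM idempotency_keys WHERE id IN ("
            "  SELECT id FROM idempotency_keys WHERE expires_at < ? LIMIT ?"
            ")",
            (now, batch_size),
        )
        # Commit after every batch so each transaction stays short.
        conn.commit()
        if cursor.rowcount == 0:
            break  # nothing left to collect
        total_deleted += cursor.rowcount
    return total_deleted
```

The `max_batches` cap bounds how much work a single run does; anything left over is picked up by the next scheduled run. That only works if runs are frequent enough to keep up with the rate at which garbage is produced.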


In some cases the data can instead be put in a cache with a TTL (time to live), so that it expires on its own. This works well if the data is intended to be temporary.
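To make the TTL idea concrete, here is a minimal in-memory sketch. The `TTLCache` class is hypothetical and written only for illustration; in practice you would more likely rely on the expiry support built into a cache server than write your own.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (value, expires_at)

    def put(self, key, value):
        # Record the value together with the moment it stops being valid.
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: remove the entry lazily on read and report a miss.
            del self._entries[key]
            return default
        return value
```

Note that this sketch only evicts an entry when it is read after expiring, so keys that are never read again stay in memory. A real cache also evicts in the background, which is what removes the need for a separate garbage collection job.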