Context
Some operations need to write data to an external system and then record a reference to that data in another system (often a local database). A reference to data that does not exist usually leaves the system in an invalid state.
Prerequisites
It is acceptable to end up with unreferenced “garbage” data. Data is only ever accessed through a reference, never directly.
Examples
Uploading a profile image on a social network: the image is written to external storage, and the user's record holds the reference to it.
Problem
How do we prevent dangling references if writing the data fails?
Solution
First store the data, then store the reference. Write any update to the data as a new, separate copy in an append-only manner, rather than overwriting the original.
This is similar to Multiversion Concurrency Control (MVCC) in databases, where instead of updating a row in place, a new version is written along with the associated transaction ID. This new version will not be read until that transaction ID is marked as committed.
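The write ordering described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: `BlobStore`, `set_profile_image`, the `fail_next_put` switch and the in-memory `references` dictionary are all hypothetical stand-ins for an external store and a local database.

```python
import uuid


class BlobStore:
    """Hypothetical in-memory stand-in for an external object store."""

    def __init__(self):
        self.blobs = {}
        self.fail_next_put = False  # test hook to simulate a failed write

    def put(self, data: bytes) -> str:
        if self.fail_next_put:
            self.fail_next_put = False
            raise OSError("simulated upload failure")
        # A fresh key per write keeps the store append-only:
        # existing data is never overwritten.
        key = uuid.uuid4().hex
        self.blobs[key] = data
        return key


def set_profile_image(blobs: BlobStore, references: dict, user_id: str, image: bytes) -> str:
    # Step 1: store the data. If this raises, no reference has been touched.
    key = blobs.put(image)
    # Step 2: only after the data exists, store the reference to it.
    references[user_id] = key
    return key


blobs = BlobStore()
references = {}  # stands in for the local database table of references

first_key = set_profile_image(blobs, references, "alice", b"v1")

# A failed data write leaves the existing reference untouched.
blobs.fail_next_put = True
try:
    set_profile_image(blobs, references, "alice", b"v2")
except OSError:
    pass
key_after_failure = references["alice"]

# Retrying succeeds; the old blob remains as unreferenced garbage.
second_key = set_profile_image(blobs, references, "alice", b"v2")
```

Because the reference is written last, every failure point leaves it pointing at data that exists: either the previous version or the new one.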
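This operation is naturally resumable: if it fails after the data is written but before the reference is stored, it can simply be retried, and any data left behind by the failed attempt is merely unreferenced. Garbage collection can be used to clean up stale, unreferenced data.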
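One possible shape for such a garbage collector is sketched below. It assumes each stored item carries a creation timestamp and that the set of currently referenced keys can be read from the reference store; the function name `collect_garbage` and the `grace_seconds` parameter are hypothetical. The grace period is an added assumption: data is briefly unreferenced between being written and its reference being stored, so very recent items are skipped.

```python
def collect_garbage(blobs: dict, referenced_keys: set, now: float, grace_seconds: float) -> list:
    """Delete unreferenced blobs older than the grace period.

    `blobs` maps key -> (created_at, data). Returns the removed keys, sorted.
    """
    removed = []
    # Iterate over a snapshot so entries can be deleted during the loop.
    for key, (created_at, _data) in list(blobs.items()):
        too_young = now - created_at < grace_seconds
        if key not in referenced_keys and not too_young:
            del blobs[key]
            removed.append(key)
    return sorted(removed)


blobs = {
    "old-orphan": (0.0, b"stale"),     # unreferenced and old: collected
    "live": (0.0, b"current"),         # referenced: kept
    "in-flight": (95.0, b"uploading"), # unreferenced but recent: kept
}
removed = collect_garbage(blobs, referenced_keys={"live"}, now=100.0, grace_seconds=60.0)
```

The grace period should comfortably exceed the longest expected gap between writing the data and storing its reference.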
See also
Related
- Garbage collection – Find and delete unused data
- Recovery point – Record current progress to allow recovery with minimal rework
- Resumable operation – Allow operations to continue from where the previous attempt failed