
Outdated or stale data by cache write strategy

This lesson has a quiz question: “Outdated or stale data entry is a typical issue in which writing policy?” The options are A) write-through, B) write-back, and C) write-around, and the lesson says the answer is B, write-back, because writes happen asynchronously to the database. I wonder whether C could be the answer.

With B, write-back, the data is first written to the cache and then asynchronously to the database. The data should reach the database quickly, and while the write is in flight and the db is still outdated, clients read the up-to-date cached copy. I suspect that the newly cached entry’s TTL is longer than the time the async db write takes, so by the time the cache entry is invalidated, the db has the latest data too, and clients rarely see stale data.
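A minimal Python sketch of the write-back case may make this concrete. Everything in it is hypothetical: `WriteBackCache`, its `flush` method, the dict standing in for the db, and the integer `now` ticks used instead of a real clock. It also assumes the cache never drops a dirty entry before flushing it.

```python
class WriteBackCache:
    """Toy write-back cache: writes hit the cache first, the db later."""

    def __init__(self, db, ttl):
        self.db = db          # plain dict standing in for the database
        self.ttl = ttl        # number of ticks a cache entry stays valid
        self.entries = {}     # key -> (value, expires_at)
        self.dirty = set()    # keys written to the cache but not yet flushed

    def write(self, key, value, now):
        # Update the cache only; the db write is deferred to flush().
        self.entries[key] = (value, now + self.ttl)
        self.dirty.add(key)

    def flush(self):
        # Stand-in for the asynchronous db write.
        for key in self.dirty:
            self.db[key] = self.entries[key][0]
        self.dirty.clear()

    def read(self, key, now):
        entry = self.entries.get(key)
        # Dirty entries are always served from the cache (assumption:
        # they are never evicted before being flushed).
        if entry is not None and (key in self.dirty or now < entry[1]):
            return entry[0]
        value = self.db[key]  # miss: reload from the db
        self.entries[key] = (value, now + self.ttl)
        return value


db = {"x": "old"}
cache = WriteBackCache(db, ttl=10)

cache.write("x", "new", now=0)
read_before_flush = cache.read("x", now=1)   # served from the cache
db_before_flush = db["x"]                    # db has not caught up yet
cache.flush()
db_after_flush = db["x"]
read_after_expiry = cache.read("x", now=11)  # cache miss, reloaded from the db
```

In this sketch the db is the stale side until `flush` runs, while readers going through the cache see the new value throughout, which is the situation described above.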

In contrast, with C, write-around, say there’s a frequently-read piece of data, x. When an update to x arrives, it’s just written to the db, and the cache isn’t updated. The cache data remains out of sync with the authoritative db data all the way until it’s invalidated, when we’ll miss and retrieve the authoritative data from the db. That could be seconds or minutes that we’ve allowed clients to access outdated/stale data.
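The write-around scenario can be sketched the same way. This is only an illustration under assumptions: `WriteAroundCache` is a hypothetical class, a dict stands in for the db, `now` is an integer tick rather than a real clock, and writes do not invalidate the cached entry (some write-around setups do invalidate on write, which would avoid the stale read shown here).

```python
class WriteAroundCache:
    """Toy write-around cache: writes go straight to the db, bypassing the cache."""

    def __init__(self, db, ttl):
        self.db = db          # plain dict standing in for the database
        self.ttl = ttl        # number of ticks a cache entry stays valid
        self.entries = {}     # key -> (value, expires_at)

    def write(self, key, value):
        # Only the db is updated; the cached copy is left untouched
        # (assumption: no invalidation on write).
        self.db[key] = value

    def read(self, key, now):
        entry = self.entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]   # hit: may be older than the db value
        value = self.db[key]  # miss: load the authoritative value
        self.entries[key] = (value, now + self.ttl)
        return value


db = {"x": "old"}
cache = WriteAroundCache(db, ttl=60)

warm_read = cache.read("x", now=0)    # x is now cached until tick 60
cache.write("x", "new")               # update arrives, db only
stale_read = cache.read("x", now=30)  # still served from the cache
fresh_read = cache.read("x", now=60)  # entry expired, reloaded from the db
```

Here the staleness window is bounded by the TTL rather than by the duration of a db write, so a long TTL on a frequently read key means a long window.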


Course: Grokking Modern System Design Interview for Engineers & Managers
Lesson: Background of Distributed Cache - Grokking Modern System Design Interview for Engineers & Managers

Hi Issac,

Your observation is correct: both write-back and write-around can lead to stale data. Which policy is more likely to cause it depends on the specific circumstances. In many cases, write-around could indeed be the answer, especially in systems where cache hits are frequent and the cache is relied upon for serving data.

We’ll update this quiz by making both options correct; thank you for your in-depth analysis and for pointing this out. Happy learning at Educative!

Regards,
