
Each host will perform checkpointing periodically and dump a snapshot of all the data it is holding onto a remote server

Each host will perform checkpointing periodically and dump a snapshot of all the data it is holding onto a remote server. This ensures that if a server dies, another server can replace it by loading its data from the last snapshot.
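To make the quoted idea concrete, here is a minimal sketch of checkpoint and recovery. The course does not specify an implementation, so everything here is an assumption for illustration: `RemoteSnapshotStore` is a hypothetical in-memory stand-in for the remote server, and `Host` is assumed to hold only a queue of items.

```python
import copy
import time
from collections import deque


class RemoteSnapshotStore:
    """Hypothetical stand-in for the remote server that holds snapshots."""

    def __init__(self):
        self._snapshots = {}

    def save(self, host_id, state):
        # Deep-copy so later changes on the host do not alter the snapshot.
        self._snapshots[host_id] = (time.time(), copy.deepcopy(state))

    def load(self, host_id):
        _taken_at, state = self._snapshots[host_id]
        return copy.deepcopy(state)


class Host:
    """Assumed host whose only state is an in-memory queue."""

    def __init__(self, host_id, store):
        self.host_id = host_id
        self.store = store
        self.queue = deque()

    def enqueue(self, item):
        self.queue.append(item)

    def checkpoint(self):
        # In a real system this would run on a timer; here it is called by hand.
        self.store.save(self.host_id, {"queue": list(self.queue)})

    @classmethod
    def recover(cls, failed_host_id, store):
        # A replacement host rebuilds its state from the last snapshot.
        state = store.load(failed_host_id)
        replacement = cls(failed_host_id, store)
        replacement.queue = deque(state["queue"])
        return replacement


store = RemoteSnapshotStore()
host = Host("host-1", store)
host.enqueue("job-a")
host.enqueue("job-b")
host.checkpoint()
host.enqueue("job-c")  # arrives after the checkpoint, so it is not in the snapshot

replacement = Host.recover("host-1", store)
recovered_items = list(replacement.queue)
```

Note that `job-c` is lost in this sketch: anything that arrives between the last checkpoint and the failure is not covered by the snapshot, which is the trade-off of periodic checkpointing.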

What is being snapshotted? Is it mostly the state of the queues? Wouldn't a durable queue solve the problem? One issue could be items that have been dequeued but are still in the middle of processing. But couldn't that be solved with a dead-letter queue?

This system design course is fairly abstract, so the answer may differ depending on the implementation. As far as I can tell, I agree that it would be a snapshot of the queues. The one issue I can see with your durable queue approach is this: what if the server does not come back online, or takes a while to do so? A snapshot (a copy held in RAM) on a remote server can take its place almost immediately.
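For the dequeued-but-still-processing case raised in the question, one possible approach (an assumption on my part, not something the course describes) is to track in-flight items alongside the pending queue and include both in the snapshot. The `TrackedQueue` class and its method names below are hypothetical.

```python
from collections import deque


class TrackedQueue:
    """Hypothetical queue that remembers items handed out but not yet acknowledged."""

    def __init__(self):
        self.pending = deque()   # items waiting to be processed
        self.in_flight = {}      # item_id -> payload, dequeued but not acked

    def put(self, item_id, payload):
        self.pending.append((item_id, payload))

    def take(self):
        item_id, payload = self.pending.popleft()
        self.in_flight[item_id] = payload
        return item_id, payload

    def ack(self, item_id):
        # Processing finished, so the item no longer needs to be recovered.
        self.in_flight.pop(item_id, None)

    def snapshot(self):
        return {"pending": list(self.pending), "in_flight": dict(self.in_flight)}

    @classmethod
    def restore(cls, snap):
        q = cls()
        # Re-queue in-flight items first so they are retried before newer work.
        q.pending.extend(snap["in_flight"].items())
        q.pending.extend(snap["pending"])
        return q


q = TrackedQueue()
q.put("m1", "resize")
q.put("m2", "encode")
q.put("m3", "upload")

first_id, _ = q.take()
q.ack(first_id)          # m1 finished before the checkpoint
q.take()                 # m2 is still being processed when the snapshot is taken

snap = q.snapshot()
restored = TrackedQueue.restore(snap)
restored_ids = [item_id for item_id, _ in restored.pending]
```

With this sketch, the replacement retries `m2`, so an item may be processed more than once if the original server had nearly finished it. Processing would need to be idempotent for that to be safe.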