educative.io

Aborted crawls can easily be restarted from the latest checkpoint

8. Checkpointing: A crawl of the entire Web takes weeks to complete. To guard against failures, our crawler can write regular snapshots of its state to the disk. An interrupted or aborted crawl can easily be restarted from the latest checkpoint.
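
To make the quoted passage concrete, here is a minimal Python sketch of what "writing snapshots of state to disk" could look like. It is not the course's implementation. It assumes the crawler's state is just a URL frontier plus a visited set, and the names `save_checkpoint`, `load_checkpoint`, `crawl`, `fetch_links`, `checkpoint_every`, and `max_pages` are all hypothetical, introduced only for illustration.

```python
import json
import os
import tempfile
from collections import deque


def save_checkpoint(path, frontier, visited):
    """Snapshot the crawler state (frontier + visited set) to disk."""
    state = {"frontier": list(frontier), "visited": sorted(visited)}
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first, so a crash in the middle of a
    # write cannot corrupt the previous good checkpoint.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # Swap the new snapshot into place in a single step.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (frontier, visited); both are empty if no checkpoint exists."""
    if not os.path.exists(path):
        return deque(), set()
    with open(path) as f:
        state = json.load(f)
    return deque(state["frontier"]), set(state["visited"])


def crawl(seeds, fetch_links, path, checkpoint_every=100, max_pages=None):
    """Crawl breadth-first, resuming from the checkpoint at `path` if present.

    `fetch_links(url)` is a caller-supplied (hypothetical) function that
    returns the outgoing links of a page.
    """
    frontier, visited = load_checkpoint(path)
    if not frontier and not visited:
        frontier = deque(seeds)  # fresh crawl: start from the seed URLs
    processed = 0
    while frontier:
        if max_pages is not None and processed >= max_pages:
            break
        url = frontier.popleft()
        if url in visited:
            continue
        for link in fetch_links(url):
            if link not in visited:
                frontier.append(link)
        visited.add(url)
        processed += 1
        # Snapshots are taken between pages, so the saved state never
        # contains a half-processed URL.
        if processed % checkpoint_every == 0:
            save_checkpoint(path, frontier, visited)
    save_checkpoint(path, frontier, visited)
    return visited
```

In this sketch nothing is kept alive across a failure. A restarted process calls `crawl` again, reloads the last snapshot, and continues from there, so any pages handled after that snapshot are simply fetched again.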

What is checkpointing actually doing here? Does it make sure that if a server fails, all of the in-progress worker threads are restarted?