educative.io

No need to wait for hash calculation to complete

With post-process deduplication, new chunks are first stored on the storage device and later some process analyzes the data looking for duplication. The benefit is that clients will not need to wait for the hash calculation or lookup to complete before storing the data, thereby ensuring that there is no degradation in storage performance.
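To make the quoted description concrete, here is a minimal in-memory sketch of the idea, not how any particular storage product implements it. The class `PostProcessStore` and its methods `write`, `read`, and `dedup_pass` are hypothetical names chosen for illustration, and SHA-256 is an assumed choice of hash. The point it shows is that the write path does no hashing or index lookup; that work is deferred to a later pass.

```python
import hashlib


class PostProcessStore:
    """Toy chunk store (hypothetical): writes land immediately, dedup runs later."""

    def __init__(self):
        self.chunks = {}    # physical storage: chunk_id -> bytes
        self.refs = {}      # logical chunk_id -> chunk_id that actually holds the data
        self.index = {}     # SHA-256 digest -> chunk_id of the first copy seen
        self.pending = []   # chunk_ids written but not yet analyzed
        self.next_id = 0

    def write(self, data: bytes) -> int:
        # Write path: no hash calculation and no index lookup, just store the chunk.
        chunk_id = self.next_id
        self.next_id += 1
        self.chunks[chunk_id] = data
        self.refs[chunk_id] = chunk_id
        self.pending.append(chunk_id)
        return chunk_id

    def read(self, chunk_id: int) -> bytes:
        # Follow the reference in case the chunk was deduplicated.
        return self.chunks[self.refs[chunk_id]]

    def dedup_pass(self) -> int:
        """Background pass: hash pending chunks and drop duplicate copies.

        Returns the number of duplicate chunks reclaimed. This sketch trusts
        the digest alone; a real system might also compare the bytes.
        """
        reclaimed = 0
        for chunk_id in self.pending:
            digest = hashlib.sha256(self.chunks[chunk_id]).hexdigest()
            original = self.index.get(digest)
            if original is None:
                self.index[digest] = chunk_id   # first copy becomes the canonical one
            else:
                self.refs[chunk_id] = original  # point at the existing copy
                del self.chunks[chunk_id]       # free the duplicate
                reclaimed += 1
        self.pending.clear()
        return reclaimed
```

In this sketch, writing the same chunk twice briefly occupies space for both copies; the duplicate is only reclaimed once `dedup_pass()` runs, which is the storage cost paid for keeping the write path cheap.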

Why is hash calculation such a bottleneck? Shouldn’t we be able to do it quickly? If so, is there any benefit to post-process dedup?