
How is the checksum size determined for URLs and downloaded pages for dedupe?

The lines below calculate the memory required to store checksums for web pages and URLs for dedupe, respectively.

15B * 8 bytes => 120 GB
15B * 4 bytes => 60 GB
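
For context, here is a minimal sketch of that back-of-envelope arithmetic in Python, assuming "15B" means 15 billion items and decimal gigabytes (10^9 bytes):

# Back-of-envelope storage estimate for the dedupe checksums (assumed sizes).
num_items = 15_000_000_000             # 15 billion pages / URLs
page_checksum_bytes = 8                # assumed fixed size per page checksum
url_checksum_bytes = 4                 # assumed fixed size per URL checksum

print(num_items * page_checksum_bytes / 1e9)   # 120.0 GB
print(num_items * url_checksum_bytes / 1e9)    # 60.0 GB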

Can you please help us understand how the sizes of 8 bytes and 4 bytes were calculated?

The checksum sizes for a web page and a URL are different. The lesson mentions a 4-byte size in the case of a URL; this can be an arbitrary size or a fixed size per URL. It is not calculated, it is just taken as an example. Hope this helps.
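
As an illustration only, one common way to get a fixed-size checksum is to truncate a standard hash digest; the choice of SHA-256 and the 4-byte/8-byte truncations below are assumptions for this sketch, not something the lesson prescribes:

import hashlib

def checksum(data: bytes, size: int) -> bytes:
    # Truncate a SHA-256 digest to `size` bytes for a fixed-size checksum.
    return hashlib.sha256(data).digest()[:size]

url_checksum = checksum(b"http://example.com/page", 4)              # 4-byte URL checksum
page_checksum = checksum(b"<html>downloaded page body</html>", 8)   # 8-byte page checksum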