Let’s also assume that our hash function maps each URL to a server which will be responsible for crawling it.
Given the statement above, does that mean there will be a separate dispatcher component that uses the hash function to assign each URL to a crawling server? What would that hash function look like?
Suppose a URL (e.g., URL A) is dispatched to a particular server (e.g., Server A) and then to a particular thread (e.g., Thread A) that handles crawling for a particular host (e.g., Host A).
What happens if the document at URL A contains a link to a URL on another host (e.g., Host B)?
=> Which queue will Thread A put that URL into?