Is the URL Frontier a datastore or an app server?

I am confused about the URL frontier.

First, it is represented differently in the two diagrams. In the first diagram it is shown as a datastore, and in the second diagram it is shown as a component (server).

Secondly, it is written that

“we can distribute our URL frontier into multiple servers. Let’s assume on each server we have multiple worker threads performing the crawling tasks”

So are we running threads on the URL frontier? Shouldn't the crawler be a separate component? In addition, if the frontier is a DB that stores URLs, then why are we running crawling tasks on it?

So I am not clear about what the URL frontier is and what its responsibility is.

Thanks

Hi Emre Caglar,

You are confusing the high-level design with the detailed component design. We intentionally kept things simple in the high-level design and depicted the URL Frontier as a data store. In the detailed design, the URL Frontier consists of app servers that are responsible for crawling and that also keep the URLs to be crawled in FIFO queues. These queues are also stored (backed up) in file storage.
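To make the idea concrete, here is a minimal sketch of a single frontier server under some assumptions. It is not the course's implementation. The names `FrontierServer`, `add_url`, `backup`, and the `fetch` callable (which takes a URL and returns the links found on that page) are hypothetical, and a local JSON file stands in for the file storage.

```python
import json
import queue
import threading
from pathlib import Path


class FrontierServer:
    """Hypothetical frontier node: owns a FIFO queue of URLs and runs crawl workers."""

    def __init__(self, backup_path, fetch, num_workers=2):
        self._queue = queue.Queue()           # FIFO queue of URLs waiting to be crawled
        self._backup_path = Path(backup_path)  # stand-in for file storage (assumption)
        self._fetch = fetch                   # assumed callable: url -> iterable of found URLs
        self._num_workers = num_workers
        self._seen = set()                    # URLs already enqueued, to avoid duplicates
        self._lock = threading.Lock()
        self.crawled = []                     # URLs this server has finished crawling

    def add_url(self, url):
        """Enqueue a URL unless this server has already seen it."""
        with self._lock:
            if url in self._seen:
                return
            self._seen.add(url)
        self._queue.put(url)

    def backup(self):
        """Write the pending queue contents, in FIFO order, to the backup file."""
        # queue.Queue keeps its items in the `queue` deque, guarded by `mutex` (CPython).
        with self._queue.mutex:
            pending = list(self._queue.queue)
        self._backup_path.write_text(json.dumps(pending))

    def _worker(self):
        while True:
            url = self._queue.get()
            if url is None:                   # sentinel: no more work
                self._queue.task_done()
                return
            try:
                for found in self._fetch(url):
                    self.add_url(found)       # newly discovered URLs go back into the queue
                with self._lock:
                    self.crawled.append(url)
            finally:
                self._queue.task_done()

    def run(self):
        """Start the worker threads and block until the queue has been drained."""
        threads = [threading.Thread(target=self._worker) for _ in range(self._num_workers)]
        for t in threads:
            t.start()
        self._queue.join()                    # returns once every queued URL is processed
        for _ in threads:
            self._queue.put(None)             # one sentinel per worker
        for t in threads:
            t.join()
```

In this sketch the queue and the worker threads live in the same process, which is why the detailed design draws the frontier as a server rather than as a plain datastore. A distributed version would run several such servers and would need some rule for assigning URLs to them, which is not shown here.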

Hope this answers your question.
