educative.io

HDFS - Rack Aware Replication


Why does HDFS place the first replica on the same node as the writer?
If that node fails, the replica stored on it becomes inaccessible.
In what scenario is this placement beneficial?

It is helpful when two processes run on the same node, one producing data and the other consuming it. The consumer can read the local replica and does not need to fetch the data from another node over the network. This is quite useful for MapReduce jobs. If the node does fail, the data is still available from the replicas stored on other nodes.
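
To make the placement concrete, here is a minimal Python sketch of the commonly documented default policy for a replication factor of 3: first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack. This is not the HDFS API. The function name `place_replicas`, the `topology` dictionary, and the node and rack names are hypothetical and exist only for illustration; real HDFS also considers node load, free space, and other factors that are omitted here.

```python
import random
from typing import Dict, List, Optional


def place_replicas(writer_node: str,
                   topology: Dict[str, List[str]],
                   rng: Optional[random.Random] = None) -> List[str]:
    """Pick up to three nodes for a block's replicas (simplified sketch).

    topology maps a rack name to the DataNodes on that rack (hypothetical format).
    """
    rng = rng or random.Random()
    # Reverse lookup: node name -> rack name.
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}

    # Replica 1: the writer's own node when the writer runs on a DataNode,
    # otherwise a random node in the cluster.
    first = writer_node if writer_node in rack_of else rng.choice(sorted(rack_of))

    # Replica 2: a node on a different rack, so a rack failure cannot lose all copies.
    remote_racks = sorted(r for r, nodes in topology.items()
                          if r != rack_of[first] and nodes)
    if not remote_racks:
        # Single-rack cluster: this simplified sketch stops after one replica.
        return [first]
    remote_rack = rng.choice(remote_racks)
    second = rng.choice(topology[remote_rack])

    # Replica 3: a different node on the same remote rack as replica 2.
    candidates = [n for n in topology[remote_rack] if n != second]
    third = rng.choice(candidates) if candidates else None

    return [n for n in (first, second, third) if n is not None]


if __name__ == "__main__":
    cluster = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
    print(place_replicas("n1", cluster))
```

With the writer on `n1`, the first replica always lands on `n1`, so a consumer running on that node can read it locally, while the other two replicas on `rack2` cover the case where `n1` or all of `rack1` fails.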