HDFS - Rack Aware Replication

Nishant_Sharma · September 4, 2021, 5:47pm

Why does HDFS place the first replica on the same node as the writer?
If that node fails, we cannot access the replica stored on it.
In what scenario is this beneficial?

Design_Gurus · September 6, 2021, 9:23pm

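It is helpful when two processes run on the same node, one producing data and the other consuming it. The consumer can read the local replica and does not need to fetch the data from another node over the network. This is quite useful for MapReduce jobs. If the node fails, the data is still available from the other replicas, which HDFS places on different nodes.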
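To make the placement concrete, here is a minimal Python sketch of the default three-replica policy as it is commonly described: first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack. The function name `choose_replica_nodes`, the `topology` dictionary, and the node and rack names are hypothetical and used only for illustration; this is not the actual HDFS code.

```python
import random

def choose_replica_nodes(writer, topology, rng=None):
    """Pick three nodes for a block written from `writer`.

    topology: dict mapping rack name -> list of node names (illustrative only).
    """
    rng = rng or random.Random()
    # Reverse lookup: which rack does each node live in?
    node_rack = {n: r for r, nodes in topology.items() for n in nodes}
    local_rack = node_rack[writer]

    # First replica: the writer's own node, so local readers avoid the network.
    first = writer

    # Second and third replicas: two different nodes in one remote rack,
    # so the block survives the loss of the writer's node or its whole rack.
    remote_racks = [r for r, nodes in topology.items()
                    if r != local_rack and len(nodes) >= 2]
    if not remote_racks:
        raise ValueError("need a remote rack with at least two nodes")
    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote_rack], 2)
    return [first, second, third]

# Hypothetical two-rack cluster
topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
replicas = choose_replica_nodes("n1", topology)
print(replicas)
```

With this layout, a consumer running on `n1` reads the block locally, while the copies on `rack2` cover the case where `n1` goes down.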