educative.io

Do mapreduce programs run on namenode or datanode?

Who provides the execution environment for the MapReduce job, like the Java or Python dependencies?
If we copy the mapper and reducer code to HDFS, how does it identify that it is not the “big data” but the code that acts on it?

Hi Jay,

The datanodes do the actual labor: the map and reduce tasks run on the worker nodes that host the datanode processes. The namenode acts more like an inventory; it knows the state of the cluster (which datanodes hold which blocks) and makes decisions based on that.
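You can see this division of labor from the command line (assuming shell access to a cluster; the file path here is made up for illustration): asking which blocks make up a file and where they live is answered by the namenode from its metadata, while the blocks themselves sit on the datanodes.

```shell
# Ask the namenode which blocks make up a file and which datanodes
# store them. This is a metadata query: the namenode answers it from
# its inventory; no datanode has to read the actual data.
hdfs fsck /user/jay/input.txt -files -blocks -locations
```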

Who provides the execution environment for the MapReduce job, like the Java or Python dependencies?

I am not sure I understand the question. If you mean the language runtime: Hadoop itself runs on the JVM, so Java is provided wherever Hadoop is installed. For Hadoop Streaming jobs written in Python, the interpreter is not shipped by the framework; it has to be installed on every worker node that will run your tasks.
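For context on why only the interpreter is needed: with Hadoop Streaming, the mapper and reducer are just executables that read lines from stdin and write records to stdout, so any runtime present on the worker nodes works. A minimal sketch of a Streaming-style word-count mapper (hypothetical example, not from this thread; written as a function so it is easy to test locally):

```python
def map_words(lines):
    """Streaming-style mapper: emit one 'word<TAB>1' record per input word."""
    records = []
    for line in lines:
        for word in line.split():
            records.append(f"{word}\t1")
    return records

# In a real Streaming job this loop would read sys.stdin and print each
# record, because Hadoop pipes the input split to the script's stdin on
# whichever worker node runs the task.
```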

If we copy the mapper and reducer code to HDFS, how does it identify that it is not the “big data” but the code that acts on it?

The best practice is to keep them separate, in different folders. I haven't tried that edge case myself, but I would guess that because the command itself specifies which files are the mapper and reducer and where the input files live (possibly via a glob pattern), it is smart enough to ignore them.
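To make that concrete, a typical Hadoop Streaming invocation looks something like this (file names and HDFS paths are made up for illustration). The framework never has to guess what is code and what is data: `-mapper`/`-reducer` name the code explicitly, and only what `-input` points at is treated as data. With `-files`, the scripts are shipped from your local machine to each worker, so they usually do not need to be in HDFS at all.

```shell
# Hypothetical Streaming job: mapper.py/reducer.py are local scripts
# that -files distributes to every worker node; only /data/input is
# read as job input, so the code is never mistaken for data.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -files mapper.py,reducer.py \
  -mapper mapper.py \
  -reducer reducer.py \
  -input /data/input \
  -output /data/output
```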