How to schedule Hadoop map tasks in a multi-core 8-node cluster?
Problem description
I have a "map only" (no reduce phase) program. The input file is large enough to create 7 map tasks, and I have verified that by looking at the output produced (part-000 to part-006). My cluster has 8 nodes, each with 8 cores and 8 GB of memory, and a shared filesystem hosted on the head node.
My question is: can I choose between running all 7 map tasks on a single node and running them on 7 different slave nodes (1 task per node)? If so, what changes are needed in my code and configuration file?
I tried setting the parameter "mapred.tasktracker.map.tasks.maximum" to 1 and to 7 in my code only, but I did not find any appreciable time difference. In my configuration file it is set to 1.
"mapred.tasktracker.map.tasks.maximum" controls how many map tasks can run at the same time on each node, not how many nodes are used for the job. In the Hadoop architecture there is one TaskTracker on each slave node and one JobTracker on the master node, so setting "mapred.tasktracker.map.tasks.maximum" only changes the number of map tasks executed per node. It is a TaskTracker-side setting, read from each node's configuration when the daemon starts, which is likely why changing it only in job code made no measurable difference.
The range of "mapred.tasktracker.map.tasks.maximum" is from 1/2 * cores/node to 2 * cores/node.
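To make the rule of thumb concrete for the cluster in the question (8 cores per node, 7 map tasks), here is a minimal sketch in plain Java. The class and method names are hypothetical and are not part of Hadoop; the node count it reports is only the fewest nodes that could hold all tasks in one wave, since the scheduler may still spread tasks more widely.

```java
// Illustrative arithmetic only; MapSlotRange is a hypothetical helper, not a Hadoop class.
final class MapSlotRange {
    // Lower end of the suggested range: half the cores on a node (never below 1).
    static int minSlots(int coresPerNode) {
        return Math.max(1, coresPerNode / 2);
    }

    // Upper end of the suggested range: twice the cores on a node.
    static int maxSlots(int coresPerNode) {
        return 2 * coresPerNode;
    }

    // Fewest nodes that can run all map tasks concurrently
    // when each node offers slotsPerNode map slots (ceiling division).
    static int minNodesForOneWave(int mapTasks, int slotsPerNode) {
        return (mapTasks + slotsPerNode - 1) / slotsPerNode;
    }

    public static void main(String[] args) {
        int cores = 8;     // cores per node, from the question
        int mapTasks = 7;  // map tasks, from the question
        System.out.println("suggested slots per node: "
                + minSlots(cores) + " to " + maxSlots(cores));
        System.out.println("nodes needed with 1 slot per node: "
                + minNodesForOneWave(mapTasks, 1));
        System.out.println("nodes needed with 7 slots per node: "
                + minNodesForOneWave(mapTasks, 7));
    }
}
```

With one slot per node the 7 tasks must land on 7 different nodes, while 7 or more slots per node makes it possible, though not guaranteed, for all of them to run on one node.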
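The total number of map tasks for the job should be set using setNumMapTasks(int). Note that JobConf.setNumMapTasks(int) is only a hint to the framework; the actual number of map tasks follows from the input splits.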
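The sketch below shows how the requested map count and the number of input splits can interact. It is a simplified re-implementation, written from memory, of the split sizing in the old-API org.apache.hadoop.mapred.FileInputFormat, so treat it as an approximation rather than the library's exact behavior. The 448 MB file size, 64 MB block size, and all names are assumptions for illustration; the question does not state them.

```java
// Simplified, standalone approximation of old-API split sizing.
// SplitCountSketch and its inputs are hypothetical, not Hadoop code.
final class SplitCountSketch {
    // The final split is allowed to be up to 10% larger than the split size.
    static final double SPLIT_SLOP = 1.1;

    // Split size is capped by the block size and floored by the minimum split size.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Number of splits (and so map tasks) for a single splittable file.
    static int countSplits(long fileSize, long blockSize, long minSize, int requestedMaps) {
        long goalSize = fileSize / (requestedMaps == 0 ? 1 : requestedMaps);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // whatever is left becomes the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 7 * 64 * mb; // assumed: a file spanning exactly 7 blocks
        long blockSize = 64 * mb;    // assumed block size
        for (int requested : new int[] {2, 7, 14}) {
            System.out.println("requested " + requested + " map tasks, got "
                    + countSplits(fileSize, blockSize, 1L, requested) + " splits");
        }
    }
}
```

Under these assumptions, asking for fewer map tasks than there are blocks does not reduce the count, because the split size is capped at the block size, while asking for more can raise it.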