How to tell MapReduce how many mappers to use at the same time?

Question

I am writing an indexing app for MapReduce. I was able to split the input with NLineInputFormat, and now my app has a few hundred mappers. However, only 2 of them per machine are active at any one time; the rest are "PENDING". I believe this behavior slows the app down significantly.

How do I make Hadoop run at least 100 of them at the same time on each machine?

I am using the old Hadoop API syntax. Here's what I've tried so far:

    conf.setNumMapTasks(1000);
    conf.setNumTasksToExecutePerJvm(500);

Neither of these seems to have any effect.

Any ideas how I can make the mappers actually RUN in parallel?

Solution

JobConf.setNumMapTasks() is only a hint to the MR framework, and I am not sure what effect calling it has. In your case, the total number of map tasks across the whole job should equal the number of lines in the input divided by the number of lines per split configured for NLineInputFormat (rounded up for each input file, since a shorter last chunk still gets its own mapper). You can find more details on the total number of map/reduce tasks across the whole job here.
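To make that arithmetic concrete, here is a minimal plain-Java sketch (no Hadoop dependency) that estimates the map-task count. It assumes NLineInputFormat splits each input file independently into chunks of N lines, with a possibly shorter final chunk per file. The class name, method names, and sample line counts are hypothetical, chosen only for illustration.

```java
import java.util.List;

/**
 * Hypothetical helper: estimates how many map tasks a job using
 * NLineInputFormat gets, assuming each file is split independently
 * into chunks of linesPerMap lines (the last chunk may be shorter).
 */
public class NLineSplitEstimate {

    /** Number of splits (and therefore map tasks) for a single file. */
    static long splitsForFile(long linesInFile, int linesPerMap) {
        if (linesPerMap <= 0) {
            throw new IllegalArgumentException("linesPerMap must be positive");
        }
        // Ceiling division: a partial chunk at the end still becomes a split.
        return (linesInFile + linesPerMap - 1) / linesPerMap;
    }

    /** Total map tasks for a job whose input consists of several files. */
    static long totalMapTasks(List<Long> linesPerFile, int linesPerMap) {
        long total = 0;
        for (long lines : linesPerFile) {
            total += splitsForFile(lines, linesPerMap);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical input: three files with 1000, 250 and 1 lines, 10 lines per map.
        List<Long> files = List.of(1000L, 250L, 1L);
        System.out.println(totalMapTasks(files, 10)); // prints 126
    }
}
```

Note that the count depends only on the input and the lines-per-split setting, which is why raising setNumMapTasks() does not change it.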

The description for mapred.tasktracker.map.tasks.maximum says

The maximum number of map tasks that will be run simultaneously by a task tracker.

You need to configure mapred.tasktracker.map.tasks.maximum (which defaults to 2) to change the number of map tasks that the task tracker runs in parallel on a particular node. I could not find the documentation for 0.20.2, so I am not sure whether the parameter exists, or uses the same name, in the 0.20.2 release.
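The per-node limit explains the PENDING tasks: the cluster can only run (nodes × slots per node) map tasks at once, and the rest wait for a free slot. The plain-Java sketch below estimates how many "waves" of map tasks a job needs. It assumes every map task takes roughly the same time, and the cluster size and task counts are hypothetical. Because this limit belongs to the TaskTracker rather than to the job, it is normally set in mapred-site.xml on each worker node and only takes effect after that TaskTracker is restarted; setting it from the job's JobConf does not change it.

```java
/**
 * Hypothetical helper: estimates how many rounds ("waves") of map tasks a
 * job needs when each TaskTracker runs at most slotsPerNode map tasks at
 * once. Tasks beyond the available slots stay PENDING until a slot frees up.
 * Assumes all map tasks take roughly the same time.
 */
public class MapWaves {

    /** Total number of map tasks the cluster can run simultaneously. */
    static long concurrentSlots(int nodes, int slotsPerNode) {
        return (long) nodes * slotsPerNode;
    }

    /** Number of waves needed to run all map tasks. */
    static long waves(long mapTasks, int nodes, int slotsPerNode) {
        long slots = concurrentSlots(nodes, slotsPerNode);
        if (slots <= 0) {
            throw new IllegalArgumentException("need at least one map slot");
        }
        // Ceiling division: a partially filled last wave still counts.
        return (mapTasks + slots - 1) / slots;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 4 nodes running a job with 300 map tasks.
        System.out.println(waves(300, 4, 2)); // default 2 slots per node: prints 38
        System.out.println(waves(300, 4, 8)); // 8 slots per node: prints 10
    }
}
```

Each map task runs in its own child JVM, so a higher slot count only helps as long as each node has enough CPU cores and memory to run that many tasks side by side.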
