How to restrict the number of concurrently running map tasks?

Problem description

My Hadoop version is 1.0.2. I want at most 10 map tasks running at the same time. I have found two variables related to this question.

a) mapred.job.map.capacity

In my Hadoop version, however, this parameter seems to have been abandoned.

b) mapred.jobtracker.taskScheduler.maxRunningTasksPerJob (http://grepcode.com/file/repo1.maven.org/maven2/com.ning/metrics.collector/1.0.2/mapred-default.xml)

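I set this variable as follows:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
conf.set("date", date);  // date is a String defined elsewhere
conf.set("mapred.job.queue.name", "hadoop");
// Attempted per-job cap on running tasks; this had no effect on Hadoop 1.0.2
conf.set("mapred.jobtracker.taskScheduler.maxRunningTasksPerJob", "10");

DistributedCache.createSymlink(conf);
Job job = new Job(conf, "ConstructApkDownload_" + date);
...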

The problem is that it doesn't work. There are still more than 50 map tasks running when the job starts.

After looking through the Hadoop documentation, I can't find another way to limit the number of concurrently running map tasks. I hope someone can help. Thanks.

=====================

I have found the answer to this question and am sharing it here for others who may be interested.

Use the Fair Scheduler and set the maxMaps parameter in the allocation file (fair-scheduler.xml) to cap a pool's maximum number of concurrent map task slots. Then, when you submit a job, set the job's queue to the corresponding pool.
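As a rough illustration of the approach above, an allocation file along these lines would cap a pool at 10 concurrent map slots. This is only a sketch: the pool name "hadoop" is an assumption chosen to match the queue name used in the question, and it presumes the Fair Scheduler is already enabled on the JobTracker and configured to pick the pool from the job's queue name.

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml (allocation file); the pool name is hypothetical -->
<allocations>
  <pool name="hadoop">
    <!-- Upper bound on map slots used concurrently by all jobs in this pool -->
    <maxMaps>10</maxMaps>
  </pool>
</allocations>
```

Because the cap applies to the pool as a whole, a job only gets the full 10 slots to itself if no other job is running in the same pool.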

Solution

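You can set mapred.jobtracker.maxtasks.per.job to something other than -1 (the default). This is meant to limit the number of map or reduce tasks a job can employ. Note that, going by its description, the limit appears to apply to the job's total task count rather than only to the tasks running at the same moment.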

This variable is described as:

The maximum number of tasks for a single job. A value of -1 indicates that there is no maximum.
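For reference, the property would be set in the JobTracker's configuration roughly as shown here. This is a sketch under assumptions: the file is taken to be mapred-site.xml, the value 1000 is purely illustrative, and the JobTracker likely needs a restart to pick up the change. Since the description speaks of the maximum number of tasks for a single job, a small value may prevent large jobs from running instead of throttling them.

```xml
<!-- mapred-site.xml on the JobTracker; the value is illustrative only -->
<property>
  <name>mapred.jobtracker.maxtasks.per.job</name>
  <value>1000</value>
</property>
```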

I think there were plans to add mapred.max.maps.per.node and mapred.max.reduces.per.node to job configs, but they never made it into a release.
