How to restrict the number of concurrently running map tasks?
Problem description
My Hadoop version is 1.0.2. I want at most 10 map tasks running at the same time. I have found two variables related to this:
a) mapred.job.map.capacity
This parameter seems to have been abandoned in my Hadoop version.
b) mapred.jobtracker.taskScheduler.maxRunningTasksPerJob (http://grepcode.com/file/repo1.maven.org/maven2/com.ning/metrics.collector/1.0.2/mapred-default.xml)
I set this variable as follows:
Configuration conf = new Configuration();
conf.set("date", date);
conf.set("mapred.job.queue.name", "hadoop");
conf.set("mapred.jobtracker.taskScheduler.maxRunningTasksPerJob", "10");
DistributedCache.createSymlink(conf);
Job job = new Job(conf, "ConstructApkDownload_" + date);
...
The problem is that it doesn't work. More than 50 map tasks are still running when the job starts.
After looking through the Hadoop documentation, I can't find another way to limit the number of concurrently running map tasks. I hope someone can help. Thanks.
=====================
I have found the answer to this question and am sharing it here for others who may be interested.
Use the fair scheduler and set the maxMaps parameter in the allocation file (fair-scheduler.xml) to cap the pool's maximum number of concurrent map task slots. Then, when you submit a job, just set the job's queue to the corresponding pool.
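As a rough illustration of the fair scheduler approach described above, an allocation file along these lines might cap a pool at 10 concurrent map slots. The pool name hadoop is an assumption borrowed from the queue name used in the question, and the cap applies to all jobs in the pool together, not to each job separately.

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml (allocation file). The pool name "hadoop" is an
     assumption taken from the queue name in the question. -->
<allocations>
  <pool name="hadoop">
    <!-- At most 10 map slots in use at once across all jobs in this pool -->
    <maxMaps>10</maxMaps>
  </pool>
</allocations>
```

This sketch assumes the JobTracker is already running the fair scheduler (mapred.jobtracker.taskScheduler set to org.apache.hadoop.mapred.FairScheduler) and that mapred.fairscheduler.allocation.file points to this file. For the queue name set on the job to select the pool, mapred.fairscheduler.poolnameproperty would presumably need to be set to mapred.job.queue.name; check the fair scheduler documentation for your Hadoop version.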
You can set the value of mapred.jobtracker.maxtasks.per.job to something other than -1 (the default). This limits the number of simultaneous map or reduce tasks a job can employ.
This variable is described as:
The maximum number of tasks for a single job. A value of -1 indicates that there is no maximum.
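For illustration, the property could be set roughly as follows, assuming it is read by the JobTracker from mapred-site.xml. The value 500 is a placeholder, not a recommendation. Going by the description quoted above, this setting appears to cap the total number of tasks in a job rather than the number running at once, so a job with more tasks than the limit may be rejected instead of throttled; verify the behavior on your version before relying on it.

```xml
<property>
  <name>mapred.jobtracker.maxtasks.per.job</name>
  <!-- Placeholder value; -1 (the default) means no limit -->
  <value>500</value>
</property>
```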
I think there were plans to add mapred.max.maps.per.node and mapred.max.reduces.per.node to job configs, but they never made it to release.