Setting the number of map tasks and reduce tasks


Question


I am currently running a job where I fixed the number of map tasks to 20, but I am getting a higher number. I also set the number of reduce tasks to zero, but I am still getting a number other than zero. The total time for the MapReduce job to complete is also not displayed. Can someone tell me what I am doing wrong? I am using this command:

hadoop jar Test_Parallel_for.jar Test_Parallel_for Matrix/test4.txt Result 3 \ -D mapred.map.tasks = 20 \ -D mapred.reduce.tasks =0

Output:

11/07/30 19:48:56 INFO mapred.JobClient: Job complete: job_201107291018_0164
11/07/30 19:48:56 INFO mapred.JobClient: Counters: 18
11/07/30 19:48:56 INFO mapred.JobClient:   Job Counters 
11/07/30 19:48:56 INFO mapred.JobClient:     Launched reduce tasks=13
11/07/30 19:48:56 INFO mapred.JobClient:     Rack-local map tasks=12
11/07/30 19:48:56 INFO mapred.JobClient:     Launched map tasks=24
11/07/30 19:48:56 INFO mapred.JobClient:     Data-local map tasks=12
11/07/30 19:48:56 INFO mapred.JobClient:   FileSystemCounters
11/07/30 19:48:56 INFO mapred.JobClient:     FILE_BYTES_READ=4020792636
11/07/30 19:48:56 INFO mapred.JobClient:     HDFS_BYTES_READ=1556534680
11/07/30 19:48:56 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=6026699058
11/07/30 19:48:56 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1928893942
11/07/30 19:48:56 INFO mapred.JobClient:   Map-Reduce Framework
11/07/30 19:48:56 INFO mapred.JobClient:     Reduce input groups=40000000
11/07/30 19:48:56 INFO mapred.JobClient:     Combine output records=0
11/07/30 19:48:56 INFO mapred.JobClient:     Map input records=40000000
11/07/30 19:48:56 INFO mapred.JobClient:     Reduce shuffle bytes=1974162269
11/07/30 19:48:56 INFO mapred.JobClient:     Reduce output records=40000000
11/07/30 19:48:56 INFO mapred.JobClient:     Spilled Records=120000000
11/07/30 19:48:56 INFO mapred.JobClient:     Map output bytes=1928893942
11/07/30 19:48:56 INFO mapred.JobClient:     Combine input records=0
11/07/30 19:48:56 INFO mapred.JobClient:     Map output records=40000000
11/07/30 19:48:56 INFO mapred.JobClient:     Reduce input records=40000000
[hcrc1425n30]s0907855: 

Solution

The number of map tasks for a given job is driven by the number of input splits, not by the mapred.map.tasks parameter. A map task is spawned for each input split, so over the lifetime of a MapReduce job the number of map tasks equals the number of input splits. mapred.map.tasks is just a hint to the InputFormat about the number of maps.
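As a rough illustration (this is not Hadoop's actual code), the relationship between input size, split size, and map-task count can be sketched in Python. The 64 MB block size is an assumption based on the old Hadoop default:

```python
import math

def estimate_map_tasks(input_size_bytes, split_size_bytes):
    # Simplified sketch of FileInputFormat-style splitting: one map task
    # is spawned per input split, and the default split size is the HDFS
    # block size. The real logic also honors configured min/max split
    # sizes and the mapred.map.tasks hint.
    return max(1, math.ceil(input_size_bytes / split_size_bytes))

# HDFS_BYTES_READ in the log above is 1556534680; with an assumed 64 MB
# block size that works out to 24 splits, matching "Launched map tasks=24".
print(estimate_map_tasks(1556534680, 64 * 1024 * 1024))  # -> 24
```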

In your example, Hadoop has determined that there are 24 input splits and will spawn 24 map tasks in total. You can, however, control how many map tasks each task tracker executes in parallel.
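On a classic (pre-YARN) cluster, that per-tasktracker parallelism is set in mapred-site.xml; the value 4 below is only an illustration, not a recommendation:

```xml
<!-- mapred-site.xml: cap the number of map tasks each tasktracker
     runs concurrently (the total number of map tasks is unaffected) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
```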

Also, removing the space after -D might solve the problem for the reduce tasks.
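A corrected invocation might look like the following sketch. It assumes the driver class uses ToolRunner/GenericOptionsParser, so the -D options must appear before the program arguments, with no spaces inside property=value; the jar name and arguments are taken from the question:

```shell
hadoop jar Test_Parallel_for.jar Test_Parallel_for \
    -Dmapred.map.tasks=20 -Dmapred.reduce.tasks=0 \
    Matrix/test4.txt Result 3
```

Note that -Dmapred.reduce.tasks=0 is honored directly (zero reducers means map output is written straight to HDFS), while -Dmapred.map.tasks=20 remains only a hint, as explained above.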

For more information on the number of map and reduce tasks, please look at the following URL:

http://wiki.apache.org/hadoop/HowManyMapsAndReduces
