SetNumMapTask with a mapreduce.Job
Question

How can I set the number of map tasks with an org.apache.hadoop.mapreduce.Job? The function does not seem to exist... but it exists for org.apache.hadoop.mapred.JobConf...

Thanks!
AFAIK, setNumMapTasks is not supported any more.
It is merely a hint to the framework (even in the old API), and doesn't guarantee that you'll get only the specified number of maps. The map creation is actually governed by the InputFormat you are using in your job.
You could tweak the following properties as per your needs:

- mapred.min.split.size
- mapred.max.split.size
Since you are dealing with small data, setting mapred.max.split.size to a lower value should do the trick. You can call FileInputFormat.setMaxInputSplitSize(Job, long) in your driver to alter this; the long argument is the size of the split in bytes, which you can set to your desired value.
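To see why a lower max split size yields more maps: FileInputFormat computes each split size as the block size clamped between the configured min and max. Here is a minimal, stand-alone sketch of that arithmetic in plain Java (no Hadoop dependency; the 64 MB block and 256 MB file are illustrative numbers, not values from the question):

```java
public class SplitSizeDemo {
    // Mirrors FileInputFormat.computeSplitSize(): the effective split size
    // is the block size clamped between the configured min and max.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Rough number of splits (and hence map tasks) for a single file.
    static long numSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;  // 64 MB HDFS block
        long fileSize  = 256L * 1024 * 1024; // 256 MB input file

        // Defaults: split size equals the block size, so 4 maps.
        long defSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println(numSplits(fileSize, defSplit));

        // Cap mapred.max.split.size at 16 MB: split size drops, so 16 maps.
        long smallSplit = computeSplitSize(blockSize, 1L, 16L * 1024 * 1024);
        System.out.println(numSplits(fileSize, smallSplit));
    }
}
```

With the defaults the file produces 4 splits; capping the max split size at 16 MB produces 16, each of which becomes one map task.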
Also, set the HDFS block size to a smaller value for small data using dfs.block.size.
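For illustration, a smaller block size could be configured cluster-wide in hdfs-site.xml; the 32 MB value below is only an example, not a recommendation from the answer:

```xml
<!-- hdfs-site.xml (example only): 33554432 bytes = 32 MB -->
<property>
  <name>dfs.block.size</name>
  <value>33554432</value>
</property>
```

Note this affects files written after the change; per-job split tuning via mapred.max.split.size is usually the lighter-weight option.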