Default number of reducers


Problem Description



In Hadoop, if we have not set the number of reducers, then how many reducers will be created?

For comparison, the number of mappers depends on (total data size) / (input split size). E.g. if the data size is 1 TB and the input split size is 100 MB, the number of mappers will be (1000 * 1000) / 100 = 10,000 (ten thousand).

Which factors does the number of reducers depend on? How many reducers are created for a job?

Solution

How Many Reduces? (From the official documentation)

The right number of reduces seems to be 0.95 or 1.75 multiplied by (no. of nodes) * (no. of maximum containers per node).

With 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75 the faster nodes will finish their first round of reduces and launch a second wave of reduces doing a much better job of load balancing.

Increasing the number of reduces increases the framework overhead, but increases load balancing and lowers the cost of failures.

The scaling factors above are slightly less than whole numbers to reserve a few reduce slots in the framework for speculative-tasks and failed tasks.
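
As a worked example (with hypothetical cluster numbers, not figures from the question): on a 10-node cluster where each node can run 8 reduce containers, 0.95 * 10 * 8 = 76 reducers lets all reduces launch in a single wave, while 1.75 * 10 * 8 = 140 reducers gives two waves and better load balancing.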

This article covers the Mapper count too.

How Many Maps?

The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.

The right level of parallelism for maps seems to be around 10-100 maps per-node, although it has been set up to 300 maps for very cpu-light map tasks. Task setup takes a while, so it is best if the maps take at least a minute to execute.

Thus, if you expect 10TB of input data and have a blocksize of 128MB, you’ll end up with 82,000 maps, unless Configuration.set(MRJobConfig.NUM_MAPS, int) (which only provides a hint to the framework) is used to set it even higher.
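
A minimal sketch of passing that hint from a driver (the job name is hypothetical; MRJobConfig.NUM_MAPS corresponds to the mapreduce.job.maps property, and setInt is used because Configuration.set takes a String value):

  // Inside the driver, before submitting the job
  // (uses org.apache.hadoop.conf.Configuration, org.apache.hadoop.mapreduce.Job, org.apache.hadoop.mapreduce.MRJobConfig):
  Configuration conf = new Configuration();
  conf.setInt(MRJobConfig.NUM_MAPS, 100000);  // a hint only; the actual count still follows the input splits
  Job job = Job.getInstance(conf, "large-input-job");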

If you want to change the default value of 1 for the number of reducers, you can set the property below (from Hadoop 2.x onwards) as a command-line parameter:

mapreduce.job.reduces
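
For example (hypothetical jar name, driver class, and paths; the -D option only takes effect if the driver is run through ToolRunner / GenericOptionsParser):

  hadoop jar my-app.jar MyDriver -D mapreduce.job.reduces=20 /input /output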

OR

you can set programmatically with

job.setNumReduceTasks(integer_number);
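
A minimal driver sketch showing where that call fits (class name, job name, and input/output paths are hypothetical; no Mapper or Reducer class is set, so the identity Mapper and Reducer are used and the example stays self-contained):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "reducer-count-demo");
        job.setJarByClass(MyDriver.class);
        // Overrides the default of 1; the job will run 20 reduce tasks
        // and produce up to 20 part-r-* output files.
        job.setNumReduceTasks(20);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}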

Have a look at one more related SE question: What is Ideal number of reducers on Hadoop?
