What is the ideal number of reducers on Hadoop?


Question


As given by the Hadoop wiki, the formula to calculate the ideal number of reducers is 0.95 or 1.75 * (nodes * mapred.tasktracker.reduce.tasks.maximum).

But when should one choose 0.95 and when 1.75? What factors are considered when deciding on this multiplier?

Solution

Let's say that you have 100 reduce slots available in your cluster.

With a load factor of 0.95, all 95 reduce tasks will start at the same time, since there are enough reduce slots available for all of them. This means that no task will wait in the queue for another to finish. I would recommend this option when the reduce tasks are "small", i.e., they finish relatively fast, or when they all require more or less the same time.

On the other hand, with a load factor of 1.75, 100 of the 175 reduce tasks will start at the same time (as many as there are reduce slots available), and the remaining 75 will wait in the queue until a reduce slot becomes available. This offers better load balancing: if some tasks are "heavier" than others, i.e., require more time, they will not become the bottleneck of the job, because the other reduce slots, instead of finishing their tasks and sitting idle, will be executing the tasks from the queue. This also lightens the load of each reduce task, since the map output is spread across more tasks.
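A minimal sketch of the arithmetic behind the two factors (the helper name and the 100-slot cluster are assumptions for illustration, not part of any Hadoop API):

```java
// Hypothetical helper illustrating the two load factors; not a Hadoop API.
public class ReducerCount {

    // total reduce slots = nodes * mapred.tasktracker.reduce.tasks.maximum
    static int idealReducers(int reduceSlots, double loadFactor) {
        return (int) Math.round(reduceSlots * loadFactor);
    }

    public static void main(String[] args) {
        int slots = 100;  // assumed cluster-wide reduce slots
        // factor 0.95: 95 tasks, all start at once, none queued
        System.out.println(idealReducers(slots, 0.95));
        // factor 1.75: 175 tasks, 100 start at once, 75 wait in the queue
        System.out.println(idealReducers(slots, 1.75));
    }
}
```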

If I may express my opinion, I am not sure that these factors are always ideal. Often, I use a factor greater than 1.75 (sometimes even 4 or 5), since I am dealing with Big Data and my data does not fit on each machine unless I set this factor higher; load balancing is also better.
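To illustrate the point about large data, here is a rough sketch of how a higher factor shrinks the data each reduce task must handle (the 2 TB map output and 100 slots are made-up numbers, and the helper is hypothetical):

```java
// Sketch: how the load factor changes per-reducer data volume; numbers assumed.
public class PerReducerLoad {

    // data each reduce task handles, in GB (integer division for a rough figure)
    static long perReducerGb(long mapOutputGb, int reducers) {
        return mapOutputGb / reducers;
    }

    public static void main(String[] args) {
        long mapOutputGb = 2000;  // assumed total map output: ~2 TB
        int reduceSlots = 100;    // assumed cluster-wide reduce slots
        for (double factor : new double[] {0.95, 1.75, 4.0}) {
            int reducers = (int) Math.round(reduceSlots * factor);
            System.out.println("factor " + factor + " -> " + reducers
                    + " reducers, ~" + perReducerGb(mapOutputGb, reducers) + " GB each");
        }
    }
}
```

With a factor of 4, each of the 400 reduce tasks handles about a quarter of the data that a 0.95-factor task would, which is why the higher factor helps when the per-task data would otherwise not fit on a machine.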
