Why is the right number of reduces in Hadoop 0.95 or 1.75?


Question

The hadoop documentation states:

The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum).

With 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75 the faster nodes will finish their first round of reduces and launch a second wave of reduces doing a much better job of load balancing.

Are these values pretty constant? What are the results when you choose a value between these numbers, or outside of them?

Answer

The values should be what your situation needs them to be. :)

Below is my understanding of the benefit of each value:

The .95 is to allow maximum utilization of the available reducers. If Hadoop defaults to a single reducer, there will be no distribution of the reducing, causing it to take longer than it should. In my limited cases there is a near linear relationship between the number of reducers and the reduction in time: if it takes 16 minutes on 1 reducer, it takes about 2 minutes on 8 reducers.
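
As a rough sketch of how the 0.95 rule translates into a job setting with the old mapred API (the 10-node cluster size here is an assumption for the example; in practice you know your cluster, or can query it via JobClient.getClusterStatus().getTaskTrackers()):

```java
import org.apache.hadoop.mapred.JobConf;

public class ReducerCount {
    public static void main(String[] args) {
        JobConf conf = new JobConf();

        // Assumed cluster size for this example.
        int numNodes = 10;

        // Reduce slots per TaskTracker; Hadoop's default is 2.
        int slotsPerNode = conf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2);

        // 0.95 * total reduce slots: one wave of reduces, with a little headroom
        // left over for speculative execution or re-run of failed tasks.
        int numReduces = (int) Math.floor(0.95 * numNodes * slotsPerNode);

        conf.setNumReduceTasks(numReduces); // 10 nodes * 2 slots * 0.95 = 19
        System.out.println("reduces = " + numReduces);
    }
}
```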

The 1.75 is a value that attempts to optimize for the performance differences of the machines in the cluster. It will create more than a single wave of reducers, so that the faster machines will take on additional reduces while slower machines do not.
This figure (1.75) is one that will need to be adjusted much more to your hardware than the .95 value. If you have 1 quick machine and 3 slower ones, maybe you'll only want 1.10. This number will need more experimentation to find the value that fits your hardware configuration. If the number of reducers is too high, the slow machines will be the bottleneck again.
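
To make the wave behavior concrete, here is a small illustration of how the multiplier turns into reduce counts and waves; the 10-node, 2-slot-per-node cluster shape is just an assumption for the arithmetic:

```java
public class ReduceWaves {
    public static void main(String[] args) {
        // Assumed cluster shape for illustration: 10 nodes, 2 reduce slots each.
        int nodes = 10;
        int slotsPerNode = 2;
        int slots = nodes * slotsPerNode; // 20 reduces can run at once

        for (double factor : new double[] {0.95, 1.10, 1.75}) {
            int reduces = (int) Math.floor(factor * slots);
            System.out.printf(
                "factor %.2f -> %d reduces on %d slots (%d beyond the first wave)%n",
                factor, reduces, slots, Math.max(0, reduces - slots));
        }
        // 0.95 -> 19 reduces: a single wave with one slot to spare.
        // 1.10 -> 22 reduces: one full wave plus 2 extras, picked up by whichever
        //                     nodes finish first.
        // 1.75 -> 35 reduces: one full wave plus a second, partial wave of 15.
    }
}
```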

