Default number of reducers


Question

In Hadoop, if we have not set the number of reducers, then how many reducers will be created?

The number of mappers depends on (total data size) / (input split size). For example, if the data size is 1 TB and the input split size is 100 MB, then the number of mappers will be (1000 * 1000) / 100 = 10,000 (ten thousand).

Which factors does the number of reducers depend on? How many reducers are created for a job?

Answer

How Many Reduces? (from the official documentation)

The right number of reduces seems to be 0.95 or 1.75 multiplied by (number of nodes) * (maximum number of containers per node).

With 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75 the faster nodes will finish their first round of reduces and launch a second wave of reduces, doing a much better job of load balancing.

Increasing the number of reduces increases the framework overhead, but improves load balancing and lowers the cost of failures.

The scaling factors above are slightly less than whole numbers to reserve a few reduce slots in the framework for speculative tasks and failed tasks.
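
As a rough illustration of that heuristic (the 10-node, 8-containers-per-node cluster below is an assumed example, not taken from the answer):

// Reducer-count heuristic from the quote above; the cluster sizes are hypothetical.
int nodes = 10;
int maxContainersPerNode = 8;
int lowerBound = (int) (0.95 * nodes * maxContainersPerNode); // 76: every reduce can launch at once
int upperBound = (int) (1.75 * nodes * maxContainersPerNode); // 140: faster nodes run a second wave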

The same documentation covers the mapper count too.

How Many Maps?

The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.

The right level of parallelism for maps seems to be around 10-100 maps per node, although it has been set up to 300 maps for very CPU-light map tasks. Task setup takes a while, so it is best if the maps take at least a minute to execute.

Thus, if you expect 10 TB of input data and have a block size of 128 MB, you'll end up with 82,000 maps, unless Configuration.set(MRJobConfig.NUM_MAPS, int) (which only provides a hint to the framework) is used to set it even higher.
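
As a quick sanity check of that figure (assuming 1 TB = 1024^4 bytes, the binary convention HDFS uses):

// Back-of-the-envelope map count for 10 TB of input and a 128 MB block size.
long inputBytes = 10L * 1024 * 1024 * 1024 * 1024; // 10 TB
long blockBytes = 128L * 1024 * 1024;              // 128 MB
long maps = inputBytes / blockBytes;               // 81,920, i.e. roughly 82,000 maps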

If you want to change the default value of 1 for the number of reducers, you can set the property below (available from Hadoop 2.x onwards) as a command-line parameter:

mapreduce.job.reduces
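
For example, assuming your driver runs through ToolRunner/GenericOptionsParser so that -D properties are honored (myjob.jar and MyDriver are placeholder names, not from the original answer):

hadoop jar myjob.jar MyDriver -D mapreduce.job.reduces=10 /input /output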

Or you can set it programmatically with:

job.setNumReduceTasks(integerNumber); // integerNumber is the desired number of reduce tasks
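
For context, here is a minimal sketch of a driver showing where that call fits; the class name MyDriver and the input/output paths are placeholders, not from the original answer:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "reducer-count demo");
        job.setJarByClass(MyDriver.class);
        // Mapper/Reducer classes omitted here; set them with
        // job.setMapperClass(...) and job.setReducerClass(...).
        job.setNumReduceTasks(10); // override the default of 1 reduce task
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}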

Have a look at one more related SE question: What is the ideal number of reducers on Hadoop?
