Number of reducers for 1 task in MapReduce


Question

In a typical MapReduce setup (like Hadoop), how many reducers are used for one task, for example, counting words? My understanding of Google's MapReduce is that only one reducer is involved. Is that correct?

For example, word count will divide the input into N chunks, and N map tasks will run, each producing a list of (word, count) pairs. My question is: once the map phase is done, will only ONE reducer instance run to compute the result, or will there be reducers running in parallel?
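To make the question concrete, here is a minimal sketch of the classic word-count job against the Hadoop 0.20 mapreduce API; the class name, input/output paths, and the choice of four reducers are illustrative, not prescribed by the question. Each word's counts all land on the same reducer, but different words can be summed by different reducers at the same time.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Each map task reads one input split and emits (word, 1) pairs.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // All values for a given word are routed to the same reducer, so each
      // reducer can sum its own partition of the key space independently.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Four reduce tasks run in parallel, not one: by default each gets
        // the keys where hash(word) mod 4 equals its partition number.
        job.setNumReduceTasks(4);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }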

Answer

The simple answer is that the number of reducers does not have to be 1, and yes, reducers can run in parallel. As I mentioned above, this is user-defined or derived.

To keep things in context, I will refer to Hadoop here so you have an idea of how things work. If you are using the streaming API in Hadoop (0.20.2), you will have to explicitly define how many reducers you would like to run, since by default only 1 reduce task will be launched. You do so by passing the number of reducers in the -D mapred.reduce.tasks=&lt;number of reducers&gt; argument. The Java API will try to derive the number of reducers you need, but again you can explicitly set that too. In both cases, there is a hard cap on the number of reducers that can run per node, set in your mapred-site.xml configuration file via mapred.tasktracker.reduce.tasks.maximum.
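As a hedged illustration of both settings: a streaming invocation might look like the following, where the jar path varies by Hadoop version and mapper.py/reducer.py, along with the input and output paths, are placeholder names.

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
        -D mapred.reduce.tasks=4 \
        -input /user/me/input \
        -output /user/me/output \
        -mapper mapper.py \
        -reducer reducer.py

And the per-node cap mentioned above would be set in mapred-site.xml like so (the value 4 is just an example):

    <!-- mapred-site.xml: at most 4 reduce tasks may run on each node -->
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>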

On a more conceptual note, you can look at this post on the Hadoop wiki that talks about choosing the number of map and reduce tasks.
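For reference, the rule of thumb given there (and in the Hadoop MapReduce tutorial) is roughly 0.95 or 1.75 times (number of nodes × mapred.tasktracker.reduce.tasks.maximum). For example, on a 10-node cluster with 2 reduce slots per node, 0.95 × 20 ≈ 19 reducers lets all reduces launch immediately in a single wave, while 1.75 × 20 = 35 lets faster nodes start a second wave of reduces for better load balancing.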
