Is Wikipedia's explanation of Map Reduce's reduce incorrect?


Question

MongoDB's explanation of the reduce phase says:

The map/reduce engine may invoke reduce functions iteratively; thus, these functions must be idempotent.

This is how I always understood reduce to work in a general map reduce environment. Here you could sum values across N machines by reducing the values on each machine, then sending those outputs to another reducer.
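As a hypothetical sketch of this contract (not MongoDB's actual API, which takes JavaScript functions): a re-reducible reduce must return output of the same shape as its input values, so the engine can feed earlier reduce outputs back into it.

```python
def reduce_sum(key, values):
    # values may be raw mapped values OR outputs of earlier reduce calls;
    # the return value has the same shape as each input value, so the
    # engine is free to re-reduce iteratively.
    return sum(values)

# First pass: partial sums computed independently on two machines
partial_a = reduce_sum("k", [1, 2, 3])
partial_b = reduce_sum("k", [4, 5])

# Second pass: another reducer combines only the partial outputs
total = reduce_sum("k", [partial_a, partial_b])
print(total)  # 15
```

Only the small partial results cross the network, not every raw value.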

Wikipedia says:

The framework calls the application's Reduce function once for each unique key in the sorted order. The Reduce can iterate through the values that are associated with that key and produce zero or more outputs.

Here you would need to move all values (with the same key) to the same machine to be summed. Moving data to the function seems to be the opposite of what map reduce is supposed to do.

Is Wikipedia's description too specific? Or did MongoDB break map-reduce? (Or am I missing something here?)

Answer

This is how the original Map Reduce framework was described by Google:

2 Programming Model

[...]

The intermediate values are supplied to the user's reduce function via an iterator. This allows us to handle lists of values that are too large to fit in memory.

And later:

3 Implementation

[...]

6. The reduce worker iterates over the sorted intermediate data and for each unique intermediate key encountered, it passes the key and the corresponding set of intermediate values to the user's Reduce function.
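A minimal sketch of step 6, with made-up data (assuming the shuffle has already delivered all intermediate pairs to one reduce worker): sort the pairs by key, then invoke the user's Reduce exactly once per unique key, passing the values as an iterator.

```python
from itertools import groupby
from operator import itemgetter

def user_reduce(key, values):
    # The user's Reduce: sees ALL values for one key, via an iterator
    return key, sum(values)

# Hypothetical intermediate (key, value) pairs held by one reduce worker
intermediate = [("b", 2), ("a", 1), ("a", 3), ("b", 4)]
intermediate.sort(key=itemgetter(0))  # sort by intermediate key

# One Reduce invocation per unique key, in sorted key order
results = [user_reduce(k, (v for _, v in group))
           for k, group in groupby(intermediate, key=itemgetter(0))]
print(results)  # [('a', 4), ('b', 6)]
```

The generator passed to `user_reduce` streams values rather than materializing a list, which is how the paper's design handles value lists too large for memory.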

So there is only one invocation of Reduce. The problem of moving a lot of small intermediate pairs is addressed by using a special combiner function locally:

4.3 Combiner Function

In some cases, there is significant repetition in the intermediate keys produced by each map task [...] We allow the user to specify an optional Combiner function that does partial merging of this data before it is sent over the network.

The Combiner function is executed on each machine that performs a map task. Typically the same code is used to implement both the combiner and the reduce functions. [...]

Partial combining significantly speeds up certain classes of MapReduce operations.
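A hypothetical word-count sketch of that idea, with invented data: the same summing function serves as both the combiner (run locally on each map task's output) and the reducer, so fewer pairs cross the network.

```python
from collections import defaultdict

def reduce_count(key, values):
    # Used as BOTH the combiner (locally, per map task) and the reducer
    return sum(values)

def combine(pairs):
    # Partial merge of (key, value) pairs before any network transfer
    grouped = defaultdict(list)
    for k, v in pairs:
        grouped[k].append(v)
    return [(k, reduce_count(k, vs)) for k, vs in grouped.items()]

# Hypothetical map outputs on two machines for a word-count job
map_task_1 = [("the", 1), ("cat", 1), ("the", 1)]
map_task_2 = [("the", 1), ("dog", 1)]

# Each map task ships ("the", 2) instead of two separate ("the", 1) pairs
shipped = combine(map_task_1) + combine(map_task_2)

# The reduce side merges the partial counts with the very same function
final = combine(shipped)
print(dict(final))  # {'the': 3, 'cat': 1, 'dog': 1}
```

This only works because counting is associative and commutative, which is why the paper can reuse the reduce code as the combiner.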

TL;DR

Wikipedia follows the original MapReduce design; the MongoDB designers took a slightly different approach.

