Simple explanation of MapReduce?

Question

Related to my CouchDB question.

Can anyone explain MapReduce in terms a numbnuts could understand?

Answer

Going all the way down to the basics for Map and Reduce.


Map is a function which "transforms" items in some kind of list into another kind of item and puts them back in the same kind of list.

Suppose I have a list of numbers, [1, 2, 3], and I want to double every number. In this case, the function to "double every number" is x => x * 2. Without map, I could write a simple loop, say

A = [1, 2, 3]
for (i = 0; i < A.length; i++) A[i] = A[i] * 2

and I'd have A = [2, 4, 6]. But instead of writing loops, if I have a map function, I could write

A = [1, 2, 3].map(x => x * 2)

The x => x * 2 is a function to be executed against the elements in [1, 2, 3]. What happens is that the program takes each item, executes (x => x * 2) against it by making x equal to that item, and produces a list of the results.

1 : 1 => 1 * 2 : 2  
2 : 2 => 2 * 2 : 4  
3 : 3 => 3 * 2 : 6  

So after executing the map function with (x => x * 2), you'd have [2, 4, 6].
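
Here is a minimal TypeScript sketch of the same example. TypeScript/JavaScript arrays already have a built-in `map` method; the `mapList` helper is a hypothetical name introduced here only to spell out what map does: apply the function to each item and collect the results in a new list.

```typescript
// Hypothetical helper: applies f to every item and collects the results
// in a new array, leaving the input array unchanged.
function mapList<T, U>(items: T[], f: (x: T) => U): U[] {
  const results: U[] = [];
  for (const item of items) {
    results.push(f(item));
  }
  return results;
}

const numbers = [1, 2, 3];

// Built-in Array.prototype.map with the same x => x * 2 function.
const doubledBuiltIn = numbers.map((x) => x * 2); // [2, 4, 6]

// The hand-written version gives the same result.
const doubledByHand = mapList(numbers, (x) => x * 2); // [2, 4, 6]

// numbers is still [1, 2, 3]: map builds a new list instead of editing in place.
```

Unlike the loop version, which overwrites A, map returns a new list and leaves the original untouched.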


Reduce is a function which "collects" the items in a list and performs some computation on all of them, thus reducing them to a single value.

Finding a sum or finding an average are both instances of a reduce function. For example, if you have a list of numbers, say [7, 8, 9], and you want them summed up, you'd write a loop like this

A = [7, 8, 9]
sum = 0
foreach (item in A) sum = sum + item

But, if you have access to a reduce function, you could write it like this

A = [7, 8, 9]
sum = A.reduce( 0, (x, y) => x + y )

Now it may be a little confusing why there are 2 arguments passed (0 and the function with x and y). For a reduce function to be useful, it must be able to take 2 items, compute something, and "reduce" those 2 items to a single value. The program can then combine the running result with each item in turn until only a single value is left.

The execution would go as follows:

result = 0
7 : result = result + 7 = 0 + 7 = 7
8 : result = result + 8 = 7 + 8 = 15
9 : result = result + 9 = 15 + 9 = 24
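
The trace above can be reproduced with a small TypeScript sketch. The `reduceWithSeed(seed, f, items)` helper is hypothetical and mirrors the argument order of this answer's pseudocode (seed first, then the function). The built-in `Array.prototype.reduce` takes them the other way round: the function first, then the initial value.

```typescript
// Hypothetical helper mirroring the pseudocode A.reduce(seed, f):
// start from the seed and fold each item into the running result.
function reduceWithSeed<T, R>(seed: R, f: (acc: R, item: T) => R, items: T[]): R {
  let result = seed;
  for (const item of items) {
    const next = f(result, item);
    // Log each step in the same shape as the trace above.
    console.log(`${item} : result = f(${result}, ${item}) = ${next}`);
    result = next;
  }
  return result;
}

const nums = [7, 8, 9];
const add = (x: number, y: number) => x + y;

// Logs:
// 7 : result = f(0, 7) = 7
// 8 : result = f(7, 8) = 15
// 9 : result = f(15, 9) = 24
const sumByHand = reduceWithSeed(0, add, nums);

// Built-in equivalent: the function comes first, the seed second.
const sumBuiltIn = nums.reduce(add, 0);
```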

But you don't want to start with zero all the time, so the first argument is there to let you specify a seed value, specifically the value in the first "result =" line.
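
To see why the seed is a separate argument, here is a minimal TypeScript sketch using the built-in `reduce` (function first, seed second). A product has to start from 1, because starting from 0 would make every result 0. The seed does not even have to be a number: an average can be computed by seeding with a running sum and count. The `SumCount` type is an illustrative name, not part of any library.

```typescript
const values = [7, 8, 9];

// Sum: 0 is the neutral starting value for addition.
const total = values.reduce((x, y) => x + y, 0); // 24

// Product: a seed of 0 would always give 0, so the seed must be 1.
const product = values.reduce((x, y) => x * y, 1); // 504

// Average: seed with a running { sum, count } pair, then divide once at the end.
type SumCount = { sum: number; count: number };
const acc = values.reduce<SumCount>(
  (a, y) => ({ sum: a.sum + y, count: a.count + 1 }),
  { sum: 0, count: 0 },
);
const average = acc.count === 0 ? NaN : acc.sum / acc.count; // 8
```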

Say you want to sum 2 lists; it might look like this:

A = [7, 8, 9]
B = [1, 2, 3]
sum = 0
sum = A.reduce( sum, (x, y) => x + y )
sum = B.reduce( sum, (x, y) => x + y )

Or a version you'd be more likely to find in the real world:

A = [7, 8, 9]
B = [1, 2, 3]

sum_func = (x, y) => x + y
sum = A.reduce( B.reduce( 0, sum_func ), sum_func )
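
Both two-list versions can be checked with a short TypeScript sketch using the built-in `reduce`. The built-in takes the function first and the seed second, so the nested call reads `A.reduce(sumFunc, B.reduce(sumFunc, 0))`: the same computation as the pseudocode, with the arguments swapped.

```typescript
const A = [7, 8, 9];
const B = [1, 2, 3];
const sumFunc = (x: number, y: number) => x + y;

// Chained version: the result of the first reduce seeds the second.
let chained = 0;
chained = A.reduce(sumFunc, chained); // 24
chained = B.reduce(sumFunc, chained); // 24 + 6 = 30

// Nested version: B is reduced first, and its total (6) seeds the reduce over A.
const nested = A.reduce(sumFunc, B.reduce(sumFunc, 0)); // 30
```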


It's a good thing in DB software because, with Map/Reduce support, you can work with the database without needing to know how the data is stored in it; that's what a DB engine is for.

You just need to be able to "tell" the engine what you want by supplying it with either a Map or a Reduce function. The DB engine can then find its way around the data, apply your function, and come up with the results you want, all without you knowing how it loops over all the records.
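
The idea of handing the engine your functions and letting it do the looping can be sketched in TypeScript. This is a toy, in-memory illustration under our own assumptions: `runMapReduce`, the `Order` record shape and the key/value convention are all hypothetical and are not CouchDB's (or any database's) actual API. The map function turns each record into key/value pairs, the "engine" groups the pairs by key, and the reduce function collapses each group into one value.

```typescript
// Hypothetical record shape, for illustration only.
type Order = { customer: string; amount: number };

// A key/value pair emitted by a map function.
type Pair<K, V> = [K, V];

// Toy in-memory "engine": the caller supplies only mapFn and reduceFn,
// while looping over the records and grouping values by key happen here.
function runMapReduce<R, K, V>(
  records: R[],
  mapFn: (record: R) => Pair<K, V>[],
  reduceFn: (values: V[]) => V,
): Map<K, V> {
  // Map phase: each record may emit any number of key/value pairs,
  // which are collected into one list per key.
  const groups = new Map<K, V[]>();
  for (const record of records) {
    for (const [key, value] of mapFn(record)) {
      const bucket = groups.get(key);
      if (bucket === undefined) {
        groups.set(key, [value]);
      } else {
        bucket.push(value);
      }
    }
  }
  // Reduce phase: collapse each key's list of values into a single value.
  const results = new Map<K, V>();
  groups.forEach((values, key) => {
    results.set(key, reduceFn(values));
  });
  return results;
}

const orders: Order[] = [
  { customer: "alice", amount: 10 },
  { customer: "bob", amount: 5 },
  { customer: "alice", amount: 7 },
];

// "Total amount per customer": emit (customer, amount), then sum per key.
const totals = runMapReduce<Order, string, number>(
  orders,
  (o) => [[o.customer, o.amount]],
  (amounts) => amounts.reduce((x, y) => x + y, 0),
);
// totals maps "alice" to 17 and "bob" to 5
```

The caller never writes a loop over `orders`; swapping in a different storage layout would only change the inside of `runMapReduce`.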

There are indexes and keys and joins and views and a lot of other things a single database can hold, so by shielding you from how the data is actually stored, it makes your code easier to write and maintain.

The same goes for parallel programming: if you only specify what you want to do with the data instead of actually implementing the looping code, then the underlying infrastructure can "parallelize" the work and execute your function in a simultaneous parallel loop for you.
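
Here is a sketch of why this style parallelizes, in TypeScript. A reduce can be split into chunks as long as the combining function is associative and the seed is its neutral value (0 for addition). The chunks here are processed one after another in a single thread; the hypothetical `chunkedReduce` only simulates how an engine might hand each chunk to a separate worker and then combine the partial results. Subtraction is included to show what goes wrong when the function is not associative.

```typescript
// Split a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical helper: reduce each chunk independently (each of these could
// run on a different worker), then reduce the partial results.
// Only matches a sequential reduce when f is associative and seed is neutral for f.
function chunkedReduce<T>(items: T[], size: number, seed: T, f: (x: T, y: T) => T): T {
  const partials = chunk(items, size).map((part) => part.reduce(f, seed));
  return partials.reduce(f, seed);
}

const data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const plus = (x: number, y: number) => x + y;

const sequential = data.reduce(plus, 0); // 55
// Chunks [1,2,3] [4,5,6] [7,8,9] [10] give partials 6, 15, 24, 10, which sum to 55.
const inChunks = chunkedReduce(data, 3, 0, plus); // 55

// Subtraction is not associative, so chunking changes the answer.
const minus = (x: number, y: number) => x - y;
const subSequential = data.reduce(minus, 0); // -55
const subInChunks = chunkedReduce(data, 3, 0, minus); // 55
```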
