How does the MapReduce sort algorithm work?


Question

One of the main examples used to demonstrate the power of MapReduce is the Terasort benchmark (http://developer.yahoo.net/blogs/hadoop/2008/07/apache%5Fhadoop%5Fwins%5Fterabyte%5Fsort%5Fbenchmark.html). I'm having trouble understanding the basics of the sorting algorithm used in the MapReduce environment.

To me, sorting simply involves determining the position of an element relative to all other elements. So sorting involves comparing "everything" with "everything". Your average sorting algorithm (quick, bubble, ...) simply does this in a smart way.

In my mind, splitting the dataset into many pieces means you can sort each piece on its own, but you still have to integrate these pieces into the 'complete', fully sorted dataset. Given a terabyte dataset distributed over thousands of systems, I expect this to be a huge task.

So how is this really done? How does this MapReduce sorting algorithm work?

Thanks for helping me understand.

Recommended answer

Here are some details on Hadoop's implementation for Terasort:

"TeraSort is a standard map/reduce sort, except for a custom partitioner that uses a sorted list of N - 1 sampled keys that define the key range for each reduce. In particular, all keys such that sample[i - 1] <= key < sample[i] are sent to reduce i. This guarantees that the output of reduce i are all less than the output of reduce i+1."

So their trick is in the way the keys are partitioned during the map phase. Essentially they ensure that every value in a single reducer is 'pre-sorted' against all other reducers. Each reducer then only has to sort its own key range, and concatenating the reducer outputs in order gives the fully sorted dataset, so no global merge step is needed.
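To make the partitioning idea concrete, here is a minimal single-process sketch in Python. It is not Hadoop's code: the function names (choose_split_points, partition, range_partitioned_sort), the sample size, and the way split points are picked from the sample are all assumptions made for illustration. It only shows why sending each key to the reducer whose range contains it makes the concatenated reducer outputs globally sorted.

```python
import bisect
import random


def choose_split_points(keys, num_reducers, sample_size=1000, seed=0):
    """Pick up to N - 1 sorted split points from a random sample of the keys.

    Hypothetical helper: the sampling strategy here is an assumption,
    not the one used by Hadoop's TeraSort.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    if not sample:
        return []
    # Evenly spaced sample elements act as boundaries between reducers.
    return [sample[(i * len(sample)) // num_reducers]
            for i in range(1, num_reducers)]


def partition(key, split_points):
    """Return the reducer index i with split_points[i-1] <= key < split_points[i].

    Reducer 0 takes every key below the first split point, and the last
    reducer takes every key at or above the last split point.
    """
    return bisect.bisect_right(split_points, key)


def range_partitioned_sort(keys, num_reducers):
    """Simulate the whole job: route keys, sort each bucket, concatenate."""
    split_points = choose_split_points(keys, num_reducers)
    buckets = [[] for _ in range(num_reducers)]
    for key in keys:  # "map" side: route each key to its reducer
        buckets[partition(key, split_points)].append(key)
    # "reduce" side: each reducer sorts only its own key range.
    sorted_buckets = [sorted(bucket) for bucket in buckets]
    # Every key in bucket i is <= every key in bucket i + 1, so plain
    # concatenation is already the globally sorted result.
    return [key for bucket in sorted_buckets for key in bucket]
```

In this sketch the only "global" work is choosing the split points from a small sample. The per-reducer sorts are independent of each other, which is what lets the real job spread them across many machines.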

I found the paper reference through James Hamilton's blog post.
