Sort and shuffle optimization in Hadoop MapReduce

Question

I'm looking for a research/implementation project based on Hadoop, and I came across the list posted on the wiki page http://wiki.apache.org/hadoop/ProjectSuggestions. However, that page was last updated in September 2009, so I'm not sure whether some of these ideas have already been implemented. I'm particularly interested in "Sort and Shuffle optimization in the MR framework", which talks about "combining the results of several maps on rack or node before the shuffle. This can reduce seek work and intermediate storage".

Has anyone tried this before? Is this implemented in the current version of Hadoop?

Answer

The project description is aimed at "optimization". The feature itself (sort and shuffle) is already present in current Hadoop MapReduce; the suggestion is that it could probably run in a lot less time. Sounds like a valuable enhancement to me.
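To make the idea quoted in the question a bit more concrete, here is a minimal sketch in plain Python (it does not use Hadoop) of what "combining the results of several maps on a node before the shuffle" could mean for a word-count style job. The node names, the sample data, and the helpers `combine` and `shuffled_records` are all made up for illustration; as far as I know, Hadoop's existing Combiner works on the output of a single map task, so the node-level step below is the hypothetical part.

```python
from collections import Counter


def combine(outputs):
    """Sum the counts for identical keys across several map outputs."""
    total = Counter()
    for out in outputs:
        total.update(out)  # Counter.update adds counts key by key
    return total


def shuffled_records(map_outputs_by_node, node_level_combine):
    """Count the (key, value) records that would be sent to the reducers."""
    records = 0
    for outputs in map_outputs_by_node.values():
        if node_level_combine:
            # Hypothetical optimization: merge all map outputs on the node first.
            records += len(combine(outputs))
        else:
            # Baseline: every map task ships its own records.
            records += sum(len(out) for out in outputs)
    return records


# Made-up word-count outputs, two map tasks per node.
map_outputs_by_node = {
    "node-a": [Counter({"hadoop": 3, "shuffle": 1}), Counter({"hadoop": 2, "sort": 4})],
    "node-b": [Counter({"shuffle": 5}), Counter({"shuffle": 1, "sort": 1})],
}

per_task = shuffled_records(map_outputs_by_node, node_level_combine=False)
per_node = shuffled_records(map_outputs_by_node, node_level_combine=True)

# The final reduce result is the same either way; only the shuffle volume changes.
final_counts = combine(out for outs in map_outputs_by_node.values() for out in outs)

print(per_task, per_node, dict(final_counts))
```

In this toy input the node-level step sends fewer records into the shuffle because keys repeated across map tasks on the same node collapse into one record. Whether that pays off in a real cluster depends on key overlap and on the cost of the extra merge, which this sketch does not model.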
