Hadoop: one Map and multiple Reduce


Problem Description



We have a large dataset to analyze with multiple reduce functions.

All reduce algorithms work on the same dataset generated by the same map function. Reading the large dataset is too costly to do every time, so it would be better to read it only once and pass the mapped data to multiple reduce functions.

Can I do this with Hadoop? I've searched the examples and the web, but I could not find any solutions.

Solution

Are you expecting every reducer to work on exactly the same mapped data? At least the key will differ, since the key decides which reducer each record goes to.

You can emit each record multiple times in the mapper, using a composite key that combines $i and your original key $key (where $i identifies the i-th reduce function). You then need a Partitioner to make sure these n copies are distributed across reducers based on $i, and a GroupingComparator to group records within a partition by the original $key.
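The flow above can be simulated in memory. This is a minimal sketch of the idea only, not real Hadoop code (an actual job would implement Mapper, Partitioner, and grouping-comparator classes in Java); all function and variable names here are illustrative:

```python
from collections import defaultdict

NUM_REDUCE_FUNCS = 2  # e.g. reduce function 0 sums values, function 1 counts them


def mapper(record):
    """Emit each mapped record once per reduce function, tagging it with
    the target index $i via a composite key (i, original_key)."""
    key, value = record
    for i in range(NUM_REDUCE_FUNCS):
        yield (i, key), value


def partition(composite_key, num_partitions):
    """Partitioner analogue: route by the $i tag only, so every reduce
    function receives the full mapped dataset."""
    i, _ = composite_key
    return i % num_partitions


def group_key(composite_key):
    """GroupingComparator analogue: within a partition, group records
    by the original $key, ignoring the $i tag."""
    return composite_key[1]


def run_job(records, reducers):
    # Shuffle phase: partition by $i, then group by the original key.
    partitions = defaultdict(lambda: defaultdict(list))
    for record in records:
        for composite_key, value in mapper(record):
            p = partition(composite_key, len(reducers))
            partitions[p][group_key(composite_key)].append(value)
    # Reduce phase: each partition runs its own reduce function.
    return {p: {k: reducers[p](k, vs) for k, vs in groups.items()}
            for p, groups in partitions.items()}


data = [("a", 1), ("a", 2), ("b", 3)]
reducers = [lambda k, vs: sum(vs),   # reduce function 0: sum per key
            lambda k, vs: len(vs)]   # reduce function 1: count per key
print(run_job(data, reducers))
# → {0: {'a': 3, 'b': 3}, 1: {'a': 2, 'b': 1}}
```

Note the trade-off: the mapper's output is duplicated n times across the network, which is the price of reading the input only once.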

It's possible to do this, but not trivially in a single MapReduce job.
