Calling MapReduce Twice


Question

I'm following the word count tutorial here: https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v1.0

With that, I can produce how often each word appears, in this format:

word    frequency
1       1
2       2
3       3
4       1
5       2
6       1

However, I now need to group by frequency, like this:

frequency   count
1           3
2           2
3           1

Basically, for each frequency, I want to find out how many words have that frequency. How would I modify the code to show this? I feel like I have to modify IntSumReducer, but I've never really worked with Hadoop.

Recommended answer

Instead of modifying IntSumReducer from the example, create a new job altogether that takes the output of the word count program as its input.

Your Mapper will need to output the frequency as the key and the integer 1 as the value. You can write your own reducer or just reuse the reducer from the example.
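The logic of that second pass can be sketched in plain Java, without the Hadoop API, to show what the mapper and reducer would each do. This is only an illustration under some assumptions: the class name FrequencyOfFrequencies and its methods are hypothetical, and the input lines are assumed to be the word count output with the word and its frequency separated by whitespace. In a real job, the map step would live in a Mapper subclass and the reduce step could be the tutorial's IntSumReducer.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java stand-in for the second MapReduce job.
public class FrequencyOfFrequencies {

    // "Map" step: one line of word count output in, one (frequency, 1) pair out.
    // Assumes each line looks like "<word><whitespace><frequency>".
    static Map.Entry<Integer, Integer> map(String line) {
        String[] fields = line.trim().split("\\s+");
        int frequency = Integer.parseInt(fields[1]);
        return new SimpleEntry<>(frequency, 1);
    }

    // "Reduce" step: sum the values for each key, which is what a summing
    // reducer does. A TreeMap keeps the keys in ascending order.
    static Map<Integer, Integer> reduce(List<Map.Entry<Integer, Integer>> pairs) {
        Map<Integer, Integer> totals = new TreeMap<>();
        for (Map.Entry<Integer, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs the map step on every non-empty line, then the reduce step.
    static Map<Integer, Integer> run(List<String> wordCountOutput) {
        List<Map.Entry<Integer, Integer>> pairs = new ArrayList<>();
        for (String line : wordCountOutput) {
            if (!line.trim().isEmpty()) {
                pairs.add(map(line));
            }
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        // The sample word count output from the question.
        List<String> wordCountOutput =
            Arrays.asList("1\t1", "2\t2", "3\t3", "4\t1", "5\t2", "6\t1");
        run(wordCountOutput).forEach(
            (frequency, count) -> System.out.println(frequency + "\t" + count));
    }
}
```

Run on the sample data from the question, this yields a count of 3 for frequency 1, 2 for frequency 2, and 1 for frequency 3, which matches the grouping asked for.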
