Calling MapReduce Twice
Question
I'm following the word count tutorial here: https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v1.0
With it, I can produce how often each word appears, in this format:
word frequency
1 1
2 2
3 3
4 1
5 2
6 1
However, I now need to group the results by frequency, like this:
frequency count
1 3
2 2
3 1
Basically, for each frequency, I want to find out how many words occur with that frequency. How would I modify the code to show this? I feel like I have to modify IntSumReducer, but I've never really worked with Hadoop.
Recommended answer
Instead of modifying IntSumReducer from the example, you should create a new, separate job that takes the output of the word count program as its input.
The Mapper for this second job needs to emit the frequency as the key and the integer 1 as the value. You can write your own reducer, or simply reuse the reducer from the example, since summing the 1s per key gives the number of words with that frequency.
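To make the second pass concrete, here is a minimal plain-Java sketch of its logic. It does not use the Hadoop API; it only simulates the map, group, and reduce steps in memory. The class name FrequencyOfFrequencies and the run method are hypothetical, and the sketch assumes each line of the word count output has the form "word<TAB>frequency". In a real job, the body of map would presumably go into a Mapper subclass, and the second job's input path would point at the first job's output directory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of the second MapReduce pass.
public class FrequencyOfFrequencies {

    // Map step: parse one line of word count output ("word<TAB>frequency")
    // and emit the frequency as the key with the value 1.
    static Map.Entry<Integer, Integer> map(String line) {
        String[] fields = line.trim().split("\\s+");
        int frequency = Integer.parseInt(fields[1]);
        return Map.entry(frequency, 1);
    }

    // Reduce step: sum all values for one key, the same thing
    // the tutorial's IntSumReducer does.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Stand-in for the framework: group map output by key, then reduce each group.
    static Map<Integer, Integer> run(List<String> wordCountOutput) {
        Map<Integer, List<Integer>> grouped = new TreeMap<>();
        for (String line : wordCountOutput) {
            Map.Entry<Integer, Integer> kv = map(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<Integer, Integer> result = new TreeMap<>();
        for (Map.Entry<Integer, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // The word count output from the question.
        List<String> input = List.of("1\t1", "2\t2", "3\t3", "4\t1", "5\t2", "6\t1");
        run(input).forEach((freq, count) -> System.out.println(freq + "\t" + count));
    }
}
```

Run on the sample from the question, this prints the pairs 1 3, 2 2 and 3 1, matching the desired frequency/count table. Because the reduce step is just a sum, the unchanged reducer from the tutorial can serve the second job as well.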