Sorted word count using Hadoop MapReduce

Question

I'm very new to MapReduce, and I have completed the Hadoop word-count example.

That example produces a file of word counts (key-value pairs) that is not sorted by count. Is it possible to sort it by the number of word occurrences by chaining another MapReduce job after the first one?

Recommended answer

In the simple word-count MapReduce program, the output is sorted by word, because MapReduce sorts the map output by key before the reduce phase and the key there is the word. Sample output:
Apple 1
Boy 30
Cat 2
Frog 20
Zebra 1
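The ordering by word comes from the framework's shuffle-and-sort step, not from the word-count code itself. The following minimal plain-Java sketch of that first stage does not use Hadoop. It assumes whitespace-separated input and uses a java.util.TreeMap as a stand-in for the shuffle that sorts keys. The class name WordCountSketch is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the word-count stage (no Hadoop involved).
// The TreeMap plays the role of the shuffle, which sorts map output by key.
public class WordCountSketch {

    // Counts whitespace-separated words; TreeMap keeps the words in sorted order.
    static TreeMap<String, Integer> countWords(String text) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {            // guards against empty input
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "Cat Apple Boy Cat Boy Boy";
        // Prints one "word<TAB>count" line per word, ordered by word.
        for (Map.Entry<String, Integer> e : countWords(text).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```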
If you want the output sorted by the number of occurrences of each word, i.e. in the format below:
1 Apple
1 Zebra
2 Cat
20 Frog
30 Boy
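you can write a second MapReduce job with the mapper and reducer below, whose input is the output of the simple word-count job. The mapper swaps each (word, count) pair into (count, word), so the count becomes the key. Because the framework sorts keys before the reduce phase, the reducer receives the counts in ascending order and only has to write each pair back out.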

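import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Mapper: reads one "word<TAB>count" line of word-count output
// and emits (count, word), so the count becomes the sort key.
class Map1 extends MapReduceBase implements Mapper<Object, Text, IntWritable, Text>
{
    public void map(Object key, Text value, OutputCollector<IntWritable, Text> collector, Reporter reporter) throws IOException
    {
        // The default StringTokenizer delimiters include spaces and tabs.
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        if (tokenizer.countTokens() < 2)
        {
            return; // skip blank or malformed lines instead of emitting placeholder values
        }

        String word = tokenizer.nextToken().trim();
        int number = Integer.parseInt(tokenizer.nextToken().trim());

        // Swap key and value: the framework sorts map output by key.
        collector.collect(new IntWritable(number), new Text(word));
    }
}

// Reducer: keys arrive in ascending order of count;
// write out every word that shares that count.
class Reduce1 extends MapReduceBase implements Reducer<IntWritable, Text, IntWritable, Text>
{
    public void reduce(IntWritable key, Iterator<Text> values, OutputCollector<IntWritable, Text> output, Reporter reporter) throws IOException
    {
        while (values.hasNext())
        {
            output.collect(key, values.next());
        }
    }
}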
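The second job's behavior can be checked without a cluster. This plain-Java sketch mirrors the mapper and reducer above. It assumes "word count" lines separated by tabs or spaces, and the class name SortByCountSketch is hypothetical. It parses each line and swaps it into (count, word). A TreeMap<Integer, ...> then stands in for the shuffle that sorts keys, and every pair is emitted in key order. Because the key is an integer, 20 sorts before 30 numerically. With text keys, "20" would sort before "3" lexicographically, which is one reason to emit IntWritable rather than Text keys. Hadoop does not guarantee the order of words that share a count (Apple and Zebra here), and the sketch simply keeps input order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the sort-by-count job (no Hadoop involved).
public class SortByCountSketch {

    static List<String> sortByCount(List<String> wordCountLines) {
        // "Map" + "shuffle": swap to (count, word); the TreeMap sorts counts numerically.
        TreeMap<Integer, List<String>> byCount = new TreeMap<>();
        for (String line : wordCountLines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) {
                continue; // skip blank or malformed lines
            }
            int count = Integer.parseInt(parts[1]);
            byCount.computeIfAbsent(count, k -> new ArrayList<>()).add(parts[0]);
        }

        // "Reduce": emit every (count, word) pair in ascending key order.
        List<String> output = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : byCount.entrySet()) {
            for (String word : e.getValue()) {
                output.add(e.getKey() + "\t" + word);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Apple\t1", "Boy\t30", "Cat\t2", "Frog\t20", "Zebra\t1");
        sortByCount(input).forEach(System.out::println);
    }
}
```

On a real cluster, the two jobs are chained by running the sort job with the word-count job's output directory as its input path. With the old org.apache.hadoop.mapred API used above, JobClient.runJob(conf) blocks until the job completes, so calling it once per JobConf, in order, runs the jobs in sequence. For a single globally sorted file, configure one reducer with conf.setNumReduceTasks(1). With several reducers, each part file is sorted on its own unless a total-order partitioner such as TotalOrderPartitioner is configured.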
