Sorted word count using Hadoop MapReduce


Problem description

I'm very new to MapReduce and have completed the Hadoop word-count example.

That example produces an unsorted file of word counts (as key-value pairs). Is it possible to sort it by the number of word occurrences by chaining another MapReduce job after the first one?

Solution

In the simple word count MapReduce program, the output is sorted by word. Sample output:
Apple 1
Boy 30
Cat 2
Frog 20
Zebra 1
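The alphabetical order does not come from the word count code itself: the shuffle sorts the map output by key before it reaches the reducer, and each reducer processes and writes its keys in that order. A minimal, Hadoop-free sketch of this, assuming space-separated input lines; the class `WordCountSketch` is hypothetical, and a `TreeMap` stands in for the shuffle's sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory sketch of the word count job; no Hadoop classes are used.
final class WordCountSketch {

    // "Map" emits (word, 1) for every token. The TreeMap stands in for the shuffle,
    // which groups values by key and hands keys to the reducer in sorted order.
    // The per-word summing is folded into this loop for brevity; in Hadoop it
    // happens in the reducer after the shuffle.
    static List<String> wordCount(List<String> lines) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        // Emit "word<TAB>count" lines (the TextOutputFormat layout) in key order.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add(e.getKey() + "\t" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Zebra Boy Cat", "Boy Apple Cat");
        wordCount(input).forEach(System.out::println);
    }
}
```

Whatever the input order, `main` prints Apple 1, Boy 2, Cat 2, Zebra 1, one tab-separated pair per line.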
If you want the output sorted by the number of occurrences of each word, i.e. in the format below:
1 Apple
1 Zebra
2 Cat
20 Frog
30 Boy
you can create a second MR program with the mapper and reducer below, taking the output of the simple word count program as its input.

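import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Second job, mapper: reads the word count output ("word<TAB>count")
// and emits (count, word), so the shuffle sorts the records by count.
class Map1 extends MapReduceBase implements Mapper<Object, Text, IntWritable, Text>
{
    public void map(Object key, Text value, OutputCollector<IntWritable, Text> collector, Reporter reporter) throws IOException
    {
        StringTokenizer stringTokenizer = new StringTokenizer(value.toString());

        // Skip blank lines instead of emitting a placeholder record.
        if (!stringTokenizer.hasMoreTokens())
        {
            return;
        }
        String word = stringTokenizer.nextToken().trim();

        // Skip lines that have no count or a non-numeric count.
        if (!stringTokenizer.hasMoreTokens())
        {
            return;
        }
        int number;
        try
        {
            number = Integer.parseInt(stringTokenizer.nextToken().trim());
        }
        catch (NumberFormatException e)
        {
            return;
        }

        collector.collect(new IntWritable(number), new Text(word));
    }
}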


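// Second job, reducer: keys (counts) reach each reducer in ascending order,
// so writing every (count, word) pair as it arrives keeps the output sorted by count.
class Reduce1 extends MapReduceBase implements Reducer<IntWritable, Text, IntWritable, Text>
{
    public void reduce(IntWritable key, Iterator<Text> values, OutputCollector<IntWritable, Text> output, Reporter reporter) throws IOException
    {
        // Several words can share one count; emit one line per word.
        while (values.hasNext())
        {
            output.collect(key, values.next());
        }
    }
}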
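The effect of the second job can be traced without a cluster. The sketch below is a hypothetical, Hadoop-free stand-in for Map1 and Reduce1 (the class `SortByCountSketch` is not part of Hadoop): it assumes whitespace-separated `word<TAB>count` input and uses a `TreeMap` to play the role of the shuffle's sort and grouping on the `IntWritable` keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of the second job (Map1 + Reduce1); no Hadoop classes are used.
final class SortByCountSketch {

    // Input: "word<TAB>count" lines from the word count job.
    // Output: "count<TAB>word" lines ordered by count, as Reduce1 writes them.
    static List<String> sortByCount(List<String> wordCountOutput) {
        // Map1: swap each line to (count, word). The TreeMap stands in for the shuffle,
        // which sorts the count keys and groups the words that share a count.
        TreeMap<Integer, List<String>> byCount = new TreeMap<>();
        for (String line : wordCountOutput) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) {
                continue; // skip blank or malformed lines
            }
            int count;
            try {
                count = Integer.parseInt(parts[1]);
            } catch (NumberFormatException e) {
                continue; // skip lines whose second field is not a number
            }
            byCount.computeIfAbsent(count, k -> new ArrayList<>()).add(parts[0]);
        }

        // Reduce1: for each count, in ascending order, write one line per word.
        // Hadoop does not guarantee the order of words within one count; here it is input order.
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : byCount.entrySet()) {
            for (String word : e.getValue()) {
                out.add(e.getKey() + "\t" + word);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> wordCountOutput =
                Arrays.asList("Apple\t1", "Boy\t30", "Cat\t2", "Frog\t20", "Zebra\t1");
        sortByCount(wordCountOutput).forEach(System.out::println);
    }
}
```

Running `main` prints the order given in the answer: 1 Apple, 1 Zebra, 2 Cat, 20 Frog, 30 Boy. On a real cluster, the two jobs are chained by pointing the second job's input path at the first job's output directory; with the old `mapred` API, `JobClient.runJob(conf)` returns only after the job finishes, so calling it for the word count job and then for the sort job runs them in sequence. Each reducer's output file is sorted independently, so a single globally sorted file needs one reducer (`conf.setNumReduceTasks(1)`) or a total-order partitioner.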
