MapReduce count example

This article covers a question about a MapReduce count example; the answer below may be a useful reference for anyone facing a similar problem.

Problem description



My question is about MapReduce programming in Java.

Suppose I have the WordCount.java example, a standard MapReduce program. I want the map function to collect some information and return to the reduce function maps of the form <slaveNode_id, some_info_collected>,

so that I can know which slave node collected which data. Any idea how?

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

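    // Mapper: splits each input line into whitespace-separated tokens and emits <word, 1> for each one.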
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();

      public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
          word.set(tokenizer.nextToken());
          output.collect(word, one);
        }
      }
    }

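    // Reducer (also used as the combiner): sums the counts for each word and emits <word, total>.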
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
      }
    }

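    // Driver: configures the job with the old org.apache.hadoop.mapred API and submits it.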
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(WordCount.class);
      conf.setJobName("wordcount");

      conf.setOutputKeyClass(Text.class);
      conf.setOutputValueClass(IntWritable.class);

      conf.setMapperClass(Map.class);
      conf.setCombinerClass(Reduce.class);
      conf.setReducerClass(Reduce.class);

      conf.setInputFormat(TextInputFormat.class);
      conf.setOutputFormat(TextOutputFormat.class);

      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));

      JobClient.runJob(conf);
    }
}

Thank you!!

Solution

What you are asking for is to let the application (your map-reduce thingy) know about the infrastructure it runs on.

In general the answer is that your application doesn't need this information. Each call to the Mapper and each call to the Reducer can be executed on a different node or all on the same node. The beauty of MapReduce is that the outcome is the same, so for your application: it doesn't matter.

As a consequence, the API doesn't have features to support this request of yours.

Have fun learning Hadoop :)


P.S. The only way I can think of (which is nasty to say the least) is that you include a system call of some sort in the Mapper and ask the underlying OS about its name/properties/etc. This kind of construct would make your application very non-portable; i.e. it won't run on Hadoop in Windows or Amazon.
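To make that P.S. concrete, here is a minimal sketch of the idea (my illustration, not part of the original answer). Instead of a literal system call it uses java.net.InetAddress.getLocalHost().getHostName() to look up the node's hostname, and the class name NodeTaggingMap is made up for this example. The mapper emits the hostname as the key, so the unchanged Reduce class would produce pairs shaped like <slaveNode_id, tokens_processed>:

import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class NodeTaggingMap extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text nodeId = new Text();

    // Called once when the task starts: look up the name of the node this
    // mapper instance is running on, instead of doing it for every record.
    public void configure(JobConf job) {
      try {
        nodeId.set(InetAddress.getLocalHost().getHostName());
      } catch (UnknownHostException e) {
        nodeId.set("unknown-host");
      }
    }

    // Emit <slaveNode_id, 1> for every token seen, so the existing Reduce
    // class sums up how many tokens each node processed.
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      StringTokenizer tokenizer = new StringTokenizer(value.toString());
      while (tokenizer.hasMoreTokens()) {
        tokenizer.nextToken();
        output.collect(nodeId, one);
      }
    }
}

Wiring this in would just mean conf.setMapperClass(NodeTaggingMap.class) in the driver, keeping the existing Reduce class (and combiner) as-is. The portability caveat from the answer still applies, since the hostname lookup depends entirely on the environment each task runs in.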

This concludes the article on the MapReduce count example. We hope the answer above is helpful, and thank you for supporting IT屋!
