Hadoop mapper and reducer output mismatch


Problem description

I am trying to configure different mapper and reducer output types by using setMapOutputKeyClass/setMapOutputValueClass and setOutputKeyClass/setOutputValueClass. However, even after calling these methods, I still get errors at runtime.

Here is my code:

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class Sort {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();

      public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
          word.set(tokenizer.nextToken());
          output.collect(word, one);
        }
      }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, LongWritable> {
      public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }
        output.collect(key, new LongWritable(sum));
      }
    }

    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(Sort.class);
      conf.setJobName("sort");

      conf.setMapperClass(Map.class);
      conf.setCombinerClass(Reduce.class);
      conf.setReducerClass(Reduce.class);

      conf.setInputFormat(TextInputFormat.class);
      conf.setOutputFormat(TextOutputFormat.class);

      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));

      conf.setMapOutputKeyClass(Text.class);
      conf.setMapOutputValueClass(IntWritable.class);
      conf.setOutputKeyClass(Text.class);
      conf.setOutputValueClass(LongWritable.class);

      JobClient.runJob(conf);
    }
}

The error message I get:

java.lang.Exception: java.io.IOException: wrong value class: class org.apache.hadoop.io.LongWritable is not class org.apache.hadoop.io.IntWritable
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.io.IOException: wrong value class: class org.apache.hadoop.io.LongWritable is not class org.apache.hadoop.io.IntWritable
at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:168)
at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1160)
at org.myorg.Sort$Reduce.reduce(Sort.java:34)
at org.myorg.Sort$Reduce.reduce(Sort.java:28)
at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1436)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1441)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1303)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:431)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
13/10/12 14:08:11 INFO mapred.JobClient:  map 0% reduce 0%
13/10/12 14:08:11 INFO mapred.JobClient: Job complete: job_local599611407_0001
13/10/12 14:08:11 INFO mapred.JobClient: Counters: 0
13/10/12 14:08:11 INFO mapred.JobClient: Job Failed: NA
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.myorg.Sort.main(Sort.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

Did I do anything wrong? Thanks for your help!

Recommended answer

Comment out the line below and the program should work. The problem is that a combiner runs on the map output and must emit exactly the same key and value classes as the mapper, because its output is written back into the map-side spill that is later shuffled to the reducers. Here the map output value class is IntWritable, but Reduce (used as the combiner) emits LongWritable values, which is what produces the "wrong value class" error in the stack trace above.

conf.setCombinerClass(Reduce.class);
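For reference, a minimal sketch of the relevant part of the original main() with the combiner removed (everything else in the class is unchanged); without a combiner, the reducer's output value type is free to differ from the map output value type:

      conf.setMapperClass(Map.class);
      // conf.setCombinerClass(Reduce.class);  // removed: a combiner must emit the map output types
      //                                       // (Text/IntWritable), but Reduce emits LongWritable values
      conf.setReducerClass(Reduce.class);

      conf.setMapOutputKeyClass(Text.class);          // intermediate (map) output: Text / IntWritable
      conf.setMapOutputValueClass(IntWritable.class);
      conf.setOutputKeyClass(Text.class);             // final (reduce) output: Text / LongWritable
      conf.setOutputValueClass(LongWritable.class);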

Another solution is to write a reducer whose input and output types are the same. In that case, the reducer class can also be used as the combiner class, as sketched below.
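A minimal sketch of that alternative, assuming everything else in the original Sort class stays as posted: the value type is kept as IntWritable from the map output through the final output, so the same Reduce class can safely be registered as both the combiner and the reducer.

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();   // add up the partial counts
        }
        // Emitting IntWritable keeps the output types identical to the map output types,
        // which is what makes this class safe to use as a combiner as well.
        output.collect(key, new IntWritable(sum));
      }
    }

and in main():

      conf.setCombinerClass(Reduce.class);          // now type-safe
      conf.setReducerClass(Reduce.class);
      conf.setOutputValueClass(IntWritable.class);  // changed from LongWritable to match

The trade-off is that the final counts are emitted as IntWritable rather than LongWritable; if long counts are genuinely needed, dropping the combiner (the first solution) or writing a separate combiner class is the way to go.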
