Getting java.lang.ClassCastException: class java.lang.String when running a simple MapReduce program


Problem Description

I am trying to execute a simple MapReduce program in which the Map takes the input and splits it into two parts (key => String, value => Integer), and the Reducer sums up the values for each corresponding key. I am getting a ClassCastException every time, and I cannot understand what in the code is causing this error.

My Code:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class Test {
    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, String, Integer> {

        @Override
        public void map(LongWritable key, Text value,
                OutputCollector<String, Integer> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            String[] lineParts = line.split(",");
            output.collect(lineParts[0], Integer.parseInt(lineParts[1]));
        }
    }

    public static class Reduce extends MapReduceBase implements
            Reducer<String, Integer, String, Integer> {

        @Override
        public void reduce(String key, Iterator<Integer> values,
                OutputCollector<String, Integer> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum = sum + values.next();
            }
            output.collect(key, sum);
        }
    }

    public static void main(String[] args) throws Exception {

        JobConf conf = new JobConf(Test.class);
        conf.setJobName("ProductCount");

        conf.setMapOutputKeyClass(String.class);
        conf.setMapOutputValueClass(Integer.class);

        conf.setOutputKeyClass(String.class);
        conf.setOutputValueClass(Integer.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}

Sample Data:

abc,10
abc,10
abc,10
def,9
def,9

Following is the stack trace. Does it have anything to do with my key and value types?

14/02/11 23:57:35 INFO mapred.JobClient: Task Id : attempt_201402110240_0013_m_000001_2, Status : FAILED
java.lang.ClassCastException: class java.lang.String
at java.lang.Class.asSubclass(Class.java:3018)
at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:795)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:816)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:382)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.mapred.Child.main(Child.java:262)


Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1246)
at Test.main(Test.java:69)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)

Solution

It seems to me that you are not using the correct classes for the output. The stack trace points at exactly this: JobConf.getOutputKeyComparator calls Class.asSubclass(WritableComparable.class) on the configured map output key class, and java.lang.String is not a WritableComparable, so the cast fails.

From one of the MapReduce Tutorials:

The key and value classes have to be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework.

Therefore you should replace String.class with Text.class and Integer.class with IntWritable.class.
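
Note that swapping the classes in the JobConf alone is not enough: the generic type parameters of your Mapper and Reducer (and the OutputCollector signatures) must change as well, and the plain Java values need to be wrapped in their Writable counterparts before being emitted. Here is an untested sketch of the changed parts; the rest of your program stays the same:

import org.apache.hadoop.io.IntWritable;  // new import alongside the existing ones

public static class Map extends MapReduceBase implements
        Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String[] lineParts = value.toString().split(",");
        // Wrap the plain Java values in their Writable counterparts before emitting
        output.collect(new Text(lineParts[0]),
                new IntWritable(Integer.parseInt(lineParts[1])));
    }
}

public static class Reduce extends MapReduceBase implements
        Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();  // unwrap the IntWritable to add it
        }
        output.collect(key, new IntWritable(sum));
    }
}

// ... and in main():
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputValueClass(IntWritable.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);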

I hope that fixes your problem.

Why can't I use the basic String or Integer classes?

Integer and String implement Java's standard Serializable interface, as described in the docs. The problem is that MapReduce does not serialize/deserialize values through this standard interface; it uses its own interface instead, which is called Writable.
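
For reference, the Writable contract is tiny, and key classes additionally implement WritableComparable, which merely adds Comparable on top. Both are reproduced here from org.apache.hadoop.io:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public interface Writable {
    void write(DataOutput out) throws IOException;     // serialize the fields
    void readFields(DataInput in) throws IOException;  // deserialize the fields
}

public interface WritableComparable<T> extends Writable, Comparable<T> {
}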

So why don't they just use the basic Java interface?

Short answer: because it is more efficient. The Writable interface omits the type information when serializing, because you already define the input/output types in your MapReduce code. Since your code already knows what is coming, instead of serializing a String like this:

String: "theStringItself"

It could be serialized like:

theStringItself

As you can see, this saves space on every single record, which adds up to a lot at MapReduce scale.
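
If you want to see the difference yourself, here is a minimal sketch comparing the two mechanisms for the same string (the exact byte counts printed in the comments are my assumptions and may vary slightly with your JDK and Hadoop versions):

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

import org.apache.hadoop.io.Text;

public class SerializationSizeDemo {
    public static void main(String[] args) throws IOException {
        // Hadoop Writable: a vint length followed by the UTF-8 payload, no type info
        ByteArrayOutputStream writableBytes = new ByteArrayOutputStream();
        new Text("theStringItself").write(new DataOutputStream(writableBytes));

        // Standard Java serialization: stream header and type tag plus the payload
        ByteArrayOutputStream javaBytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(javaBytes)) {
            oos.writeObject("theStringItself");
        }

        System.out.println("Writable:           " + writableBytes.size() + " bytes"); // 16, assuming a 1-byte vint
        System.out.println("Java serialization: " + javaBytes.size() + " bytes");     // around 22
    }
}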

Long answer: Read this awesome blog post.
