Hadoop: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text


Question

My program looks like this:

public class TopKRecord extends Configured implements Tool {

    public static class MapClass extends Mapper<Text, Text, Text, Text> {

        public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
            // your map code goes here
            String[] fields = value.toString().split(",");
            String year = fields[1];
            String claims = fields[8];

            if (claims.length() > 0 && (!claims.startsWith("\""))) {
                context.write(new Text(year.toString()), new Text(claims.toString()));
            }
        }
    }

    public int run(String args[]) throws Exception {
        Job job = new Job();
        job.setJarByClass(TopKRecord.class);

        job.setMapperClass(MapClass.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setJobName("TopKRecord");
        job.setMapOutputValueClass(Text.class);
        job.setNumReduceTasks(0);
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String args[]) throws Exception {
        int ret = ToolRunner.run(new TopKRecord(), args);
        System.exit(ret);
    }
}

The data looks like this:

"PATENT","GYEAR","GDATE","APPYEAR","COUNTRY","POSTATE","ASSIGNEE","ASSCODE","CLAIMS","NCLASS","CAT","SUBCAT","CMADE","CRECEIVE","RATIOCIT","GENERAL","ORIGINAL","FWDAPLAG","BCKGTLAG","SELFCTUB","SELFCTLB","SECDUPBD","SECDLWBD"
3070801,1963,1096,,"BE","",,1,,269,6,69,,1,,0,,,,,,,
3070802,1963,1096,,"US","TX",,1,,2,6,63,,0,,,,,,,,,
3070803,1963,1096,,"US","IL",,1,,2,6,63,,9,,0.3704,,,,,,,
3070804,1963,1096,,"US","OH",,1,,2,6,63,,3,,0.6667,,,,,,,

When I run this program, I see the following on the console:

12/08/02 12:43:34 INFO mapred.JobClient: Task Id : attempt_201208021025_0007_m_000000_0, Status : FAILED
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
    at com.hadoop.programs.TopKRecord$MapClass.map(TopKRecord.java:26)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

I believe the class types are mapped correctly, according to the documentation for the Mapper class.

Please let me know what I am doing wrong here.

Recommended answer

When you read a plain text file with a MapReduce program using the default TextInputFormat, the mapper's input key is the position of the line in the file (a LongWritable holding the byte offset at which the line starts), while the input value is the full line as a Text.
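To make those key/value pairs concrete, here is a plain-Java sketch with no Hadoop dependency that mimics what the mapper receives for a small input. The names `OffsetKeysSketch`, `Record`, and `toRecords` are hypothetical and exist only for illustration; the sketch assumes `\n` line endings and UTF-8 text, and it is not how Hadoop implements its record reader.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class OffsetKeysSketch {

    /** One (key, value) pair as the mapper would see it. Hypothetical helper type. */
    public static final class Record {
        public final long offset;  // plays the role of the LongWritable key
        public final String line;  // plays the role of the Text value

        public Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    /** Splits the contents into lines and pairs each line with its starting byte offset. */
    public static List<Record> toRecords(String fileContents) {
        List<Record> records = new ArrayList<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            records.add(new Record(offset, line));
            // Advance past the line's bytes plus one byte for the '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "3070801,1963,1096\n3070802,1963,1096\n";
        for (Record r : toRecords(data)) {
            System.out.println(r.offset + "\t" + r.line);
        }
        // Prints:
        // 0	3070801,1963,1096
        // 18	3070802,1963,1096
    }
}
```

The key is a number rather than text, which is why a mapper that declares its input key as Text fails with a ClassCastException as soon as the first record arrives.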

So what's happening here is that you declared the mapper's input key as a Text object, which is wrong. You need a LongWritable instead, so that the declared type matches what Hadoop actually passes in and the cast no longer fails.

Try this instead:

public class TopKRecord extends Configured implements Tool {

    public static class MapClass extends Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // your map code goes here
            String[] fields = value.toString().split(",");
            String year = fields[1];
            String claims = fields[8];

            if (claims.length() > 0 && (!claims.startsWith("\""))) {
                context.write(new Text(year.toString()), new Text(claims.toString()));
            }
        }
    }

    ...
}

There is one more thing in your code that you might want to reconsider: you're creating two Text objects for every record you process. Create these two objects once, as fields of the mapper, and then inside map just set their values with the set method. This avoids two allocations per record, which can save a lot of time if you're processing a large amount of data.
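A minimal sketch of that reuse pattern, assuming the same comma-separated input and the new `org.apache.hadoop.mapreduce` API used above. The class name `ReusingMapClass`, the field names `outKey` and `outValue`, and the `fields.length` guard are illustrative additions that are not in the original code; compiling it requires the Hadoop libraries on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ReusingMapClass extends Mapper<LongWritable, Text, Text, Text> {

    // Allocated once per mapper instance and reused for every record.
    private final Text outKey = new Text();
    private final Text outValue = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");

        // Guard added for this sketch: skip lines with fewer than nine columns.
        if (fields.length <= 8) {
            return;
        }

        String year = fields[1];
        String claims = fields[8];

        if (claims.length() > 0 && !claims.startsWith("\"")) {
            // Overwrite the contents of the existing objects instead of allocating new ones.
            outKey.set(year);
            outValue.set(claims);
            context.write(outKey, outValue);
        }
    }
}
```

Reusing the objects is safe here on the assumption that `context.write` consumes the current contents before the next call to `set`, which is how the standard map output path behaves. The `@Override` annotation is also worth keeping: it makes the compiler reject a `map` method whose parameter types do not match the mapper's declared type parameters.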
