Hadoop : java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
Question
My program looks like:
public class TopKRecord extends Configured implements Tool {

    public static class MapClass extends Mapper<Text, Text, Text, Text> {

        public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
            // your map code goes here
            String[] fields = value.toString().split(",");
            String year = fields[1];
            String claims = fields[8];

            if (claims.length() > 0 && (!claims.startsWith("\""))) {
                context.write(new Text(year.toString()), new Text(claims.toString()));
            }
        }
    }

    public int run(String args[]) throws Exception {
        Job job = new Job();
        job.setJarByClass(TopKRecord.class);
        job.setMapperClass(MapClass.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setJobName("TopKRecord");
        job.setMapOutputValueClass(Text.class);

        job.setNumReduceTasks(0);
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String args[]) throws Exception {
        int ret = ToolRunner.run(new TopKRecord(), args);
        System.exit(ret);
    }
}
The data looks like:
"PATENT","GYEAR","GDATE","APPYEAR","COUNTRY","POSTATE","ASSIGNEE","ASSCODE","CLAIMS","NCLASS","CAT","SUBCAT","CMADE","CRECEIVE","RATIOCIT","GENERAL","ORIGINAL","FWDAPLAG","BCKGTLAG","SELFCTUB","SELFCTLB","SECDUPBD","SECDLWBD"
3070801,1963,1096,,"BE","",,1,,269,6,69,,1,,0,,,,,,,
3070802,1963,1096,,"US","TX",,1,,2,6,63,,0,,,,,,,,,
3070803,1963,1096,,"US","IL",,1,,2,6,63,,9,,0.3704,,,,,,,
3070804,1963,1096,,"US","OH",,1,,2,6,63,,3,,0.6667,,,,,,,
On running this program I see the following on the console:
12/08/02 12:43:34 INFO mapred.JobClient: Task Id : attempt_201208021025_0007_m_000000_0, Status : FAILED
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
at com.hadoop.programs.TopKRecord$MapClass.map(TopKRecord.java:26)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
I believe that the class types are mapped correctly for the Mapper class.

Please let me know what I am doing wrong here?
Answer
When you read a file with an M/R program (with the default TextInputFormat), the input key of your mapper is the offset of the line in the file, while the input value is the full line.
So what's happening here is that you're trying to receive that line offset as a Text object, which is wrong; you need a LongWritable instead, so that Hadoop doesn't complain about the types.
Try this instead:
public class TopKRecord extends Configured implements Tool {

    public static class MapClass extends Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // your map code goes here
            String[] fields = value.toString().split(",");
            String year = fields[1];
            String claims = fields[8];

            if (claims.length() > 0 && (!claims.startsWith("\""))) {
                context.write(new Text(year.toString()), new Text(claims.toString()));
            }
        }
    }

    ...
}
One other thing in your code you might want to reconsider: you're creating 2 Text objects for every record you process. You should create these two objects only once, at the beginning, and then in your mapper just set their values using the set method. This will save you a lot of time if you're processing a decent amount of data; a sketch of this change follows.
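A minimal sketch of that optimization, reusing the mapper above (the outKey/outValue field names are just illustrative, not part of the original code):

public static class MapClass extends Mapper<LongWritable, Text, Text, Text> {

    // Reuse the same two Text instances across all map() calls
    // instead of allocating two new objects per record.
    private final Text outKey = new Text();
    private final Text outValue = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        String year = fields[1];
        String claims = fields[8];

        if (claims.length() > 0 && (!claims.startsWith("\""))) {
            outKey.set(year);      // Text.set(String) replaces the contents in place
            outValue.set(claims);
            context.write(outKey, outValue);
        }
    }
}

This is safe because the framework serializes the key and value during context.write, so the objects can be reused on the next call.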