How to solve "expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable" in a MapReduce job

Problem description

I am trying to write a job that can analyse some information from a YouTube data set. I believe I have correctly set the output keys from the map in the driver class, but I am still getting the error below. I am posting the code and the exception here.

The Mapper

public class YouTubeDataMapper extends Mapper<LongWritable,Text,Text,IntWritable>{

private static final IntWritable one = new IntWritable(1); 
private Text category = new Text(); 
public void mapper(LongWritable key,Text value,Context context) throws IOException, InterruptedException{
    String str[] = value.toString().split("\t");
    category.set(str[3]);
    context.write(category, one);
}

}

The Reducer

public class YouTubeDataReducer extends Reducer<Text,IntWritable,Text,IntWritable>{

public void reduce(Text key,Iterable<IntWritable> values,Context context) throws IOException, InterruptedException{
    int sum=0;
    for(IntWritable count:values){
        sum+=count.get();
    }
    context.write(key, new IntWritable(sum));
}

}

The Driver

public class YouTubeDataDriver {

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    @SuppressWarnings("deprecation")
    Job job = new Job(conf, "categories");
    job.setJarByClass(YouTubeDataDriver.class);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    // job.setNumReduceTasks(0);
    job.setOutputKeyClass(Text.class);// Here I have set the output keys
    job.setOutputValueClass(IntWritable.class);

    job.setMapperClass(YouTubeDataMapper.class);
    job.setReducerClass(YouTubeDataReducer.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    Path out = new Path(args[1]);
    out.getFileSystem(conf).delete(out);
    job.waitForCompletion(true);

}

}

The exception I am getting

java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1069)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

I have set the output keys in the driver class:

    job.setOutputKeyClass(Text.class);// Here I have set the output keys
    job.setOutputValueClass(IntWritable.class);

But why am I still getting the error? Please help, I am new to MapReduce.

Answer

Rename the mapper() method to map() (see the official docs).

What's happening is that the mapper never actually processes any data. The framework looks for a map() method, doesn't find one (yours is named mapper()), and so falls back to the identity implementation inherited from Mapper, which writes every input pair through unchanged. The map output key is therefore still the LongWritable input key, which conflicts with the Text key declared in the driver.
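
For reference, here is the mapper with just that rename applied (a minimal sketch, assuming the same tab-separated layout as the question). The @Override annotation is worth adding: with it, a misnamed method such as mapper() becomes a compile-time error instead of a silent no-op.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class YouTubeDataMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable one = new IntWritable(1);
    private final Text category = new Text();

    @Override // with @Override, a misnamed method fails to compile
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] str = value.toString().split("\t");
        category.set(str[3]);
        context.write(category, one);
    }
}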

As an aside,

String str[] = value.toString().split("\t");
category.set(str[3]);

is quite dangerous. It is risky to assume that every input line will contain at least three \t characters. When processing large amounts of data there will almost always be some records that are not in the format you expect, and you don't want the entire job to die when that happens. Consider doing something like:

String valueStr = value.toString();
if (valueStr != null) {
    // split() never returns null, but short lines produce fewer fields
    String[] str = valueStr.split("\t");
    if (str.length > 3) {
        category.set(str[3]);
        context.write(category, one);
    }
}
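
If you'd rather surface malformed lines than skip them silently, Hadoop counters are a natural fit, since they show up in the job history. This is a sketch, not part of the original answer; the counter group and name ("YouTubeDataMapper", "MALFORMED_RECORDS") are illustrative choices:

@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String[] str = value.toString().split("\t");
    if (str.length > 3) {
        category.set(str[3]);
        context.write(category, one);
    } else {
        // Record the bad line in the job counters instead of failing the task;
        // the group/counter names here are illustrative.
        context.getCounter("YouTubeDataMapper", "MALFORMED_RECORDS").increment(1);
    }
}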
