Hadoop (Java): changing the type of Mapper output values


Problem description


I am writing a mapper function that generates keys from a user_id, with values that are also of Text type. Here is how I do this:

public static class UserMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text userid = new Text();
    private Text catid = new Text();

    /* map method */
    public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString(), ","); /* separated by "," */
        int count = 0;

        userid.set(itr.nextToken());

        while (itr.hasMoreTokens()) {
            if (++count == 3) {
                catid.set(itr.nextToken());
                context.write(userid, catid);
            }else {
                itr.nextToken();
            }
        }
    }
}


And then, in the main program, I set the output class of the mapper as follows:

    Job job = new Job(conf, "Customer Analyzer");
    job.setJarByClass(popularCategories.class);
    job.setMapperClass(UserMapper.class);
    job.setCombinerClass(UserReducer.class);
    job.setReducerClass(UserReducer.class);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);


So even though I have set the class of the output values to Text.class, I still get the following error when I compile:

popularCategories.java:39: write(org.apache.hadoop.io.Text,org.apache.hadoop.io.IntWritable)
 in org.apache.hadoop.mapreduce.TaskInputOutputContext<java.lang.Object,
 org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,
 org.apache.hadoop.io.IntWritable> 
 cannot be applied to (org.apache.hadoop.io.Text,org.apache.hadoop.io.Text)
 context.write(userid, catid);
                           ^

According to this error, the compiler still considers the mapper's write method to have the signature: write(org.apache.hadoop.io.Text, org.apache.hadoop.io.IntWritable)


So, when I change the class definition as follows, the problem is solved.

 public static class UserMapper extends Mapper<Object, Text, Text, Text> {

 }


So, I want to understand the difference between the class definition and setting the mapper output value class.

Answer


From the Apache documentation:

Class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

java.lang.Object
org.apache.hadoop.mapreduce.Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

where:

KEYIN = offset of the record  ( input for Mapper )
VALUEIN = value of the line in the record ( input for Mapper )
KEYOUT = Mapper output key ( Output of Mapper, input of Reducer)
VALUEOUT = Mapper output value ( Output of Mapper, input to Reducer)
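The compile-time role of these type parameters can be sketched in plain Java. The classes below are a simplified analog, not Hadoop's actual API: the generic parameters of the mapper fix the signature of the context's write() method at compile time, which is exactly why no runtime setter can make a mismatched write() call compile.

```java
// Simplified analog of Hadoop's Mapper/Context generics (NOT the real API).
// The generic parameters fix write()'s signature at compile time.
class FakeContext<KEYOUT, VALUEOUT> {
    StringBuilder log = new StringBuilder();

    // write() only accepts the declared output key/value types
    void write(KEYOUT key, VALUEOUT value) {
        log.append(key).append("=").append(value).append(";");
    }
}

abstract class FakeMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    abstract void map(KEYIN key, VALUEIN value, FakeContext<KEYOUT, VALUEOUT> ctx);
}

// Declaring VALUEOUT as String lets ctx.write(String, String) compile.
// Had we declared FakeMapper<Object, String, String, Integer>, the same
// ctx.write(...) call below would be a compile-time error -- the analog of
// the error in the question -- regardless of any runtime configuration.
class UserMapper extends FakeMapper<Object, String, String, String> {
    void map(Object key, String value, FakeContext<String, String> ctx) {
        String[] fields = value.split(",");
        ctx.write(fields[0], fields[2]);   // userid -> category id
    }
}

public class GenericsDemo {
    public static void main(String[] args) {
        FakeContext<String, String> ctx = new FakeContext<>();
        new UserMapper().map(null, "u1,x,cat7", ctx);
        System.out.println(ctx.log);   // prints: u1=cat7;
    }
}
```

The same logic applies to the real Mapper: the fourth type parameter, not setMapOutputValueClass(), is what the compiler checks context.write() against.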


Your problem was solved once you corrected the Mapper output value type in your class definition from

public static class UserMapper extends Mapper<Object, Text, Text, IntWritable> {

to

public static class UserMapper extends Mapper<Object, Text, Text, Text> {

The Context passed to map() takes its generic parameters from the Mapper class declaration, so context.write(userid, catid) is now checked against (Text, Text) at compile time. In contrast, job.setMapOutputValueClass(Text.class) is only a runtime setting (used for serialization); it cannot change the compile-time signature derived from the generics.


Have a look at related SE question:

Why has LongWritable key not been used in Mapper class? (http://stackoverflow.com/questions/32650835/why-longwritable-key-has-not-been-used-in-mapper-class)


I found this article also useful for understanding the concepts clearly.
