Hadoop (Java): changing the type of Mapper output values


Problem description


I am writing a mapper function that generates keys from a user_id, with values that are also of Text type. Here is how I do this:

public static class UserMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text userid = new Text();
    private Text catid = new Text();

    /* map method */
    public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString(), ","); /* separated by "," */
        int count = 0;

        userid.set(itr.nextToken());

        while (itr.hasMoreTokens()) {
            if (++count == 3) {
                catid.set(itr.nextToken());
                context.write(userid, catid);
            }else {
                itr.nextToken();
            }
        }
    }
}


And then, in the main program, I set the output class of the mapper as follows:

    Job job = new Job(conf, "Customer Analyzer");
    job.setJarByClass(popularCategories.class);
    job.setMapperClass(UserMapper.class);
    job.setCombinerClass(UserReducer.class);
    job.setReducerClass(UserReducer.class);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);


So even though I have set the class of the output values to Text.class, I still get the following error when I compile:

popularCategories.java:39: write(org.apache.hadoop.io.Text,org.apache.hadoop.io.IntWritable)
 in org.apache.hadoop.mapreduce.TaskInputOutputContext<java.lang.Object,
 org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,
 org.apache.hadoop.io.IntWritable> 
 cannot be applied to (org.apache.hadoop.io.Text,org.apache.hadoop.io.Text)
 context.write(userid, catid);
                           ^

According to this error, the compiler still considers the mapper's write method to have the signature: write(org.apache.hadoop.io.Text, org.apache.hadoop.io.IntWritable)


So, when I change the class definition as follows, the problem is solved.

 public static class UserMapper extends Mapper<Object, Text, Text, Text> {

 }


So, I want to understand the difference between the class definition and setting the mapper output value class.

Answer


From the Apache documentation:

Class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

java.lang.Object
org.apache.hadoop.mapreduce.Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

where:

KEYIN = offset of the record  ( input for Mapper )
VALUEIN = value of the line in the record ( input for Mapper )
KEYOUT = Mapper output key ( Output of Mapper, input of Reducer)
VALUEOUT = Mapper output value ( Output of Mapper, input to Reducer)
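The compile-time role of these type parameters can be sketched in plain Java. The classes below are a simplified analog, not Hadoop's actual API: the generic parameters of the mapper fix the signature of the context's write() method at compile time, which is exactly why no runtime setter can make a mismatched write() call compile.

```java
// Simplified analog of Hadoop's Mapper/Context generics (NOT the real API).
// The generic parameters fix write()'s signature at compile time.
class FakeContext<KEYOUT, VALUEOUT> {
    StringBuilder log = new StringBuilder();

    // write() only accepts the declared output key/value types
    void write(KEYOUT key, VALUEOUT value) {
        log.append(key).append("=").append(value).append(";");
    }
}

abstract class FakeMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    abstract void map(KEYIN key, VALUEIN value, FakeContext<KEYOUT, VALUEOUT> ctx);
}

// Declaring VALUEOUT as String lets ctx.write(String, String) compile.
// Had we declared FakeMapper<Object, String, String, Integer>, the same
// ctx.write(...) call below would be a compile-time error -- the analog of
// the error in the question -- regardless of any runtime configuration.
class UserMapper extends FakeMapper<Object, String, String, String> {
    void map(Object key, String value, FakeContext<String, String> ctx) {
        String[] fields = value.split(",");
        ctx.write(fields[0], fields[2]);   // userid -> category id
    }
}

public class GenericsDemo {
    public static void main(String[] args) {
        FakeContext<String, String> ctx = new FakeContext<>();
        new UserMapper().map(null, "u1,x,cat7", ctx);
        System.out.println(ctx.log);   // prints: u1=cat7;
    }
}
```

The same logic applies to the real Mapper: the fourth type parameter, not setMapOutputValueClass(), is what the compiler checks context.write() against.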


Your problem was solved once you corrected the Mapper output value type in your class definition from

public static class UserMapper extends Mapper<Object, Text, Text, IntWritable> {

to

public static class UserMapper extends Mapper<Object, Text, Text, Text> {

The Context passed to map() takes its generic parameters from the Mapper class declaration, so context.write(userid, catid) is now checked against (Text, Text) at compile time. In contrast, job.setMapOutputValueClass(Text.class) is only a runtime setting (used for serialization); it cannot change the compile-time signature derived from the generics.


Have a look at related SE question:

Why has LongWritable key not been used in Mapper class? (http://stackoverflow.com/questions/32650835/why-longwritable-key-has-not-been-used-in-mapper-class)


I found this article also useful for understanding the concepts clearly.
