Why is the setMapOutputKeyClass method necessary in a MapReduce job?


Problem description

When I write a MapReduce program, I often write code like

 job1.setMapOutputKeyClass(Text.class); 

But why do we have to specify the map output key class explicitly? We have already specified it in the mapper class declaration, for example

public static class MyMapper extends
        Mapper<LongWritable, Text, Text, Text>

In the book Hadoop: The Definitive Guide, there is a table ("Properties for configuring types") showing that the setMapOutputKeyClass method is optional. But when I tested it, I found it is necessary; otherwise the Eclipse console shows

Type mismatch in key from map: expected org.apache.hadoop.io.LongWritable, received org.apache.hadoop.io.Text

Can someone tell me the reason for this?

The book says, "The settings that have to be compatible with the MapReduce types are listed in the lower part of Table 8-1". Does this mean we have to set the properties in the lower part of the table, but do not have to set the ones in the upper part?

The contents of the table are as follows:

Properties for configuring types:
mapreduce.job.inputformat.class  
mapreduce.map.output.key.class  
mapreduce.map.output.value.class  
mapreduce.job.output.key.class  
mapreduce.job.output.value.class 

Properties that must be consistent with the types:
mapreduce.job.map.class   
mapreduce.job.combine.class  
mapreduce.job.partitioner.class  
mapreduce.job.output.key.comparator.class 
mapreduce.job.output.group.comparator.class  
mapreduce.job.reduce.class  
mapreduce.job.outputformat.class

Recommended answer

setMapOutputKeyClass() and setMapOutputValueClass() are optional as long as the mapper's output types match the job's output types specified by setOutputKeyClass() and setOutputValueClass() respectively. In other words, if your mapper output does not match your reducer output, you have to use one or both of these methods. When they are not called, Hadoop falls back to the job's output key and value classes, and the job's output key class itself defaults to LongWritable, which is why the error above reports that LongWritable was expected.
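The fallback order can be sketched in plain Java. This is a simplified model, not Hadoop source code: the class and method names are hypothetical, the configuration is stood in for by a plain Map, and only the two property names are taken from the table in the question. It assumes the job output key class defaults to LongWritable when nothing is set, which is consistent with the error message above.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of how the expected map output key class is resolved.
// NOT Hadoop source code: the class and method names are hypothetical.
class MapOutputKeyResolution {
    static final String MAP_OUTPUT_KEY = "mapreduce.map.output.key.class";
    static final String JOB_OUTPUT_KEY = "mapreduce.job.output.key.class";
    // Assumed default when no output key class is configured.
    static final String DEFAULT_KEY = "org.apache.hadoop.io.LongWritable";

    // Job output key class: the explicit setting, else the default.
    static String expectedJobOutputKey(Map<String, String> conf) {
        return conf.getOrDefault(JOB_OUTPUT_KEY, DEFAULT_KEY);
    }

    // Map output key class: the explicit setting, else the job output key class.
    static String expectedMapOutputKey(Map<String, String> conf) {
        String explicit = conf.get(MAP_OUTPUT_KEY);
        return explicit != null ? explicit : expectedJobOutputKey(conf);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Nothing configured: the mapper is expected to emit LongWritable keys.
        System.out.println(expectedMapOutputKey(conf));

        // Only the job output key class is set: the map output follows it.
        conf.put(JOB_OUTPUT_KEY, "org.apache.hadoop.io.Text");
        System.out.println(expectedMapOutputKey(conf));

        // An explicit map output key class takes precedence.
        conf.put(MAP_OUTPUT_KEY, "org.apache.hadoop.io.IntWritable");
        System.out.println(expectedMapOutputKey(conf));
    }
}
```

In a real driver, the first case corresponds to calling neither setter, the second to calling only setOutputKeyClass(Text.class), and the third to also calling setMapOutputKeyClass().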

As for your question regarding generic arguments: due to Java type erasure (see "Java generics - type erasure - when and what happens"), Hadoop does not know them at runtime, even though they are known to the compiler.
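Type erasure itself can be seen in a few lines of plain Java with no Hadoop dependency. This is only an illustrative sketch; ErasureDemo and Emitter are hypothetical names, with Emitter standing in for any class that has key and value type parameters.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Hypothetical generic class standing in for a type with key/value parameters.
    static class Emitter<K, V> { }

    // Compares the runtime classes of two objects.
    static boolean sameRuntimeClass(Object a, Object b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();

        // Both lists share one runtime class; the element type is erased.
        System.out.println(sameRuntimeClass(strings, numbers)); // true

        // The instance only knows the declared parameter name "K",
        // not the String type argument supplied at the use site.
        Emitter<String, Integer> emitter = new Emitter<>();
        System.out.println(emitter.getClass().getTypeParameters()[0].getName()); // K
    }
}
```

Because type arguments are not carried by objects at runtime, the framework works from the classes configured on the job rather than from the generic parameters in the source code.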
