Why is the setMapOutputKeyClass method necessary in a MapReduce job?


Question

When I write a MapReduce program, I often write code like this:

 job1.setMapOutputKeyClass(Text.class); 

But why do we have to specify the map output key class explicitly? We have already specified it in the mapper class declaration, for example:

public static class MyMapper extends
        Mapper<LongWritable, Text, Text, Text>

In the book Hadoop: The Definitive Guide, there is a table (Properties for configuring types) that shows the setMapOutputKeyClass method as optional. But when I tested it, I found that it is necessary; otherwise the Eclipse console shows:

Type mismatch in key from map: expected org.apache.hadoop.io.LongWritable, received org.apache.hadoop.io.Text

Can someone tell me the reason for this?

The book says:

"The settings that have to be compatible with the MapReduce types are listed in the lower part of Table 8-1". Does this mean we have to set the properties in the lower part, but do not have to set the ones in the upper part?

The contents of the table are as follows:

Properties for configuring types:
mapreduce.job.inputformat.class  
mapreduce.map.output.key.class  
mapreduce.map.output.value.class  
mapreduce.job.output.key.class  
mapreduce.job.output.value.class 

Properties that must be consistent with the types:
mapreduce.job.map.class   
mapreduce.job.combine.class  
mapreduce.job.partitioner.class  
mapreduce.job.output.key.comparator.class 
mapreduce.job.output.group.comparator.class  
mapreduce.job.reduce.class  
mapreduce.job.outputformat.class

Recommended answer

setMapOutputKeyClass() and setMapOutputValueClass() are optional as long as the mapper's output types match the job's output types specified by setOutputKeyClass() and setOutputValueClass(), respectively. In other words, if your mapper's output types differ from your reducer's output types, you have to call one or both of these methods.
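The fallback rule above can be illustrated with a small model. This is only a sketch in plain Java, not Hadoop's code: JobTypeModel, its nested LongWritable and Text placeholders, and checkMapKey are hypothetical names made up for illustration. It assumes that the job output key class defaults to LongWritable when nothing is set, which is consistent with the "expected org.apache.hadoop.io.LongWritable" part of the error message in the question.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of how a job resolves its key types.
// This is NOT Hadoop's code; it only imitates the fallback rule described above.
final class JobTypeModel {
    // Placeholder stand-ins for Hadoop's LongWritable and Text (hypothetical).
    static final class LongWritable {}
    static final class Text {}

    private final Map<String, Class<?>> settings = new HashMap<>();

    void setOutputKeyClass(Class<?> c) {
        settings.put("mapreduce.job.output.key.class", c);
    }

    void setMapOutputKeyClass(Class<?> c) {
        settings.put("mapreduce.map.output.key.class", c);
    }

    // Assumption: with nothing set, the job output key class is LongWritable.
    Class<?> getOutputKeyClass() {
        return settings.getOrDefault("mapreduce.job.output.key.class", LongWritable.class);
    }

    // The map output key class falls back to the job output key class when unset.
    Class<?> getMapOutputKeyClass() {
        Class<?> c = settings.get("mapreduce.map.output.key.class");
        return c != null ? c : getOutputKeyClass();
    }

    // Imitates the runtime check that produced the error in the question.
    String checkMapKey(Object key) {
        Class<?> expected = getMapOutputKeyClass();
        if (key.getClass() != expected) {
            return "Type mismatch in key from map: expected " + expected.getSimpleName()
                    + ", received " + key.getClass().getSimpleName();
        }
        return "ok";
    }

    public static void main(String[] args) {
        JobTypeModel job = new JobTypeModel();
        // Nothing set: the Text key is checked against the default type.
        System.out.println(job.checkMapKey(new Text()));
        // Setting only the job output key class is enough when the mapper emits the same type.
        job.setOutputKeyClass(Text.class);
        System.out.println(job.checkMapKey(new Text()));
    }
}
```

Under this model, a mapper that emits Text keys passes the check once either the job output key class or the map output key class is set to Text; the explicit map output setting is only needed when the two differ.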

As for your question about the generic arguments: because of Java type erasure (see "Java generics - type erasure - when and what happens"), Hadoop does not know them at runtime, even though the compiler does.
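A minimal sketch of what erasure means here, using only the Java standard library. The Mapper class below is a hypothetical stand-in with four type parameters, not Hadoop's Mapper, and emitRaw is an illustrative helper. It shows that two instances with different type arguments share one runtime class, and that a key of the wrong type can be added through a raw reference without any runtime check at that point.

```java
import java.util.ArrayList;
import java.util.List;

final class TypeErasureDemo {
    // Hypothetical stand-in for a framework class with four type parameters.
    static class Mapper<KIN, VIN, KOUT, VOUT> {
        final List<KOUT> emitted = new ArrayList<>();

        void emit(KOUT key) {
            emitted.add(key);
        }
    }

    // Through a raw reference the compiler only warns; nothing is checked at runtime.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static void emitRaw(Mapper mapper, Object key) {
        mapper.emit(key);
    }

    public static void main(String[] args) {
        Mapper<Long, String, String, String> a = new Mapper<>();
        Mapper<Long, String, Integer, Integer> b = new Mapper<>();

        // Both objects have the same runtime class; the type arguments are erased.
        System.out.println(a.getClass() == b.getClass());

        // An Integer key goes into a mapper declared with String keys, with no exception here.
        emitRaw(a, 42);
        System.out.println(a.emitted.size());
    }
}
```

Because the objects themselves carry no record of their type arguments, a framework that wants to validate keys at runtime needs an explicit Class object, which is what the setter methods supply.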
