What do job.setOutputKeyClass and job.setOutputValueClass refer to?


Question

I thought that they refer to the Reducer but in my program I have

public static class MyMapper extends Mapper< LongWritable, Text, Text, Text >

and

public static class MyReducer extends Reducer< Text, Text, NullWritable, Text >

so if I have

job.setOutputKeyClass( NullWritable.class );

job.setOutputValueClass( Text.class );

I get the following Exception

Type mismatch in key from map: expected org.apache.hadoop.io.NullWritable, recieved org.apache.hadoop.io.Text

but if I have

job.setOutputKeyClass( Text.class );

there is no problem.

Is there something wrong with my code, or does this happen because of NullWritable or something else?

Also, do I have to use job.setInputFormatClass and job.setOutputFormatClass? My program runs correctly without them.

Solution

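Calling job.setOutputKeyClass( NullWritable.class ); sets the key type expected as output from both the map and reduce phases (job.setOutputValueClass does the same for the value type). That is why the framework rejects the Text keys your Mapper emits.

If your Mapper emits different types than the Reducer, you can set the types emitted by the mapper with the JobConf's setMapOutputKeyClass() and setMapOutputValueClass() methods; the Job class used in your code provides the same two methods. These implicitly set the input types expected by the Reducer.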

(source: Yahoo Developer Tutorial)
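
As a rough sketch of how this could look for the classes in the question: the driver below sets the map output types separately from the final output types. It assumes the new `org.apache.hadoop.mapreduce` API and Hadoop on the classpath. The class name `MyJobDriver`, the job name, the placeholder map and reduce bodies, and the use of `args[0]` and `args[1]` as input and output paths are illustrative assumptions, not part of the original program.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class; requires the Hadoop client libraries to compile.
public class MyJobDriver {

    // Emits (Text, Text). The body is a placeholder for illustration only.
    public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(new Text("key"), line);
        }
    }

    // Consumes (Text, Text) and emits (NullWritable, Text).
    public static class MyReducer extends Reducer<Text, Text, NullWritable, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                context.write(NullWritable.get(), value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-output-types-sketch");
        job.setJarByClass(MyJobDriver.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        // Types emitted by the mapper, which are also the reducer's input types.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        // Types emitted by the reducer, i.e. the final job output.
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With the two setMapOutput calls in place, the NullWritable output key should no longer conflict with the Text keys coming out of the map phase.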

Regarding your second question: you do not have to call them. The default InputFormat is TextInputFormat, which treats each line of each input file as a separate record and performs no parsing. You can call these methods if you need to process your input in a different format. Here are some examples:

InputFormat             | Description                                      | Key                                      | Value
--------------------------------------------------------------------------------------------------------------------------------------------------------
TextInputFormat         | Default format; reads lines of text files        | The byte offset of the line              | The line contents
KeyValueTextInputFormat | Parses lines into key, val pairs                 | Everything up to the first tab character | The remainder of the line
SequenceFileInputFormat | A Hadoop-specific high-performance binary format | user-defined                             | user-defined

The default instance of OutputFormat is TextOutputFormat, which writes (key, value) pairs on individual lines of a text file. Some examples below:

OutputFormat             | Description
---------------------------------------------------------------------------------------------------------
TextOutputFormat         | Default; writes lines in "key \t value" form
SequenceFileOutputFormat | Writes binary files suitable for reading into subsequent MapReduce jobs
NullOutputFormat         | Disregards its inputs

(source: Other Yahoo Developer Tutorial)
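
For illustration, a minimal sketch of overriding both defaults might look like the following. It assumes the new `org.apache.hadoop.mapreduce` API with Hadoop on the classpath; the class name `FormatConfigSketch`, the method name `configure`, and the job name are hypothetical. `KeyValueTextInputFormat` is the Hadoop class that splits each line at the first tab character.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical helper; requires the Hadoop client libraries to compile.
public class FormatConfigSketch {

    public static Job configure() throws IOException {
        Job job = Job.getInstance(new Configuration(), "format-sketch");

        // Read each line as a (Text key, Text value) pair split at the first tab.
        // The mapper's input key type must then be Text, not LongWritable.
        job.setInputFormatClass(KeyValueTextInputFormat.class);

        // Write binary sequence files instead of "key \t value" text lines.
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        return job;
    }
}
```

If neither method is called, the job falls back to TextInputFormat and TextOutputFormat, which matches the observation that the program runs correctly without them.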
