Key of type Object in the Hadoop mapper


Problem description



New to Hadoop and trying to understand the MapReduce WordCount example code from here.

The mapper from the documentation is:

Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>

I see that in the MapReduce WordCount example the map code is as follows:

public void map(Object key, Text value, Context context)

Question: What is the point of this key of type Object? If the input to a mapper is a text document, I am assuming the value would be the chunk of text (64 MB or 128 MB) that Hadoop has partitioned and stored in HDFS. More generally, what is the use of the input key KEYIN in the map code?

Any pointers would be greatly appreciated.

Solution

InputFormat describes the input specification for a MapReduce job. By default, Hadoop uses TextInputFormat, which extends FileInputFormat, to process the input files.

We can also specify the input format to use in the client or driver code:

job.setInputFormatClass(SomeInputFormat.class);
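The input format, not the mapper, decides what the key is. As an illustration, here is a plain-Java sketch (no Hadoop dependency, with String standing in for Text) of the per-line rule used by Hadoop's KeyValueTextInputFormat, one real alternative to TextInputFormat: the key is the text before the first separator (a tab by default) and the value is the rest, while a line without a separator becomes the key with an empty value. The class and method names are hypothetical, and this is a simplification, not Hadoop's own record reader.

```java
import java.util.List;

// Hypothetical, simplified sketch of the KeyValueTextInputFormat line rule (not Hadoop code):
// split at the first separator; with no separator, the whole line is the key and the value is empty.
class KeyValueLineSketch {
    static String[] parse(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }

    public static void main(String[] args) {
        List<String> lines = List.of("apple\t3", "banana\t5\textra", "no-separator");
        for (String line : lines) {
            String[] kv = parse(line, '\t');   // tab is the default separator
            System.out.println("key=" + kv[0] + " value=" + kv[1]);
        }
    }
}
```

With that input format the framework delivers Text keys, so a mapper for it would declare Text rather than LongWritable (or Object) as KEYIN.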

For the TextInputFormat, files are broken into lines. Keys are the position in the file, and values are the line of text.
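A minimal plain-Java sketch of the (key, value) pairs TextInputFormat produces, assuming UTF-8 text with '\n' line endings and a single input split (Hadoop's LineRecordReader additionally handles "\r\n" and "\r" terminators and lines that cross split boundaries). Long and String stand in for LongWritable and Text; LineOffsets and Rec are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Simplified sketch (not Hadoop code): key = byte offset where a line starts,
// value = the line without its '\n' terminator. Assumes '\n' endings and one split.
class LineOffsets {
    static final class Rec {
        final long offset;   // stands in for the LongWritable key
        final String line;   // stands in for the Text value
        Rec(long offset, String line) { this.offset = offset; this.line = line; }
    }

    static List<Rec> records(byte[] file) {
        List<Rec> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < file.length; i++) {
            if (file[i] == '\n') {
                out.add(new Rec(start, new String(file, start, i - start, StandardCharsets.UTF_8)));
                start = i + 1;
            }
        }
        if (start < file.length) {   // last line has no trailing newline
            out.add(new Rec(start, new String(file, start, file.length - start, StandardCharsets.UTF_8)));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] file = "hello world\ncaf\u00e9 hadoop\nbye\n".getBytes(StandardCharsets.UTF_8);
        // One record, and therefore one map() call, per line; prints offsets 0, 12 and 25.
        for (Rec r : records(file)) {
            System.out.println(r.offset + "\t" + r.line);
        }
    }
}
```

The third key is 25 rather than 24 because é occupies two bytes in UTF-8: the key is a byte position in the file, not a character count or a line number.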

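In public void map(Object key, Text value, Context context), key is the byte offset at which the line starts in the file, and value is the text of that line. So the value is not the whole 64 MB or 128 MB block: each map task processes one input split (by default about one HDFS block), and the framework calls map() once for every line in that split.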

Please look at the TextInputFormat API: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.html

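For TextInputFormat (the default), the key is of type LongWritable and the value is of type Text. In your example, the key is declared as Object instead of LongWritable; this works because LongWritable is a subclass of Object and the WordCount mapper never uses the key. You can also declare the key as LongWritable instead of Object, which lets you use the offset directly.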
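To see why Object works, here is a hypothetical, stripped-down analogue of Mapper in plain Java. MiniMapper, ObjectKeyMapper and LongKeyMapper are illustrative names, not Hadoop classes; Long, String and a List stand in for LongWritable, Text and Context. As in Hadoop, the record types are decided at runtime by the input format, so a mismatched KEYIN is not caught at compile time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical analogue of Mapper<KEYIN, VALUEIN, ...> (not Hadoop code).
abstract class MiniMapper<KEYIN, VALUEIN> {
    abstract void map(KEYIN key, VALUEIN value, List<String> out);

    // The real key/value classes come from the input format at runtime,
    // so they can only be handed over through an unchecked cast.
    @SuppressWarnings("unchecked")
    final void run(List<Object[]> records, List<String> out) {
        for (Object[] r : records) {
            map((KEYIN) r[0], (VALUEIN) r[1], out);   // once per record (line)
        }
    }
}

// WordCount style: key declared as Object and ignored.
class ObjectKeyMapper extends MiniMapper<Object, String> {
    @Override
    void map(Object key, String value, List<String> out) {
        for (String w : value.split("\\s+")) {
            if (!w.isEmpty()) out.add(w + "=1");
        }
    }
}

// Key declared with its actual type, so the offset can be used.
class LongKeyMapper extends MiniMapper<Long, String> {
    @Override
    void map(Long key, String value, List<String> out) {
        out.add(key + ":" + value);
    }
}

class KeyTypeDemo {
    public static void main(String[] args) {
        // (byte offset, line) pairs for the text "hello world\nhello hadoop\n".
        List<Object[]> records = Arrays.asList(
                new Object[] {0L, "hello world"},
                new Object[] {12L, "hello hadoop"});

        List<String> words = new ArrayList<>();
        new ObjectKeyMapper().run(records, words);
        System.out.println(words);   // [hello=1, world=1, hello=1, hadoop=1]

        List<String> lines = new ArrayList<>();
        new LongKeyMapper().run(records, lines);
        System.out.println(lines);   // [0:hello world, 12:hello hadoop]
    }
}
```

Both mappers receive the same Long keys: Object accepts them because every key is an Object, and Long additionally lets the code use the offset. A MiniMapper<String, String> would compile but throw ClassCastException from run(), which mirrors the ClassCastException Hadoop throws when a mapper declares, say, Text as KEYIN while TextInputFormat supplies LongWritable keys.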
