What are the main differences between KeyValueTextInputFormat and TextInputFormat in Hadoop?
Question
Can somebody give me a practical scenario where we would have to use KeyValueTextInputFormat, and one where we would use TextInputFormat?
The TextInputFormat class turns every line of the source file into a key/value pair, where the LongWritable key is the byte offset at which the line starts in the file and the Text value is the entire line itself.
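To make the key concrete, here is a minimal plain-Java sketch (not Hadoop code) of the pairs TextInputFormat hands to a mapper. It assumes UTF-8 content with "\n" line endings and reads the whole file at once; real Hadoop reads per input split and also handles "\r\n". The class, record and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation (no Hadoop dependency) of TextInputFormat's records:
// key = byte offset where the line starts, value = the line text.
// Assumption: UTF-8 content, '\n' line endings, a single split.
class TextInputFormatSketch {

    // Hypothetical stand-in for a LongWritable/Text pair.
    record OffsetLine(long offset, String line) {}

    static List<OffsetLine> read(String content) {
        List<OffsetLine> records = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            records.add(new OffsetLine(offset, line));
            // Offsets count bytes (not characters) and include the newline byte.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String file = "AL#Alabama\nAR#Arkansas\nFL#Florida\n";
        for (OffsetLine r : read(file)) {
            System.out.println(r.offset() + " " + r.line());
        }
    }
}
```

Because the key is a byte offset, it is unique per line but carries no business meaning, which is why mappers using TextInputFormat usually ignore it and parse the value themselves.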
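KeyValueTextInputFormat is a variant of TextInputFormat that is useful when every source record should be fetched as a Text/Text pair. It still reads the input line by line, but it splits each line at the first occurrence of a fixed separator: the part before the separator becomes the key and the rest becomes the value. The separator defaults to a tab character, and a line that contains no separator yields the whole line as the key and an empty value.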
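The splitting rule can be sketched in plain Java as follows. This is an illustration of the behavior described above, not Hadoop's implementation; the class and method names are hypothetical, and a JDK Map.Entry stands in for the Text/Text pair.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Plain-Java simulation of how one line becomes a key/value pair under
// KeyValueTextInputFormat: split at the FIRST separator only.
// If the separator is absent, the whole line is the key and the value is empty.
class KeyValueSplitSketch {

    static Map.Entry<String, String> split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        System.out.println(split("AL#Alabama", '#'));        // key "AL", value "Alabama"
        System.out.println(split("a\tb\tc", '\t'));          // value keeps the second tab: "b\tc"
        System.out.println(split("no separator here", '#')); // value is empty
    }
}
```

Splitting only at the first separator means the value may itself contain the separator, so a line such as "a\tb\tc" keeps "b\tc" intact as its value.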
Consider the following file contents:
AL#Alabama
AR#Arkansas
FL#Florida
If TextInputFormat is configured, you will see the following key/value pairs (offsets assume "\n" line endings):
0 AL#Alabama
11 AR#Arkansas
23 FL#Florida
If KeyValueTextInputFormat is configured with conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "#"), you will see the results as:
AL Alabama
AR Arkansas
FL Florida
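The whole example can be reproduced with the plain-Java sketch below. A java.util.Map stands in for Hadoop's Configuration object (an assumption made so the sketch runs without Hadoop on the classpath); the property name is the one used above. As far as I know, Hadoop's record reader falls back to a tab when the property is unset and keeps only the first character of the configured string, so the sketch does the same; it also assumes an ASCII separator. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// End-to-end simulation of the example above, without Hadoop.
// Assumption: a plain Map stands in for org.apache.hadoop.conf.Configuration.
class KeyValueExampleSketch {

    static final String SEPARATOR_KEY =
            "mapreduce.input.keyvaluelinerecordreader.key.value.separator";

    // Default to a tab; only the first character of the configured value is used.
    static char separatorFrom(Map<String, String> conf) {
        return conf.getOrDefault(SEPARATOR_KEY, "\t").charAt(0);
    }

    // Returns {key, value} pairs, splitting each line at the first separator.
    static List<String[]> read(String content, Map<String, String> conf) {
        char sep = separatorFrom(conf);
        List<String[]> pairs = new ArrayList<>();
        for (String line : content.split("\n")) {
            int pos = line.indexOf(sep);
            pairs.add(pos == -1
                    ? new String[] {line, ""}
                    : new String[] {line.substring(0, pos), line.substring(pos + 1)});
        }
        return pairs;
    }

    public static void main(String[] args) {
        String file = "AL#Alabama\nAR#Arkansas\nFL#Florida\n";
        Map<String, String> conf = Map.of(SEPARATOR_KEY, "#");
        for (String[] kv : read(file, conf)) {
            System.out.println(kv[0] + " " + kv[1]);
        }
    }
}
```

In this form the state code arrives directly as the mapper's key, which is the practical advantage over TextInputFormat when the input is already delimited key/value data.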