What are the main differences between KeyValueTextInputFormat and TextInputFormat in Hadoop?


Question

Can somebody give me one practical scenario each for when we have to use KeyValueTextInputFormat and when we have to use TextInputFormat?

Answer

The TextInputFormat class converts every line of the source file into a key/value pair, where the LongWritable key is the byte offset of the line within the file and the Text value is the entire line itself.

KeyValueTextInputFormat is a closely related format, useful when every source record should arrive as a Text/Text pair. Each line is split at the first occurrence of a fixed delimiter (a tab by default): the part before the delimiter becomes the key and the part after it becomes the value.

Consider the following file contents:

AL#Alabama
AR#Arkansas
FL#Florida

If TextInputFormat is configured, you would see key/value pairs such as the following (the byte offsets assume single-byte \n line endings):

0    AL#Alabama
11   AR#Arkansas
23   FL#Florida

If KeyValueTextInputFormat is configured with conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "#"), you would see the results as:

AL    Alabama
AR    Arkansas
FL    Florida
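
To make the difference concrete, here is a minimal plain-Java sketch that mimics how the two formats derive keys and values from the sample file above. It does not call Hadoop at all: the class name InputFormatSketch and both method names are hypothetical, the records are returned as String pairs rather than LongWritable/Text objects, and the offsets assume single-byte \n line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class InputFormatSketch {

    // Mimics TextInputFormat: key = byte offset where the line starts,
    // value = the whole line, delimiter included.
    static List<String[]> asTextInputFormat(String fileContents) {
        List<String[]> records = new ArrayList<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            records.add(new String[] {Long.toString(offset), line});
            // Advance past the line's bytes plus the one-byte '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // Mimics KeyValueTextInputFormat: split each line at the FIRST separator.
    // A line without a separator becomes the key, with an empty value.
    static List<String[]> asKeyValueTextInputFormat(String fileContents, char separator) {
        List<String[]> records = new ArrayList<>();
        for (String line : fileContents.split("\n")) {
            int pos = line.indexOf(separator);
            if (pos < 0) {
                records.add(new String[] {line, ""});
            } else {
                records.add(new String[] {line.substring(0, pos), line.substring(pos + 1)});
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String contents = "AL#Alabama\nAR#Arkansas\nFL#Florida\n";

        System.out.println("TextInputFormat-style records:");
        for (String[] record : asTextInputFormat(contents)) {
            System.out.println(record[0] + "\t" + record[1]);
        }

        System.out.println("KeyValueTextInputFormat-style records:");
        for (String[] record : asKeyValueTextInputFormat(contents, '#')) {
            System.out.println(record[0] + "\t" + record[1]);
        }
    }
}
```

As a rule of thumb, TextInputFormat fits input where the mapper parses each line itself, such as free-form log lines, while KeyValueTextInputFormat fits input that is already laid out as one delimited key and value per line, such as the state-code lookup above.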
