InputFormat Decision


Question


I am trying to figure out which of the given answers best suits this question:

Given a directory of files with the following structure: line number, tab character, string:

Example:

1	abialkjfjkaoasdfjksdlkjhqweroij

2	kadfjhuwqounahagtnbvaswslmnbfgy

3	kjfteiomndscxeqalkzhtopedkfsikj

You want to send each line as one record to your Mapper. Which InputFormat should you use to complete the line: conf.setInputFormat(____.class); ?

A. SequenceFileAsTextInputFormat

B. SequenceFileInputFormat

C. KeyValueFileInputFormat

D. BDBInputFormat

My analysis:

Option A is a format that I found does exist, but I'm not sure how it is used correctly or whether it fits as an answer.

Option B is not possible, since SequenceFiles are files of binary (K,V) pairs and thus are not suitable here.

Option C is not possible because there is no KeyValueFileInputFormat. However, if it is a typo and it is actually KeyValueTextInputFormat, then I think it would be a good choice. Or wouldn't it?

Option D is not possible because there is no BDBInputFormat, and even if it is a typo and it is actually BDInputFormat, then it still wouldn't suit this case.

Thank you! D

Answer

The answer is Option C. The name is probably a typo for KeyValueTextInputFormat.

KeyValueTextInputFormat splits each line at the first TAB character, so the line number will be the key and the string will be the value.
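
To make the key/value split concrete, here is a minimal plain-Java sketch that only imitates that behaviour on the three example lines. It is not Hadoop code: the class `KeyValueSplitSketch` and the helper `splitAtFirstTab` are hypothetical names made up for illustration. In an actual job using the old `org.apache.hadoop.mapred` API, the blank in the question would presumably be filled in as `conf.setInputFormat(KeyValueTextInputFormat.class);`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Illustration only: mimics how a tab-separated line is turned into a
// (key, value) pair. This is a stand-in, not the Hadoop implementation.
class KeyValueSplitSketch {

    // Hypothetical helper: split at the FIRST tab only.
    // If the line has no tab, the whole line is the key and the value is empty.
    static Map.Entry<String, String> splitAtFirstTab(String line) {
        int pos = line.indexOf('\t');
        if (pos < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        String[] lines = {
            "1\tabialkjfjkaoasdfjksdlkjhqweroij",
            "2\tkadfjhuwqounahagtnbvaswslmnbfgy",
            "3\tkjfteiomndscxeqalkzhtopedkfsikj"
        };
        for (String line : lines) {
            Map.Entry<String, String> kv = splitAtFirstTab(line);
            // The line number ends up as the key, the string as the value.
            System.out.println("key=" + kv.getKey() + " value=" + kv.getValue());
        }
    }
}
```

Each input line still reaches the Mapper as exactly one record; the only difference from a plain line-oriented format is that the record arrives already split into a key and a value.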
