Customizing InputFormat in Hadoop
Question
I am trying to read from a very large database that consists of geo-referenced time series data. The file is in the following format:
latitude,longitude,value@time1,value@time2,...,value@timeN
This is the data for the entire earth. For my work I need the latitude,longitude pair as the key and the time series values as the value. As far as I know, Hadoop has KeyValueTextInputFormat, but it treats the first tab as the delimiter. Is there a way to customize it?
I really need a solution for this.
Thanks, Ayush
Recommended answer
Experiment with the key.value.separator.in.input.line property in the job configuration.
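One caveat worth checking before relying on key.value.separator.in.input.line alone: as far as I know, KeyValueTextInputFormat splits each line at the first occurrence of a single separator character (in the newer mapreduce API the property is named mapreduce.input.keyvaluelinerecordreader.key.value.separator). With a comma as the separator, the key would then be the latitude only, and the longitude would end up at the front of the value. The sketch below is plain Java with no Hadoop dependency; the class and method names are hypothetical, chosen only for illustration. It shows the split at the second comma that a mapper or a custom RecordReader could apply to get latitude,longitude as the key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper for illustration; not part of Hadoop.
public class LatLonSplitter {

    // Splits "lat,lon,v1,v2,...,vN" into key "lat,lon" and value "v1,v2,...,vN".
    public static Map.Entry<String, String> splitAtSecondComma(String line) {
        int first = line.indexOf(',');
        int second = (first < 0) ? -1 : line.indexOf(',', first + 1);
        if (second < 0) {
            // Fewer than two commas: treat the whole line as the key, with an empty value.
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, second), line.substring(second + 1));
    }

    public static void main(String[] args) {
        // Sample line made up for this example.
        Map.Entry<String, String> kv = splitAtSecondComma("12.5,77.6,0.1,0.2,0.3");
        System.out.println("key   = " + kv.getKey());
        System.out.println("value = " + kv.getValue());
    }
}
```

Inside a job, the same logic could run in the map method on the line delivered by TextInputFormat, emitting the first part as the output key and the remainder as the output value.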