Customizing InputFormat in Hadoop


Problem Description

I am trying to read from a very big database that consists of geo-referenced time series data, so I have a file in the following format:

latitude,longitude,value@time1,value@time2,....value@timeN

This is the data for the entire earth. For my job I need the latitude,longitude pair as the key and the time series values as the value. As far as I know, Hadoop has KeyValueInputFormat, but it treats the first tab as the delimiter. Is there a way to customize that?

I really need a solution for this.

Thanks, Ayush

Answer

Play around with

key.value.separator.in.input.line

in the job configuration.
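
As a rough sketch (the class names, the property string for the newer mapreduce API, and the overall job setup below are assumptions, not part of the original answer), a job using KeyValueTextInputFormat with a comma as the separator might be configured like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

public class GeoTimeSeriesJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Property named in the answer (old mapred API / Hadoop 1.x):
        conf.set("key.value.separator.in.input.line", ",");
        // Equivalent property in the newer mapreduce API (Hadoop 2.x+):
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");

        Job job = Job.getInstance(conf, "geo time series");
        job.setJarByClass(GeoTimeSeriesJob.class);

        // Each input line is split at the FIRST separator: the key is the text
        // before the first comma, the value is the rest of the line.
        // Both arrive in the mapper as Text.
        job.setInputFormatClass(KeyValueTextInputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Mapper, reducer and output settings omitted for brevity.

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Note that KeyValueTextInputFormat splits each line at the first occurrence of the separator, so with a comma the key would be just the latitude and the longitude would remain at the front of the value; getting a combined latitude,longitude key would still require a custom InputFormat or a little parsing in the mapper.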

