Storing Date and Time in PIG


Question

I am trying to store a txt file that has two columns, date and time respectively. Something like this: 1999-01-01 12:08:56

Now I want to perform some date operations using PIG, but I want to store the date and time like this: 1999-01-01T12:08:56 (I checked this link): http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html

What I want to know is which format I can use so that the date and time sit in one column that I can feed to PIG, and how to then load that date into PIG. I know the value has to be converted to datetime, but my attempt shows errors. Can somebody tell me how to load date and time data together? An example would be of great help.

Recommended answer

Please let me know if this works for you.

input.txt  
1999-01-01 12:08:56  
1999-01-02 12:08:57  
1999-01-03 12:08:58  
1999-01-04 12:08:59  

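PigScript:
-- Split each line on the space into a date string and a time string.
A = LOAD 'input.txt' USING PigStorage(' ') AS (date:chararray, time:chararray);
-- Join the two parts with 'T' to get an ISO 8601 string.
-- (On older Pig releases where CONCAT accepts only two arguments, nest the calls: CONCAT(CONCAT(date,'T'),time).)
B = FOREACH A GENERATE CONCAT(date,'T',time) AS myDateString;
-- ToDate with a single argument parses an ISO 8601 string into a datetime.
C = FOREACH B GENERATE ToDate(myDateString);
DUMP C;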

Output:  
(1999-01-01T12:08:56.000+05:30)  
(1999-01-02T12:08:57.000+05:30)  
(1999-01-03T12:08:58.000+05:30)  
(1999-01-04T12:08:59.000+05:30)  

The values in C are now datetime objects, so you can process them with all of the built-in date functions.
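As a rough illustration of what those built-in functions look like in use, here is a minimal sketch. It assumes the same input.txt, reads each line as a single column, and parses it with an explicit pattern; the relation and field names (raw, parsed, parts, dt, yr, mon, next_day) are made up for this example.

```
-- Read each line as one chararray column (date and time together).
raw = LOAD 'input.txt' USING TextLoader() AS (line:chararray);

-- Parse with an explicit pattern; TRIM guards against stray trailing spaces.
parsed = FOREACH raw GENERATE ToDate(TRIM(line), 'yyyy-MM-dd HH:mm:ss') AS dt;

-- A few of the built-in datetime functions applied to the parsed value.
parts = FOREACH parsed GENERATE
    GetYear(dt)            AS yr,
    GetMonth(dt)           AS mon,
    AddDuration(dt, 'P1D') AS next_day;  -- ISO 8601 duration: plus one day

DUMP parts;
```

Passing the pattern to ToDate also means the input does not need to be rewritten with a 'T' separator before loading.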

If you want to store the output in this format:
(1999-01-01T12:08:56)  
(1999-01-02T12:08:57)  
(1999-01-03T12:08:58)  
(1999-01-04T12:08:59)

you can use REGEX_EXTRACT to keep only the part of each value before the ".", something like this:

D = FOREACH C GENERATE ToString($0) as temp;
E = FOREACH D GENERATE REGEX_EXTRACT(temp, '(.*)\\.(.*)', 1);
dump E;

Output:
(1999-01-01T12:08:56)  
(1999-01-02T12:08:57)  
(1999-01-03T12:08:58)  
(1999-01-04T12:08:59)  
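An alternative to the regular expression, sketched here under the assumption that your Pig version supports the two-argument form of ToString, is to pass the desired pattern directly. The output directory 'output' and the names raw, parsed, iso, and dt are placeholders chosen for this example.

```
-- Read each line as one column and parse it into a datetime.
raw = LOAD 'input.txt' USING TextLoader() AS (line:chararray);
parsed = FOREACH raw GENERATE ToDate(TRIM(line), 'yyyy-MM-dd HH:mm:ss') AS dt;

-- Format without milliseconds or zone offset; the literal T is quoted
-- inside the pattern, so its single quotes are escaped for Pig.
iso = FOREACH parsed GENERATE ToString(dt, 'yyyy-MM-dd\'T\'HH:mm:ss') AS dt_string;

-- Write the formatted strings out ('output' is a hypothetical path).
STORE iso INTO 'output' USING PigStorage();
```

This keeps the formatting logic in one place and does not depend on where the "." falls in the default string form.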
