Storing Date and Time in Pig


Question

I am trying to store a text file that has two columns, date and time respectively. Something like this: 1999-01-01 12:08:56

Now I want to perform some date operations using Pig, but I want to store the date and time like this: 1999-01-01T12:08:56 (I checked this link): http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html

What I want to know is which format I can use so that the date and time sit in one column that I can feed to Pig, and how to then load that value into Pig. I know it should be converted to datetime, but my attempts produce errors. Can somebody tell me how to load date and time data together? An example would be of great help.

Recommended answer

Please let me know if this works for you.

input.txt  
1999-01-01 12:08:56  
1999-01-02 12:08:57  
1999-01-03 12:08:58  
1999-01-04 12:08:59  

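PigScript:  
-- Load the two space-separated columns as strings.  
A = LOAD 'input.txt' USING PigStorage(' ') AS (date:chararray, time:chararray);  
-- Join them with a 'T' to get an ISO 8601 string such as 1999-01-01T12:08:56.  
B = FOREACH A GENERATE CONCAT(date, 'T', time) AS myDateString;  
-- Convert the ISO string to a datetime value.  
C = FOREACH B GENERATE ToDate(myDateString);  
DUMP C;  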

Output:  
(1999-01-01T12:08:56.000+05:30)  
(1999-01-02T12:08:57.000+05:30)  
(1999-01-03T12:08:58.000+05:30)  
(1999-01-04T12:08:59.000+05:30)  

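The values in C are now datetime objects, so you can process them with all the built-in date functions. The +05:30 offset in the output is the default time zone that was in effect when the script ran, because the input strings carry no zone of their own.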
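To avoid depending on the default time zone, one option is to give ToDate an explicit pattern and zone. The following is a minimal sketch, assuming the same input.txt as above and assuming UTC is the zone you want. The aliases dt_string and dt and the choice of 'UTC' are illustrative, not part of the original answer.

```pig
-- Sketch only: 'UTC' and the alias names below are illustrative assumptions.
A = LOAD 'input.txt' USING PigStorage(' ') AS (date:chararray, time:chararray);
-- Join the two columns with a space, nesting two-argument CONCAT calls.
B = FOREACH A GENERATE CONCAT(CONCAT(date, ' '), time) AS dt_string;
-- ToDate(string, pattern, zone): parse with an explicit pattern and time zone.
C = FOREACH B GENERATE ToDate(dt_string, 'yyyy-MM-dd HH:mm:ss', 'UTC') AS dt;
DUMP C;
```

Because the pattern is given explicitly, the two columns do not need to be rewritten into the T-separated form before parsing.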

If you want to store the output in this format instead:  
(1999-01-01T12:08:56)  
(1999-01-02T12:08:57)  
(1999-01-03T12:08:58)  
(1999-01-04T12:08:59)

you can use REGEX_EXTRACT to keep everything before the "." in each value, something like this:  

D = FOREACH C GENERATE ToString($0) as temp;
E = FOREACH D GENERATE REGEX_EXTRACT(temp, '(.*)\\.(.*)', 1);
dump E;

Output:
(1999-01-01T12:08:56)  
(1999-01-02T12:08:57)  
(1999-01-03T12:08:58)  
(1999-01-04T12:08:59)  
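An alternative to the regular expression is to pass a format pattern to ToString, which should produce the same shape of string without a separate extraction step. This is a minimal sketch, assuming the same input.txt. The aliases dt and formatted are illustrative, and the `\'` escaping of the literal T may need adjusting depending on how the script is submitted.

```pig
-- Sketch only: alias names are illustrative assumptions.
A = LOAD 'input.txt' USING PigStorage(' ') AS (date:chararray, time:chararray);
-- Build the ISO string with nested CONCAT calls and convert it to a datetime.
B = FOREACH A GENERATE ToDate(CONCAT(CONCAT(date, 'T'), time)) AS dt;
-- The literal T must be quoted inside the pattern; \' escapes the quote in the Pig string.
C = FOREACH B GENERATE ToString(dt, 'yyyy-MM-dd\'T\'HH:mm:ss') AS formatted;
DUMP C;
```

The pattern leaves out milliseconds and the zone offset, so no trimming is needed afterwards.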
