Hive not detecting timestamp format
Problem description
I have a Pig script that:
- Loads and transforms the data from a CSV file
- Replaces some characters
- Calls a Java program (JAR) to convert the date-time in the CSV from 06/02/2015 18:52 to 2015-6-2 18:52 (MM/dd/yyyy to yyyy-MM-dd)
-- Register the JAR that provides Converter.DateTime
REGISTER /home/cloudera/DateTime.jar;
A = LOAD '/user/cloudera/Data.csv' USING PigStorage(',') AS (ac, datetime, amt, trace);
-- Swap '/' for '-' in the date and drop the minus sign from the amount
B = FOREACH A GENERATE ac, REPLACE(datetime, '\\/', '-') AS newdate, REPLACE(amt, '-', '') AS newamt, trace;
C = FOREACH B GENERATE ac, Converter.DateTime(newdate) AS ConvDate, newamt AS ConvAmt, trace;
STORE C INTO '/user/cloudera/Output/' USING PigStorage('\t');
Sample Input -- 21467245 06/02/2015 18:52 -9.59 518
Sample Output -- 21467245 2015-6-2 18:52 9.59 518
I am loading the output into Hive. The other fields import fine, but the date-time field comes back NULL when the column is declared as TIMESTAMP and is intact when it is declared as STRING.
Where is this going wrong?
I am using Cloudera CDH 5.
Solution
From the Hive docs:
Timestamps in text files have to use the format yyyy-mm-dd hh:mm:ss[.f...]. If they are in another format declare them as the appropriate type (INT, FLOAT, STRING, etc.) and use a UDF to convert them to timestamps.
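To illustrate the quoted rule against the sample output: 2015-6-2 18:52 has no seconds field, so it does not match yyyy-mm-dd hh:mm:ss. The sketch below uses java.sql.Timestamp.valueOf as a stand-in checker, because its documented input format is close to the one quoted. Whether Hive's text parser behaves identically is an assumption, and the class and method names are hypothetical.

```java
import java.sql.Timestamp;

// Hypothetical helper, not part of Hive: java.sql.Timestamp.valueOf is used
// only as a stand-in for the "yyyy-mm-dd hh:mm:ss[.f...]" rule quoted above.
class TimestampFormatCheck {

    // Returns true when the text is accepted by Timestamp.valueOf.
    static boolean looksLikeTimestamp(String text) {
        try {
            Timestamp.valueOf(text);
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown when the text does not match, e.g. when seconds are missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Value as written by the Pig script in the question.
        System.out.println(looksLikeTimestamp("2015-6-2 18:52"));
        // Same instant with zero padding and a seconds field.
        System.out.println(looksLikeTimestamp("2015-06-02 18:52:00"));
    }
}
```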
So you need to either change your Converter to output this format, or use a UDF, or just keep them as strings, which is what I usually do!
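For the first option, a minimal sketch of a converter that emits the format from the docs might look like the following. It assumes Java 8 or later (for java.time) and that the method receives the raw CSV value such as 06/02/2015 18:52, which would make the REPLACE step on the date column unnecessary. The class and method names are hypothetical, the source of the original Converter.DateTime is not shown in the question, and the wrapping into a Pig UDF is left out.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical replacement for the conversion logic; not the asker's Converter.
class HiveTimestampConverter {

    // Input as it appears in the CSV: month/day/year, 24-hour time, no seconds.
    private static final DateTimeFormatter INPUT =
            DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm");

    // Output in the layout the Hive docs require: zero-padded, with seconds.
    private static final DateTimeFormatter OUTPUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Parses the CSV value and re-formats it; missing seconds default to 00.
    static String convert(String raw) {
        return LocalDateTime.parse(raw.trim(), INPUT).format(OUTPUT);
    }

    public static void main(String[] args) {
        System.out.println(convert("06/02/2015 18:52"));
    }
}
```

With output in this shape, the column can be declared as TIMESTAMP in the Hive table instead of STRING.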