Hive not detecting timestamp format
Problem description
I have a Pig script that
- loads and transforms data from a CSV file
- replaces some characters
- calls a Java program (JAR) to convert the date-time in the CSV from 06/02/2015 18:52 to 2015-6-2 18:52 (MM/dd/yyyy to yyyy-MM-dd)
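-- Register the JAR that provides the Converter.DateTime UDF
REGISTER /home/cloudera/DateTime.jar;
A = Load '/user/cloudera/Data.csv' using PigStorage(',') as (ac,datetime,amt,trace);
-- Replace '/' with '-' in the date and strip the minus sign from the amount
B = FOREACH A GENERATE ac, REPLACE(datetime, '\\/','-') as newdate,REPLACE(amt,'-','') as newamt,trace;
-- Reformat the date with the Java UDF; newamt is the alias defined in B
C = FOREACH B GENERATE ac,Converter.DateTime(newdate) as ConvDate,newamt,trace;
Store C into '/user/cloudera/Output/' using PigStorage('\t');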
Sample Input -- 21467245 06/02/2015 18:52 -9.59 518
Sample Output -- 21467245 2015-6-2 18:52 9.59 518
I am loading the output into Hive. The other fields import fine, but the date-time field comes out as NULL when the column is declared as a timestamp, and is intact when the column is declared as a string.
What is going wrong here?
I am using Cloudera CDH 5.
Recommended answer
From the Hive documentation:
Timestamps in text files have to use the format yyyy-mm-dd hh:mm:ss[.f...]. If they are in another format declare them as the appropriate type (INT, FLOAT, STRING, etc.) and use a UDF to convert them to timestamps.
Your output, 2015-6-2 18:52, has no seconds field, so it does not match that format. So you need to either change your Converter to output this format, or load the column as a string and use a UDF to convert it, or just keep them as strings, which is what I usually do!
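For the first option, here is a minimal sketch of what the conversion could look like. The source of Converter.DateTime is not shown in the question, so the class and method names below (HiveTimestampFormat, toHiveTimestamp) are hypothetical, and the input is assumed to look like 06-02-2015 18:52, that is, after the Pig REPLACE step has already turned '/' into '-'. The only point is that the output pattern includes zero-padded fields and a seconds part.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper, NOT the asker's actual Converter.DateTime.
final class HiveTimestampFormat {

    // Assumed input shape: "MM-dd-yyyy HH:mm", e.g. "06-02-2015 18:52".
    private static final String INPUT_PATTERN = "MM-dd-yyyy HH:mm";

    // Shape quoted from the Hive documentation: yyyy-mm-dd hh:mm:ss
    private static final String HIVE_PATTERN = "yyyy-MM-dd HH:mm:ss";

    static String toHiveTimestamp(String raw) {
        // SimpleDateFormat is not thread-safe, so create instances per call.
        SimpleDateFormat in = new SimpleDateFormat(INPUT_PATTERN, Locale.US);
        in.setLenient(false); // reject values such as month 13
        SimpleDateFormat out = new SimpleDateFormat(HIVE_PATTERN, Locale.US);
        try {
            Date parsed = in.parse(raw);
            // The input has no seconds, so they are written as ":00".
            return out.format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unexpected date-time: " + raw, e);
        }
    }

    public static void main(String[] args) {
        // prints 2015-06-02 18:52:00
        System.out.println(toHiveTimestamp("06-02-2015 18:52"));
    }
}
```

In a real Pig UDF the same logic would go in the exec method of a class extending org.apache.pig.EvalFunc<String>; that wrapper is left out here so the sketch compiles without Pig on the classpath.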