Hive not detecting timestamp format


Problem description

I have a Pig script that:

  • Loads and transforms the data from a CSV file
  • Replaces some characters
  • Calls a Java program (JAR) to convert the date-time in the CSV from 06/02/2015 18:52 to 2015-6-2 18:52 (MM/dd/yyyy to yyyy-M-d)

REGISTER /home/cloudera/DateTime.jar;

A = LOAD '/user/cloudera/Data.csv' USING PigStorage(',') AS (ac, datetime, amt, trace);

-- Swap '/' for '-' in the date and strip the minus sign from the amount
B = FOREACH A GENERATE ac, REPLACE(datetime, '\\/', '-') AS newdate, REPLACE(amt, '-', '') AS newamt, trace;

-- newamt is the alias defined in B; it is renamed to ConvAmt here
C = FOREACH B GENERATE ac, Converter.DateTime(newdate) AS ConvDate, newamt AS ConvAmt, trace;

STORE C INTO '/user/cloudera/Output/' USING PigStorage('\t');

Sample Input -- 21467245 06/02/2015 18:52 -9.59 518

Sample Output -- 21467245 2015-6-2 18:52 9.59 518

I am loading the output into Hive. The other fields import fine, but the date-time field comes back NULL when the column is declared as TIMESTAMP and is intact when it is declared as STRING.

Where is this going wrong?

I am using Cloudera CDH 5.

Solution

From the Hive docs:

Timestamps in text files have to use the format yyyy-mm-dd hh:mm:ss[.f...]. If they are in another format declare them as the appropriate type (INT, FLOAT, STRING, etc.) and use a UDF to convert them to timestamps.

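Your output, 2015-6-2 18:52, has no seconds field, so it does not match that format and Hive returns NULL for it. You need to either change your Converter to output this format (for example 2015-06-02 18:52:00), or load the column as STRING and convert it with a UDF (the built-in unix_timestamp(string, pattern) combined with from_unixtime is one option), or just keep the values as strings, which is what I usually do.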
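For the first option, a minimal sketch of what the conversion could look like is shown here. The class and method names are hypothetical (the original Converter source is not shown), and the input shape MM-dd-yyyy HH:mm is assumed from the Pig REPLACE step in the question. It uses SimpleDateFormat rather than java.time on the assumption that a CDH 5 cluster may still run Java 7. The Pig EvalFunc wrapper is left out so the example stays self-contained.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical stand-in for the asker's Converter class; names are illustrative.
class HiveTimestampConverter {

    // Converts "MM-dd-yyyy HH:mm" (assumed shape after the Pig REPLACE step)
    // into "yyyy-MM-dd HH:mm:ss", the layout the Hive docs require in text files.
    static String toHiveTimestamp(String input) {
        // SimpleDateFormat is not thread-safe, so new instances are created per call.
        SimpleDateFormat in = new SimpleDateFormat("MM-dd-yyyy HH:mm");
        in.setLenient(false); // reject impossible dates such as 13-45-2015
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        try {
            Date parsed = in.parse(input);
            // Zero-padded month and day, and seconds filled in as 00.
            return out.format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date-time: " + input, e);
        }
    }

    public static void main(String[] args) {
        // Sample value from the question after '/' has been replaced by '-'.
        System.out.println(toHiveTimestamp("06-02-2015 18:52")); // prints 2015-06-02 18:52:00
    }
}
```

With output in this shape, the column can be declared as TIMESTAMP in the Hive table without a conversion step at query time.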
