Hadoop - Formatting dates when creating tables


Question


How to format dates during the process of creating Hive tables?

I'm currently loading some data into a discovery environment at work and storing the dates as strings, because if I declare the columns as DATE or TIMESTAMP the values come back NULL.

Here's what the raw data looks like:

12/07/2016 05:07:28 PM

My understanding is that Hive accepts dates in this format:

yyyy-MM-dd HH:mm:ss

I can format these using a select statement:

select id, receipt_dt, from_unixtime(unix_timestamp(receipt_dt ,'MM/dd/yyyy'), 'yyyy-MM-dd') as app_dt from MySchema.MyTable where app_num='123456'

The expression I want to apply is:

from_unixtime(unix_timestamp(receipt_dt ,'MM/dd/yyyy'), 'yyyy-MM-dd')

How can I add this to the generic CREATE EXTERNAL TABLE statement below, so that I no longer have to store dates as strings or use an ALTER TABLE statement to change the formatting?

CREATE EXTERNAL TABLE IF NOT EXISTS MySchema.My_New_Table 
( Field1 Format, 
Field2 Format, 
Field3 Format
) 
.......

Solution

Use MyTable as a staging table that holds the raw data, and create the final (target) table my_new_table with the transformations applied, such as the date conversion. This is the usual EDW-style (enterprise data warehouse) process.
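As a rough picture of the staging side, the raw table could be declared with every field, including the date, as a string. This is only a sketch: the column types, the delimiter, the storage format, and the LOCATION path are assumptions made for illustration and are not taken from the question.

```sql
-- Hypothetical staging table: the date stays a STRING so no value is lost as NULL.
-- Column types, the delimiter, and the LOCATION path are illustrative assumptions.
CREATE EXTERNAL TABLE IF NOT EXISTS MySchema.MyTable (
  id         string,
  receipt_dt string,
  app_num    string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/path/to/raw/data';
```

Keeping the raw text untouched in staging means the date pattern can be corrected later without reloading the source files.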

Example:

CREATE TABLE IF NOT EXISTS MySchema.My_New_Table
-- ... more definitions ...
AS
select id, receipt_dt,
cast(from_unixtime(unix_timestamp(receipt_dt ,'MM/dd/yyyy'), 'yyyy-MM-dd') as date) as app_dt
from MySchema.MyTable;

NOTE: This statement is untested, so you may need to adjust it, but it shows the idea. Hive's CREATE TABLE ... AS SELECT takes its column names and types from the query, so no column list is given, and the Hive documentation lists external tables as a CTAS restriction (this may vary by version), so EXTERNAL is omitted here.
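The 'MM/dd/yyyy' pattern used above reads only the date part of values such as 12/07/2016 05:07:28 PM. If the time of day matters as well, a pattern that covers the whole string may work. The following is an untested sketch, assuming a Hive version whose unix_timestamp accepts Java SimpleDateFormat-style patterns (hh for the 12-hour clock, a for the AM/PM marker); the alias receipt_ts is a hypothetical name.

```sql
-- Sketch: parse the full 12-hour string into a TIMESTAMP instead of a DATE.
-- receipt_ts is a hypothetical alias; adjust the pattern if the raw data differs.
-- from_unixtime without a format argument returns 'yyyy-MM-dd HH:mm:ss',
-- which is the string layout Hive can cast to TIMESTAMP.
select id,
       receipt_dt,
       cast(from_unixtime(unix_timestamp(receipt_dt, 'MM/dd/yyyy hh:mm:ss a')) as timestamp) as receipt_ts
from MySchema.MyTable;
```

Rows whose text does not match the pattern come back as NULL, so it is worth checking a sample of the staging data before loading the target table.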

Inserting the delta (new rows) afterwards should be a similar process:

INSERT INTO TABLE MySchema.My_New_Table
select id, receipt_dt,
cast(from_unixtime(unix_timestamp(receipt_dt ,'MM/dd/yyyy'), 'yyyy-MM-dd') as date) as app_dt
from MySchema.MyTable where <<conditions>>;
