Hadoop - Formatting dates when creating tables
Question
How to format dates during the process of creating Hive tables?
I've been dumping some data into a discovery environment at work and storing the dates as strings, because if I define them as DATE or TIMESTAMP the values come out null.
Here's what the raw data looks like:
12/07/2016 05:07:28 PM
My understanding is that Hive expects TIMESTAMP values in this format (and DATE values as yyyy-MM-dd):
yyyy-MM-dd HH:mm:ss
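Hive's date pattern strings use Java SimpleDateFormat letters (the Hive UDF documentation refers to SimpleDateFormat for unix_timestamp patterns), so letter case matters: MM is the month, mm is minutes, HH is the 24-hour clock and hh the 12-hour clock. The Java sketch below formats the sample moment with the correct pattern and with the all-lowercase one. The class and method names are illustrative only, and the sample is read as UTC purely for the demonstration.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Illustrative sketch: how SimpleDateFormat letters (used by Hive patterns) behave.
class PatternLetters {
    // Formats an instant with the given pattern, in UTC with an English locale.
    static String format(Date d, String pattern) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(d);
    }

    // The sample value 12/07/2016 05:07:28 PM, read as UTC for this demo.
    static Date sample() {
        Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.US);
        c.clear();                                        // zero all fields, incl. millis
        c.set(2016, Calendar.DECEMBER, 7, 17, 7, 28);     // 17:07:28 = 05:07:28 PM
        return c.getTime();
    }

    public static void main(String[] args) {
        Date d = sample();
        // Correct letters: MM = month, HH = 24-hour clock.
        System.out.println(format(d, "yyyy-MM-dd HH:mm:ss")); // 2016-12-07 17:07:28
        // Lowercase letters: mm = minutes (07) lands in the month slot, hh = 12-hour clock.
        System.out.println(format(d, "yyyy-mm-dd hh:mm:ss")); // 2016-07-07 05:07:28
    }
}
```

The lowercase pattern does not fail; it silently puts the minutes where the month belongs and drops the AM/PM distinction, which is easy to miss because the output still looks like a valid date.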
I can format these using a select statement:
select id, receipt_dt, from_unixtime(unix_timestamp(receipt_dt ,'MM/dd/yyyy'), 'yyyy-MM-dd') as app_dt from MySchema.MyTable where app_num='123456'
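To see what that expression computes, here is a rough Java analogue. It assumes that Hive's unix_timestamp(string, pattern) parses like SimpleDateFormat.parse(String), which reads from the start of the input and may ignore trailing text (consistent with the query above working on this data), and that unix_timestamp and from_unixtime use the same time zone; UTC stands in for the Hive session time zone. The class and method names are hypothetical. It also tries a fuller pattern, MM/dd/yyyy hh:mm:ss a, which keeps the time of day instead of discarding it.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical analogue of from_unixtime(unix_timestamp(s, in), out) in Hive.
class ReceiptDateDemo {
    // Assumption: UTC stands in for the Hive session time zone; using the
    // same zone for parsing and formatting keeps the round trip consistent.
    static SimpleDateFormat fmt(String pattern) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.US); // US locale for AM/PM
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    }

    // Parses s with the input pattern and re-formats it with the output pattern.
    // Returns null when s cannot be parsed, similar to Hive returning NULL.
    static String reformat(String s, String in, String out) {
        try {
            // parse(String) starts at index 0 and may stop before the end of s
            return fmt(out).format(fmt(in).parse(s));
        } catch (ParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String raw = "12/07/2016 05:07:28 PM";
        // Patterns from the question: only the date part is used.
        System.out.println(reformat(raw, "MM/dd/yyyy", "yyyy-MM-dd"));
        // Fuller pattern: hh + a read the 12-hour time, HH writes 24-hour time.
        System.out.println(reformat(raw, "MM/dd/yyyy hh:mm:ss a", "yyyy-MM-dd HH:mm:ss"));
        // A value that is not a date at all.
        System.out.println(reformat("n/a", "MM/dd/yyyy", "yyyy-MM-dd"));
    }
}
```

The first call yields 2016-12-07, the second 2016-12-07 17:07:28 (a string Hive can cast to TIMESTAMP), and the third null. Returning null on a parse failure mirrors Hive, where unix_timestamp returns NULL for values that do not match the pattern.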
How can I add the expression
from_unixtime(unix_timestamp(receipt_dt, 'MM/dd/yyyy'), 'yyyy-MM-dd')
to the generic CREATE EXTERNAL TABLE statement below, so that I no longer have to store dates as strings or use an ALTER TABLE statement to change the formatting?
CREATE EXTERNAL TABLE IF NOT EXISTS MySchema.My_New_Table
( Field1 Format,
Field2 Format,
Field3 Format
)
.......
Answer
Use MyTable as a staging table that holds the raw data, and create the final/target table My_New_Table with the transformations (i.e., the date formatting) applied. This is a typical EDW-style process.
Example:
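CREATE EXTERNAL TABLE IF NOT EXISTS MySchema.My_New_Table
( Field1 int,
Field2 string,
Field3 date
)
... more definitions....;
-- In many Hive versions CREATE TABLE ... AS SELECT cannot create an EXTERNAL table
-- or take a column list, so create the table first, then load it from the staging table:
INSERT OVERWRITE TABLE MySchema.My_New_Table
select id, receipt_dt,
cast(from_unixtime(unix_timestamp(receipt_dt, 'MM/dd/yyyy'), 'yyyy-MM-dd') as date) as app_dt
from MySchema.MyTable;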
NOTE: These statements are untested. You may need to adjust them, but this shows the idea.
Inserting deltas (incremental loads) then follows a similar process:
INSERT INTO TABLE MySchema.My_New_Table
select id, receipt_dt,
cast(from_unixtime(unix_timestamp(receipt_dt, 'MM/dd/yyyy'), 'yyyy-MM-dd') as date) as app_dt
from MySchema.MyTable where <<conditions>>;