Hive: Partitioning by part of integer column
Question
I want to create an external Hive table, partitioned by record type and date (year, month, day). One complication is that the date in my data files is a single integer value in the format yyyymmddhhmmss rather than the required date format yyyy-mm-dd hh:mm:ss. Can I specify 3 new partition columns based on just that single data value? Something like the example below (which doesn't work):
create external table cdrs (
record_id int,
record_detail tinyint,
datetime_start int
)
partitioned by (record_type int, createyear=datetime_start(0,3) int, createmonth=datetime_start(4,5) int, createday=datetime_start(6,7) int)
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as TEXTFILE
location 'hdfs://nameservice1/tmp/sbx_unleashed.db'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1");
Recommended answer
If you want to be able to use MSCK REPAIR TABLE to add the partitions for you based on the directory structure, you should use the following convention:
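- The nesting of the directories should match the order of the partition columns.
- Directory names should have the form {partition column name}={value}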
If you intend to add the partitions manually, then the directory structure has no meaning.
Any set of partition values can be coupled with any directory, e.g.:
alter table cdrs
add if not exists partition (record_type='TYP123',createdate=date '2017-03-22')
location 'hdfs://nameservice1/tmp/sbx_unleashed.db/2017MAR22_OF_TYPE_123';
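After adding a partition manually, it can help to confirm that the metastore registered it and that it points at the intended directory. The following is a minimal sketch, assuming the table is the cdrs table partitioned by (record_type, createdate) and that the partition values are the ones used in the statement above:

```sql
-- List all partitions currently registered for the table.
show partitions cdrs;

-- Show the metadata of one partition, including its Location field.
-- The partition values are the illustrative ones from the ALTER TABLE statement.
describe formatted cdrs partition (record_type='TYP123', createdate='2017-03-22');
```

The Location shown by describe formatted should match the directory given in the location clause, regardless of how that directory is named.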
Assuming directory structure:
.../sbx_unleashed.db/record_type=.../createyear=.../createmonth=.../createday=.../
e.g.
.../sbx_unleashed.db/record_type=TYP123/createyear=2017/createmonth=03/createday=22/
create external table cdrs
(
record_id int
,record_detail tinyint
,datetime_start bigint  -- yyyymmddhhmmss has 14 digits and does not fit in a 32-bit int
)
partitioned by (record_type string,createyear int, createmonth tinyint, createday tinyint)  -- record_type is a string because values look like TYP123
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as TEXTFILE
location 'hdfs://nameservice1/tmp/sbx_unleashed.db'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1")
;
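With the table defined over a directory tree that follows the {partition column name}={value} convention, the partitions can be discovered rather than added one by one. A minimal sketch, assuming the directories shown above already exist under the table location (the filter values are the illustrative ones from the example path):

```sql
-- Scan the table location and register every directory that matches
-- the record_type=.../createyear=.../createmonth=.../createday=... layout.
msck repair table cdrs;

-- Check which partitions were picked up.
show partitions cdrs;

-- Filtering on partition columns lets Hive read only the matching directories.
select  count(*)
from    cdrs
where   createyear  = 2017
    and createmonth = 3
    and createday   = 22
;
```

MSCK REPAIR TABLE has to be rerun (or partitions added with ALTER TABLE) whenever new directories appear, since the metastore does not notice them on its own.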
Assuming directory structure:
.../sbx_unleashed.db/record_type=.../createdate=.../
e.g.
.../sbx_unleashed.db/record_type=TYP123/createdate=2017-03-22/
create external table cdrs
(
record_id int
,record_detail tinyint
,datetime_start bigint  -- yyyymmddhhmmss has 14 digits and does not fit in a 32-bit int
)
partitioned by (record_type string,createdate date)  -- record_type is a string because values look like TYP123
row format delimited
fields terminated by '|'
lines terminated by '\n'
stored as TEXTFILE
location 'hdfs://nameservice1/tmp/sbx_unleashed.db'
tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="1")
;
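Partition values come from the directory names or from the partition spec, not from an expression in the DDL, so the year, month and day have to be derived from the yyyymmddhhmmss value when deciding which directory (or partition) a record belongs to. The query below is only a sketch of that derivation; it uses a literal sample value in an inline subquery instead of a real table, and the alias t is purely illustrative:

```sql
-- Derive partition values from a yyyymmddhhmmss integer.
-- substr is 1-based: characters 1-4 are the year, 5-6 the month, 7-8 the day.
select  t.datetime_start
       ,substr(cast(t.datetime_start as string), 1, 4)  as createyear   -- '2017'
       ,substr(cast(t.datetime_start as string), 5, 2)  as createmonth  -- '03'
       ,substr(cast(t.datetime_start as string), 7, 2)  as createday    -- '22'
       ,cast(concat_ws('-'
            ,substr(cast(t.datetime_start as string), 1, 4)
            ,substr(cast(t.datetime_start as string), 5, 2)
            ,substr(cast(t.datetime_start as string), 7, 2)
        ) as date)                                      as createdate   -- 2017-03-22
from   (select 20170322123456 as datetime_start) t
;
```

The same expressions could be applied to a staging table holding the raw records in order to route them to the matching partition directories.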