Hive 0.13 external table dynamic partitioning custom pattern


Problem description


According to the documentation, you should be able to specify a custom pattern for the partitions of a Hive external table. However, I can't get it to work: select * from rawlog_test7 limit 10; returns no records.

This is what I am doing:

set hcat.dynamic.partitioning.custom.pattern="${year}/${month}/${day}/${hour}"

I create my table with ...

partitioned by (year int, month int, day int, hour int)

location '/history.eu1/ed_reports/hourly/';

and my directory structure is ../2014/06/18/13/ ...

If I use static partitions

   alter table rawlog_test7 add partition (year=2014,month=6,day=18,hour=13) location '/history.eu1/ed_reports/hourly/2014/06/18/13';

it works (select * from rawlog_test7 limit 10; returns records!)

Solution

Maybe I can clear some things up about how Hive partitions work:

There are two components to a partition: its directory on the filesystem, and an entry in Hive's metastore. This entry is essentially just the pair (partition values, partition location).

When you create a Hive table, it has no partition entries in the metastore.

When you query Hive, it checks the metastore for partitions it would need to query, and then scans those.

Hive does not automatically detect partitions on the filesystem to add metastore entries.
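The points above can be sketched as a toy model. This is not Hive code; ToyMetastore and locations_to_scan are hypothetical names made up for illustration, and the only thing the sketch shows is that a query looks at registered partition entries, not at whatever directories happen to exist.

```python
# Toy model only: NOT Hive code, just an illustration of the lookup order.
# ToyMetastore and locations_to_scan are hypothetical names.

class ToyMetastore:
    def __init__(self):
        # Maps partition values -> partition location.
        self.partitions = {}

    def add_partition(self, values, location):
        self.partitions[values] = location


def locations_to_scan(metastore):
    """Return the locations a query would read: registered partitions only."""
    return sorted(metastore.partitions.values())


# A directory that exists on the filesystem but is not registered yet.
existing_dir = "/history.eu1/ed_reports/hourly/2014/06/18/13"

metastore = ToyMetastore()
before = locations_to_scan(metastore)   # no entries, so nothing is scanned

metastore.add_partition((2014, 6, 18, 13), existing_dir)
after = locations_to_scan(metastore)    # the registered location is now scanned

print(before)
print(after)
```

In this model the empty result of the select corresponds to the first call: the directory is there, but no entry points at it.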

"Dynamic partitioning" refers to Hive's ability to create partitions in both the filesystem and the metastore, based on the values of a data column, when inserting into a partitioned Hive table, i.e. when running an actual insert into table rawlog_test7 partition(y,m,d,h) ... command. It is about writing data, not about discovering directories that already exist.

If you have directories in the filesystem that do not yet have metastore entries, you can either add them one by one, as you have been doing:

alter table rawlog_test7 add partition (year=2014,month=6,day=18,hour=13) location '/history.eu1/ed_reports/hourly/2014/06/18/13';

Or you can run a table repair:

msck repair table rawlog_test7;

I have not tested the latter with a custom partitioning pattern, though.
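If there are many directories, the one-by-one statements can be generated instead of typed. Below is a minimal sketch in Python, assuming the directories follow the year/month/day/hour layout from the question. The helper name add_partition_statement and the hard-coded directory list are assumptions for illustration; in practice the list would come from a filesystem listing such as hdfs dfs -ls. As far as I know, msck repair table looks for directories named in the key=value form (for example year=2014/month=06), so it may not pick up a layout like 2014/06/18/13, which is one reason to generate explicit statements.

```python
import re

# Hypothetical helper for illustration; the layout year/month/day/hour is
# taken from the question, everything else here is an assumption.
LAYOUT = re.compile(r"(\d{4})/(\d{2})/(\d{2})/(\d{2})/?$")


def add_partition_statement(table, base_location, directory):
    """Build one ALTER TABLE ... ADD PARTITION statement.

    `directory` must end in year/month/day/hour, e.g. "2014/06/18/13".
    Returns None if the path does not match that layout.
    """
    match = LAYOUT.search(directory)
    if match is None:
        return None
    # Partition columns are ints in the table, so drop the leading zeros.
    year, month, day, hour = (int(part) for part in match.groups())
    # Keep the zero-padded components for the location on the filesystem.
    location = base_location.rstrip("/") + "/" + "/".join(match.groups())
    return (
        "alter table {} add if not exists partition "
        "(year={}, month={}, day={}, hour={}) location '{}';"
    ).format(table, year, month, day, hour, location)


# Example input; a real list would come from listing the base directory.
directories = ["2014/06/18/13", "2014/06/18/14", "not/a/partition"]

for directory in directories:
    statement = add_partition_statement(
        "rawlog_test7", "/history.eu1/ed_reports/hourly/", directory
    )
    if statement is not None:
        print(statement)
```

The printed statements can be saved to a file and run with hive -f. The if not exists clause makes the script safe to re-run after new hourly directories appear.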
