How to add a partition in Hive for a specific date?
Question
I'm using Hive (with external tables) to process data stored on Amazon S3.
My data is partitioned as follows:
DIR s3://test.com/2014-03-01/
DIR s3://test.com/2014-03-02/
DIR s3://test.com/2014-03-03/
DIR s3://test.com/2014-03-04/
DIR s3://test.com/2014-03-05/
s3://test.com/2014-03-05/ip-foo-request-2014-03-05_04-20_00-49.log
s3://test.com/2014-03-05/ip-foo-request-2014-03-05_06-26_19-56.log
s3://test.com/2014-03-05/ip-foo-request-2014-03-05_15-20_12-53.log
s3://test.com/2014-03-05/ip-foo-request-2014-03-05_22-54_27-19.log
How do I create a partitioned table using Hive?
CREATE EXTERNAL TABLE test (
foo string,
time string,
bar string
) PARTITIONED BY (? string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://test.com/';
Could somebody answer this question? Thanks!
Recommended answer
First, start with the right table definition. In your case I'll just use what you wrote, with dt as the name of the partition column:
CREATE EXTERNAL TABLE test (
foo string,
time string,
bar string
) PARTITIONED BY (dt string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://test.com/';
By default, Hive expects partitions to be in subdirectories named by the convention s3://test.com/partitionkey=partitionvalue. For example:
s3://test.com/dt=2014-03-05
If you follow this convention, you can use MSCK REPAIR TABLE to add all partitions.
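As a minimal sketch, assuming the data were stored under directories such as s3://test.com/dt=2014-03-05/ (which is not the layout shown in the question), the partitions could be discovered and then checked like this:

```sql
-- Assumption: directories under the table location follow the dt=<value> convention.
-- Scans the table location and registers any partitions missing from the metastore.
MSCK REPAIR TABLE test;

-- List the partitions the metastore now knows about.
SHOW PARTITIONS test;
```

With the date-only directory names from the question, MSCK would not pick up the existing directories, so the explicit approach applies instead.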
If you can't or don't want to use this naming convention, you will need to add each partition explicitly, as in:
ALTER TABLE test
ADD PARTITION (dt='2014-03-05')
LOCATION 's3://test.com/2014-03-05';
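Since every date directory needs its own entry, several partitions can be registered in one statement. The following is a sketch using the paths from the question, assuming a reasonably recent Hive version (very old releases may require one ALTER TABLE statement per partition). IF NOT EXISTS lets the statement be rerun without failing on partitions that are already registered:

```sql
-- Register several date directories at once; each PARTITION clause has its own LOCATION.
ALTER TABLE test ADD IF NOT EXISTS
  PARTITION (dt='2014-03-01') LOCATION 's3://test.com/2014-03-01'
  PARTITION (dt='2014-03-02') LOCATION 's3://test.com/2014-03-02'
  PARTITION (dt='2014-03-03') LOCATION 's3://test.com/2014-03-03';

-- Filtering on the partition column limits the query to that partition's directory.
SELECT foo, bar
FROM test
WHERE dt = '2014-03-03';
```

New dates have to be added the same way as they arrive, for example from a scheduled job that issues one ADD PARTITION per day.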