Create an external Hive table whose location contains multiple files

Problem description

CREATE EXTERNAL TABLE IF NOT EXISTS LOGS (LGACT STRING, NTNAME STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/LOGS/test';

Under the 'test' folder I write files every day, for example:

/user/hive/warehouse/LOGS/test/20170420
/user/hive/warehouse/LOGS/test/20170421
/user/hive/warehouse/LOGS/test/20170422

I cannot see any data in the LOGS table that I created.

However, when I use

LOCATION '/user/hive/warehouse/LOGS/test/20170422';

I can see that day's records.

I want my Hive table to show all the data under the /test directory. The /test directory also gets new files every day.

Recommended answer

Option 1

To make Hive read the files inside subdirectories of the table location:

set mapred.input.dir.recursive=true;

If your Hive version is lower than 2.0.0, also set:

set hive.mapred.supports.subdirectories=true;
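The following is a local-filesystem analogy only. It does not show how Hive or Hadoop list input files internally. It illustrates why the original table looked empty: a non-recursive listing of the table location finds only directories and no data files, while a recursive listing finds the files in every dated folder. The file name part-00000 and the sample row are hypothetical.

```python
import os
import tempfile


def list_input_files(location: str, recursive: bool) -> list[str]:
    """Return the data files a reader would pick up under `location`.

    Non-recursive: only regular files directly inside `location`.
    Recursive: regular files at any depth below `location`.
    """
    if not recursive:
        return sorted(
            entry.path for entry in os.scandir(location) if entry.is_file()
        )
    found = []
    for dirpath, _dirnames, filenames in os.walk(location):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)


# Throwaway layout mirroring the question: test/<yyyymmdd>/part-00000
# (the file name and its contents are hypothetical).
root = tempfile.mkdtemp()
location = os.path.join(root, "test")
for day in ("20170420", "20170421", "20170422"):
    os.makedirs(os.path.join(location, day))
    with open(os.path.join(location, day, "part-00000"), "w") as f:
        f.write("login\talice\n")

print(len(list_input_files(location, recursive=False)))  # 0: only subfolders here
print(len(list_input_files(location, recursive=True)))   # 3: one file per day
```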

Option 2

Create a partitioned table:

CREATE EXTERNAL TABLE IF NOT EXISTS LOGS (LGACT STRING, NTNAME STRING)
PARTITIONED BY (dt DATE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/LOGS/test';

Then register each daily folder as a partition:

alter table LOGS add if not exists partition (dt=date '2017-04-20') LOCATION '/user/hive/warehouse/LOGS/test/20170420';
alter table LOGS add if not exists partition (dt=date '2017-04-21') LOCATION '/user/hive/warehouse/LOGS/test/20170421';
alter table LOGS add if not exists partition (dt=date '2017-04-22') LOCATION '/user/hive/warehouse/LOGS/test/20170422';
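A new folder appears every day, so writing these statements by hand gets tedious. The sketch below generates the same ALTER TABLE statement from a yyyymmdd folder name. It assumes the folder names always follow that pattern. The function name add_partition_ddl is hypothetical, and executing the printed statements is left to whatever Hive client you already use.

```python
from datetime import datetime


def add_partition_ddl(table: str, base_location: str, folder: str) -> str:
    """Build an ALTER TABLE ... ADD PARTITION statement for one yyyymmdd folder."""
    # strptime rejects malformed folder names; isoformat() gives yyyy-mm-dd.
    day = datetime.strptime(folder, "%Y%m%d").date()
    return (
        f"alter table {table} add if not exists partition "
        f"(dt=date '{day.isoformat()}') "
        f"LOCATION '{base_location.rstrip('/')}/{folder}';"
    )


for folder in ("20170420", "20170421", "20170422"):
    print(add_partition_ddl("LOGS", "/user/hive/warehouse/LOGS/test", folder))
```

Because of "if not exists", re-running the generated statement for a day that is already registered should be harmless.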

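Managing the table is easier if you name your directories with the standard convention, e.g. dt=2017-04-20 instead of 20170420. With key=value directory names, Hive can discover new partitions itself (for example with MSCK REPAIR TABLE LOGS), so you do not need one ALTER TABLE statement per day.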
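If you switch to that convention, the writer that creates the daily folders must produce the new names. The sketch below maps an existing yyyymmdd folder name to the corresponding dt=yyyy-mm-dd path under the table location. The helper standard_partition_dir is hypothetical, and actually moving or renaming data in HDFS is not shown.

```python
from datetime import datetime


def standard_partition_dir(base_location: str, folder: str) -> str:
    """Map a yyyymmdd folder name to the key=value layout (dt=yyyy-mm-dd)."""
    day = datetime.strptime(folder, "%Y%m%d").date()
    return f"{base_location.rstrip('/')}/dt={day.isoformat()}"


print(standard_partition_dir("/user/hive/warehouse/LOGS/test", "20170420"))
# /user/hive/warehouse/LOGS/test/dt=2017-04-20
```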
