Create external table hive, location contains multiple files inside

Problem description

CREATE EXTERNAL TABLE IF NOT EXISTS LOGS (LGACT STRING,NTNAME STRING)  
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'  
LOCATION '/user/hive/warehouse/LOGS/test';

Under the 'test' folder I write new files daily, for example:

/user/hive/warehouse/LOGS/test/20170420
/user/hive/warehouse/LOGS/test/20170421
/user/hive/warehouse/LOGS/test/20170422

I cannot see any data in the LOGS table that I have created.

But if I create the table using

LOCATION '/user/hive/warehouse/LOGS/test/20170422';

I can see that day's records.

I want to see all the data under the /test directory in my Hive table. The /test directory is also populated daily with new files.

Solution

Option 1

In order to support sub-directories

set mapred.input.dir.recursive=true;

and if your Hive version is lower than 2.0.0, then also

set hive.mapred.supports.subdirectories=true;
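
One way to check whether the recursive settings took effect is to ask Hive which files each row came from. The query below is a minimal sketch, assuming it runs in the same session as the settings above. It relies on Hive's INPUT__FILE__NAME virtual column. The aliases source_file and row_count are illustrative names only.

```sql
-- Assumes the session settings above are already in effect.
-- INPUT__FILE__NAME is a Hive virtual column holding the source file of each row.
SELECT INPUT__FILE__NAME AS source_file,
       COUNT(*)          AS row_count
FROM LOGS
GROUP BY INPUT__FILE__NAME;
```

If files from 20170420, 20170421 and 20170422 all appear in the result, the sub-directories are being read.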

Option 2

Create a partitioned table

CREATE EXTERNAL TABLE IF NOT EXISTS LOGS (LGACT STRING,NTNAME STRING)  
partitioned by (dt date)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'  
LOCATION '/user/hive/warehouse/LOGS/test';


alter table LOGS add if not exists partition (dt=date '2017-04-20') LOCATION '/user/hive/warehouse/LOGS/test/20170420';
alter table LOGS add if not exists partition (dt=date '2017-04-21') LOCATION '/user/hive/warehouse/LOGS/test/20170421';
alter table LOGS add if not exists partition (dt=date '2017-04-22') LOCATION '/user/hive/warehouse/LOGS/test/20170422';
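
Because /test receives a new directory every day, each new day also needs its own partition. The script below is one possible sketch that could be run once per day, assuming Hive's variable substitution is enabled (it is by default). The file name add_partition.hql and the variables dt and dir are hypothetical and not part of the original answer.

```sql
-- add_partition.hql (hypothetical file name)
-- dt and dir are hypothetical variables supplied on the command line, e.g.
--   hive --hivevar dt=2017-04-23 --hivevar dir=20170423 -f add_partition.hql
ALTER TABLE LOGS ADD IF NOT EXISTS
  PARTITION (dt = date '${hivevar:dt}')
  LOCATION '/user/hive/warehouse/LOGS/test/${hivevar:dir}';

-- List the registered partitions to confirm the new one is present.
SHOW PARTITIONS LOGS;

-- Filtering on dt reads only that day's directory.
SELECT LGACT, NTNAME
FROM LOGS
WHERE dt = date '${hivevar:dt}'
LIMIT 10;
```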

It would be easier to manage if you named your directories using the standard convention, e.g. dt=2017-04-20 instead of 20170420.
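
One reason the key=value convention is easier to manage is that Hive can then discover partitions on its own instead of needing one ALTER TABLE per day. This is a sketch only, assuming the daily directories have been renamed to the dt=YYYY-MM-DD form under the table location.

```sql
-- Assumes the daily directories are named like
--   /user/hive/warehouse/LOGS/test/dt=2017-04-20
-- MSCK REPAIR TABLE registers any such directories missing from the metastore.
MSCK REPAIR TABLE LOGS;

-- Confirm which partitions are now registered.
SHOW PARTITIONS LOGS;
```

Rerunning MSCK REPAIR TABLE after each daily load should pick up the new directory.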
