How to do partitions on subdirectories in Hive


Problem description

I have the following directory structure in my Hadoop cluster:

`/hadoop/maindirec/subdirect1/file1
 /hadoop/maindirec/subdirect1/file2
 /hadoop/maindirec/subdirect2/file1
 /hadoop/maindirec/subdirect2/file2
 /hadoop/maindirec/subdirect3/file1
 /hadoop/maindirec/subdirect3/file2
 /hadoop/maindirec/subdirect4/file1
 /hadoop/maindirec/subdirect4/file2
 /hadoop/maindirec/subdirect5/file1
 /hadoop/maindirec/subdirect5/file2`

Now I want to create a Hive table in ORC format, with maindirec as the table location and subdirect1 through subdirect5 as partitions. Could anyone please let me know how this can be done? Thanks in advance.

What I have tried so far:

create external table temp(name string, id int)
partitioned by (subd string)
row format delimited fields terminated by '\t'
stored as orc
location '/hadoop/maindirec'
tblproperties("orc.compress"="SNAPPY","skip.header.line.count"="4");

alter table temp add
partition(subd='subdirect1') location '/hadoop/maindirec/subdirect1'
partition(subd='subdirect2') location '/hadoop/maindirec/subdirect2'
partition(subd='subdirect3') location '/hadoop/maindirec/subdirect3'
partition(subd='subdirect4') location '/hadoop/maindirec/subdirect4'
partition(subd='subdirect5') location '/hadoop/maindirec/subdirect5';

Input
select * from temp;
Output
Failed with exception java.io.IOException:java.lang.RuntimeException: serious problem

Solution

You can use the following code (change and add column names as needed):

CREATE EXTERNAL TABLE temp_table (col1 int, col2 int)
PARTITIONED BY (subd string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS ORC
LOCATION '/hadoop/maindirec';

ALTER TABLE temp_table ADD PARTITION (subd='subdirect1') LOCATION '/hadoop/maindirec/subdirect1/files1-100'
PARTITION (subd='subdirect2') LOCATION '/hadoop/maindirec/subdirect2/files1-100'
PARTITION (subd='subdirect3') LOCATION '/hadoop/maindirec/subdirect3/files1-100'
PARTITION (subd='subdirect4') LOCATION '/hadoop/maindirec/subdirect4/files1-100'
PARTITION (subd='subdirect5') LOCATION '/hadoop/maindirec/subdirect5/files1-100';
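
One possible cause of the "serious problem" error, offered here as an assumption rather than a confirmed diagnosis, is that the files under subdirect1 through subdirect5 are delimited text rather than ORC. STORED AS ORC only declares the expected file format; it does not convert files that are already in place. If that is the situation, a minimal sketch is to expose the existing files through a text-format external table and then copy the rows into a separate ORC table. The table names temp_text and temp_orc are hypothetical; the columns, tab delimiter, and header-line count are taken from the question and should be adjusted to the real data.

```sql
-- Assumption: the existing files are tab-delimited text, not ORC.
-- temp_text and temp_orc are hypothetical names used only for this sketch.

-- 1. External text table over the existing directories.
CREATE EXTERNAL TABLE temp_text (name STRING, id INT)
PARTITIONED BY (subd STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/hadoop/maindirec'
-- Keep this property only if each file really starts with 4 header lines.
TBLPROPERTIES ("skip.header.line.count"="4");

-- 2. Map each subdirectory to a partition. An explicit LOCATION means the
--    directories do not have to follow the subd=<value> naming scheme.
ALTER TABLE temp_text ADD
  PARTITION (subd='subdirect1') LOCATION '/hadoop/maindirec/subdirect1'
  PARTITION (subd='subdirect2') LOCATION '/hadoop/maindirec/subdirect2'
  PARTITION (subd='subdirect3') LOCATION '/hadoop/maindirec/subdirect3'
  PARTITION (subd='subdirect4') LOCATION '/hadoop/maindirec/subdirect4'
  PARTITION (subd='subdirect5') LOCATION '/hadoop/maindirec/subdirect5';

-- 3. Managed ORC table that will hold the converted data.
CREATE TABLE temp_orc (name STRING, id INT)
PARTITIONED BY (subd STRING)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");

-- 4. Copy the rows, letting Hive create the ORC partitions dynamically.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE temp_orc PARTITION (subd)
SELECT name, id, subd FROM temp_text;

-- 5. Check the result.
SHOW PARTITIONS temp_orc;
SELECT * FROM temp_orc WHERE subd = 'subdirect1' LIMIT 10;
```

If the files turn out to be genuine ORC files, the staging table is unnecessary and the external ORC table with ADD PARTITION is enough.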
