Apache Hive MSCK REPAIR TABLE new partition not added


Problem Description



I am new to Apache Hive. While working with external table partitions, if I add a new partition directly in HDFS, it is not picked up after running MSCK REPAIR TABLE. Below is the code I tried:

-- creating external table

hive> create external table factory(name string, empid int, age int) partitioned by(region string)  
    > row format delimited fields terminated by ','; 

-- Detailed Table Information

Location:  hdfs://localhost.localdomain:8020/user/hive/warehouse/factory     
Table Type:             EXTERNAL_TABLE           
Table Parameters:        
    EXTERNAL                TRUE                
    transient_lastDdlTime   1438579844  

-- creating directory in HDFS to load data for table factory

[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory1'
[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2'

-- Table data

cat factory1.txt
emp1,500,40
emp2,501,45
emp3,502,50

cat factory2.txt
EMP10,200,25
EMP11,201,27
EMP12,202,30

-- copying from local to HDFS

[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory1.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory1'
[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory2.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2'

-- Altering table to update in the metastore

hive> alter table factory add partition(region='southregion') location '/user/hive/testing/testing1/factory2';
hive> alter table factory add partition(region='northregion') location '/user/hive/testing/testing1/factory1';            
hive> select * from factory;                                                                      
OK
emp1    500 40  northregion
emp2    501 45  northregion
emp3    502 50  northregion
EMP10   200 25  southregion
EMP11   201 27  southregion
EMP12   202 30  southregion

Now I created a new file, factory3.txt, to add as a new partition for the table factory:

cat factory3.txt
user1,100,25
user2,101,27
user3,102,30

-- creating the path and copying table data

[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory3'
[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory3.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory3'

Now I executed the query below to update the metastore with the newly added partition:

MSCK REPAIR TABLE factory;

Now the table is not returning the new partition content from the factory3 file. Can I know where I am making a mistake while adding the partition for table factory?

However, if I run the alter command, it does show the new partition data:

hive> alter table factory add partition(region='eastregion') location '/user/hive/testing/testing1/factory3';

Can I know why the MSCK REPAIR TABLE command is not working?

Solution

For MSCK REPAIR TABLE to work, the partition directories must follow the naming convention /partition_name=partition_value/ (for example, /region=eastregion/).
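As a minimal sketch of how this could look for the table in the question (assuming the partition directory is created under the table's location shown in the detailed table information above, since MSCK REPAIR TABLE only scans beneath the table's base directory; the region=eastregion name is simply reused from the question):

-- create the partition directory following the partition_name=partition_value convention, under the table location
[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/warehouse/factory/region=eastregion'
[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory3.txt' 'hdfs://localhost.localdomain:8020/user/hive/warehouse/factory/region=eastregion/'

-- MSCK can now discover the partition from the directory name
hive> MSCK REPAIR TABLE factory;
hive> show partitions factory;

For directories that live outside the table's location, such as /user/hive/testing/testing1/factory3 in the question, ALTER TABLE ... ADD PARTITION with an explicit LOCATION remains the way to register them in the metastore, which is why the alter statement works while MSCK REPAIR TABLE does not.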
