How to update partition metadata in Hive when partition data is manually deleted from HDFS


Question

What is the way to automatically update the metadata of Hive partitioned tables?

If new partition data is added to HDFS (without running an ALTER TABLE ADD PARTITION command), we can sync the metadata by running the command 'MSCK REPAIR'.

What should be done if a lot of partition data was deleted from HDFS (without running an ALTER TABLE DROP PARTITION command)?

What is the way to sync the Hive metadata in that case?

Recommended answer

EDIT: Starting with Hive 3.0.0, MSCK can discover new partitions, remove missing partitions, or do both, using the following syntax:

MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS]

This was added in HIVE-17824.
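As a minimal sketch of the syntax above, assuming a Hive version that includes HIVE-17824 and a hypothetical partitioned table named sales (the table name is not from the original answer):

```sql
-- Hypothetical table name: sales.
-- Requires a Hive version that includes HIVE-17824 (3.0.0 or later, per the answer).

-- Remove metastore entries for partitions whose HDFS directories no longer exist.
MSCK REPAIR TABLE sales DROP PARTITIONS;

-- Or add missing partitions and drop stale ones in a single pass.
MSCK REPAIR TABLE sales SYNC PARTITIONS;

-- Check which partitions the metastore now knows about.
SHOW PARTITIONS sales;
```

When no option is given, MSCK REPAIR TABLE behaves like ADD PARTITIONS, which is why a plain repair never removes anything.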

As correctly stated by HakkiBuyukcengiz, a plain MSCK REPAIR does not remove partitions whose folders on HDFS were manually deleted; it only adds partitions for newly created folders.

From the official documentation:

In other words, it will add any partitions that exist on HDFS but not in metastore to the metastore.

This is what I usually do with external tables when multiple partition folders have been manually deleted on HDFS and I want to quickly refresh the partitions:

  • Drop the table (DROP TABLE table_name); dropping an external table does not delete the underlying partition files
  • Recreate the table (CREATE EXTERNAL TABLE table_name ...)
  • Repair it (MSCK REPAIR TABLE table_name)
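The three steps above could look roughly like the following. This is only a sketch: the table name sales, its columns, the partition column dt, the storage format, and the LOCATION path are all hypothetical and need to be replaced with the real table definition.

```sql
-- Optional: capture the existing DDL first so the table can be recreated faithfully.
SHOW CREATE TABLE sales;

-- Step 1: drop the external table (metadata only; the remaining files stay on HDFS).
DROP TABLE sales;

-- Step 2: recreate it with the same definition (hypothetical schema and path).
CREATE EXTERNAL TABLE sales (
  id BIGINT,
  amount DOUBLE
)
PARTITIONED BY (dt STRING)
STORED AS PARQUET
LOCATION '/data/warehouse/sales';

-- Step 3: register every partition directory that still exists under LOCATION.
MSCK REPAIR TABLE sales;

-- Verify that the deleted partitions are gone from the metastore.
SHOW PARTITIONS sales;
```

Because the recreated table starts with no partitions, the repair step only registers the directories that are still present, which is what makes the stale entries disappear.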

Depending on the number of partitions, this can take a long time. The other solution is to run ALTER TABLE DROP PARTITION (...) for each deleted partition folder, but this can be tedious if many partitions were deleted.
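The per-partition approach can be made less tedious by listing several partition specs in one statement. The sketch below again assumes a hypothetical table sales partitioned by a string column dt; the comparison form in the second statement should be verified on your Hive version before relying on it.

```sql
-- Drop several stale partitions in one statement (hypothetical table and values).
-- IF EXISTS avoids an error when a listed partition is already absent from the metastore.
ALTER TABLE sales DROP IF EXISTS
  PARTITION (dt = '2020-01-01'),
  PARTITION (dt = '2020-01-02');

-- Assumption: the Hive version in use accepts comparison operators in the partition spec.
-- This drops every partition with dt earlier than the given value, so use it only
-- when all folders in that range were actually deleted.
ALTER TABLE sales DROP IF EXISTS PARTITION (dt < '2020-01-01');
```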
