Partition in metastore but path doesn't exist in HDFS


Problem description

We had an issue with our ingestion process that resulted in partitions being added to a table in Hive even though the path in HDFS didn't actually exist. We've fixed that issue, but we still have these bad partitions. When querying these tables using Tez, we get a FileNotFoundException pointing to the location in HDFS that doesn't exist. If we use MR instead of Tez, the query works (which is very confusing to me), but it's too slow.

Is there a way to list all the partitions that have this problem? MSCK REPAIR seems to handle the opposite problem, where the data exists in HDFS but there is no partition in Hive.

EDIT: More info. Here's the output of the file not found exception:

java.io.FileNotFoundException: File hdfs://<server>/db/tables/2016/03/14/mytable does not exist.

If I run show partitions <db.mytable>, I'll get all the partitions, including one for dt=2016-03-14.

show table extended like '<db.mytable>' partition(dt='2016-03-14') returns the same location: location:hdfs://server/db/tables/2016/03/14/mytable.

Solution

MSCK REPAIR TABLE <tablename> does not provide this facility. I faced the same issue and found a workaround for it.

As we know, the 'msck repair' command adds partitions based on the directories it finds, so first drop all partitions (p stands for the partition column, which is dt in the question):

hive> ALTER TABLE mytable DROP IF EXISTS PARTITION (p<>'');

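The above command removes all partitions from the metastore. Note that for a managed (non-external) table, dropping a partition also deletes its data, so check the table type before running it.

Then run the msck repair command, which will recreate the partitions from the directories present at the table location: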

hive> MSCK REPAIR TABLE mytable;
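
If you would rather list only the bad partitions (which is what the question asks for) instead of dropping everything, something along these lines might work. This is a minimal sketch, not a tested tool: it assumes the `hive` and `hdfs` CLIs are on the PATH, that `SHOW PARTITIONS` prints one `key=value[/key=value]` spec per line without special characters, and that the `DESCRIBE FORMATTED ... PARTITION` output contains a `Location:` line. The function names (`find_missing`, `drop_statements`, and so on) are made up for illustration.

```python
import subprocess
from typing import Callable, List


def spec_to_clause(spec: str) -> str:
    """Turn a SHOW PARTITIONS line such as 'dt=2016-03-14/hr=01'
    into a partition clause: "dt='2016-03-14', hr='01'".
    Assumption: values are quoted as strings and contain no quotes."""
    parts = []
    for piece in spec.strip().split("/"):
        key, _, value = piece.partition("=")
        parts.append(f"{key}='{value}'")
    return ", ".join(parts)


def parse_location(describe_output: str) -> str:
    """Return the value of the first 'Location:' line in the output."""
    for line in describe_output.splitlines():
        if line.strip().lower().startswith("location:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no Location line found")


def run_hive(query: str) -> str:
    """Run one query through the hive CLI (-S = silent) and return stdout."""
    result = subprocess.run(
        ["hive", "-S", "-e", query],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def hdfs_path_exists(path: str) -> bool:
    """`hdfs dfs -test -e PATH` exits with 0 when the path exists."""
    return subprocess.run(["hdfs", "dfs", "-test", "-e", path]).returncode == 0


def find_missing(
    table: str,
    run_query: Callable[[str], str] = run_hive,
    exists: Callable[[str], bool] = hdfs_path_exists,
) -> List[str]:
    """Return the partition clauses whose location is absent from HDFS."""
    missing = []
    for spec in run_query(f"SHOW PARTITIONS {table}").splitlines():
        if not spec.strip():
            continue
        clause = spec_to_clause(spec)
        out = run_query(f"DESCRIBE FORMATTED {table} PARTITION ({clause})")
        if not exists(parse_location(out)):
            missing.append(clause)
    return missing


def drop_statements(table: str, clauses: List[str]) -> List[str]:
    """Build one ALTER TABLE ... DROP statement per bad partition."""
    return [
        f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({c});" for c in clauses
    ]
```

Calling `drop_statements("db.mytable", find_missing("db.mytable"))` would give you statements to review before running them in Hive. It starts one `hive` process per partition, so it will be slow on tables with many partitions; the query runner and the existence check are passed in as parameters so they can be swapped for something faster.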
