在Metastore中进行分区但HDFS中不存在路径 [英] Partition in metastore but path doesn't exist in HDFS

查看:1104
本文介绍了在Metastore中进行分区但HDFS中不存在路径的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们的导入过程存在问题,导致分区被添加到Hive中的表中,但HDFS中的路径实际上并不存在。我们已经解决了这个问题,但是我们仍然有这些不好的分区。当使用Tez查询这些表时,我们得到FileNotFound异常,指向HDFS中不存在的位置。如果我们使用MR而不是Tez,查询就可以工作(这对我来说很困惑),但它太慢了。



有没有办法列出所有分区有这个探头吗? MSCK REPAIR 似乎可以解决相反的问题,即数据存在于HDFS中,但Hive中没有分区。

编辑:更多信息。
以下是文件未找到异常的输出:

java.io.FileNotFoundException:文件hdfs://< server> / db / tables / 2016/03/14 / mytable不存在



如果我运行 show partitions< ; db.mytable> ,我将得到所有分区,包括一个用于 dt = 2016-03-14



show table扩展如'< db.mytable>'分区(dt ='2016-03-14'返回相同的位置:
位置:hdfs:// server / db / tables / 2016/03/14 / mytable

> MSCK REPAIR TABLE< tablename> 不提供此功能,我也面临同样的问题,我找到了解决方案这个,

因为我们知道'msck repair'命令会根据目录添加分区,所以先删除所有分区

  hive> ALTER TABLE mytable drop if exists partitions(p<>); 

上面的命令删除所有分区,然后使用 msck repair 命令删除所有分区,然后它会从目录中创建分区表格位置。

  hive> msck修复表mytable 


We had an issue with our ingestion process that would result in partitions being added to a table in Hive, but the path in HDFS didn't actually exist. We've fixed that issue, but we still have these bad partitions. When querying these tables using Tez, we get FileNotFound exception, pointing to the location in HDFS that doesn't exist. If we use MR instead of Tez, the query works (which is very confusing to me), but it's too slow.

Is there a way to list all the partitions that have this probem? MSCK REPAIR seems to handle the opposite problem, where the data exists in HDFS but there is no partition in Hive.

EDIT: More info. Here's the output of the file not found exception:

java.io.FileNotFoundException: File hdfs://<server>/db/tables/2016/03/14/mytable does not exist.

If I run show partitions <db.mytable>, I'll get all the partitions, including one for dt=2016-03-14.

show table extended like '<db.mytable>' partition(dt='2016-03-14' returns the same location: location:hdfs://server/db/tables/2016/03/14/mytable.

解决方案

MSCK REPAIR TABLE <tablename> does not provide this facility and I also face this same issue and I found solution for this,

As we know 'msck repair' command add partitions based on directory, So first drop all partitions

hive>ALTER TABLE mytable drop if exists partitions(p<>'');

above command remove all partitions ,

then use msck repair command then it will create partition from directory present at table location.

hive>msck repair table mytable

这篇关于在Metastore中进行分区但HDFS中不存在路径的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆