在Metastore中进行分区但HDFS中不存在路径 [英] Partition in metastore but path doesn't exist in HDFS
问题描述
我们的导入过程存在问题,导致分区被添加到Hive中的表中,但HDFS中的路径实际上并不存在。我们已经解决了这个问题,但是我们仍然有这些不好的分区。当使用Tez查询这些表时,我们得到FileNotFound异常,指向HDFS中不存在的位置。如果我们使用MR而不是Tez,查询就可以工作(这对我来说很困惑),但它太慢了。
有没有办法列出所有分区有这个探头吗? MSCK REPAIR
似乎可以解决相反的问题,即数据存在于HDFS中,但Hive中没有分区。
编辑:更多信息。
以下是文件未找到异常的输出:
java.io.FileNotFoundException:文件hdfs://< server> / db / tables / 2016/03/14 / mytable不存在
如果我运行 show partitions< ; db.mytable>
,我将得到所有分区,包括一个用于 dt = 2016-03-14
。
show table扩展如'< db.mytable>'分区(dt ='2016-03-14'
返回相同的位置:
位置:hdfs:// server / db / tables / 2016/03/14 / mytable
。
因为我们知道'msck repair'命令会根据目录添加分区,所以先删除所有分区
hive> ALTER TABLE mytable drop if exists partitions(p<>);
上面的命令删除所有分区,然后使用 msck repair
命令删除所有分区,然后它会从目录中创建分区表格位置。
hive> msck修复表mytable
We had an issue with our ingestion process that would result in partitions being added to a table in Hive, but the path in HDFS didn't actually exist. We've fixed that issue, but we still have these bad partitions. When querying these tables using Tez, we get FileNotFound exception, pointing to the location in HDFS that doesn't exist. If we use MR instead of Tez, the query works (which is very confusing to me), but it's too slow.
Is there a way to list all the partitions that have this probem? MSCK REPAIR
seems to handle the opposite problem, where the data exists in HDFS but there is no partition in Hive.
EDIT: More info. Here's the output of the file not found exception:
java.io.FileNotFoundException: File hdfs://<server>/db/tables/2016/03/14/mytable does not exist.
If I run show partitions <db.mytable>
, I'll get all the partitions, including one for dt=2016-03-14
.
show table extended like '<db.mytable>' partition(dt='2016-03-14'
returns the same location:
location:hdfs://server/db/tables/2016/03/14/mytable
.
MSCK REPAIR TABLE <tablename>
does not provide this facility and I also face this same issue and I found solution for this,
As we know 'msck repair' command add partitions based on directory, So first drop all partitions
hive>ALTER TABLE mytable drop if exists partitions(p<>'');
above command remove all partitions ,
then use msck repair
command then it will create partition from directory present at table location.
hive>msck repair table mytable
这篇关于在Metastore中进行分区但HDFS中不存在路径的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!