Aggregate queries fail in Hive if partition directory doesn't exist
Problem description
I am using Hive v1.2.1 with Tez. I have an external partitioned table whose partitions are hourly, of the form p=yyyy_mm_dd_hh. The partition directories in HDFS are likely to be deleted at some point. After a directory is deleted, Hive still holds the metadata for that partition, and a 'show partitions' command still lists the partition whose directory was removed from HDFS. Normally this does not cause any problem, and a select query on such a partition simply returns an empty result set:
hive> select * from test_tab where p='2015_01_01_01';
OK
Time taken: 2.168 seconds
However, running any aggregate query against the same partition fails with an error:
hive> select count(*) from test_tab where p='2015_01_01_01';
FAILED: SemanticException java.io.FileNotFoundException: File hdfs://localhost:8020/user/root/data/test_db/test_tab/p=2015_01_01_01 does not exist.
I need aggregate queries to behave the same way as other select queries. This is probably a bug in Hive. Any hints for a workaround would be appreciated. Best regards.
Run the command below:
msck repair table test_tab;
and then run your query again.
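A caveat worth checking on your own cluster: as far as I know, in Hive 1.2.x msck repair table only adds partitions that exist in HDFS but are missing from the metastore, and the option to drop partitions whose directories are gone was only added in later releases. If the error persists after the repair, a possible alternative is to drop the stale partition metadata explicitly. The sketch below reuses the table and partition value from the question and assumes the partition directory really has been deleted and its metadata is no longer needed:

```sql
-- List the partitions the metastore still knows about.
SHOW PARTITIONS test_tab;

-- Drop the metadata for the partition whose directory no longer exists.
-- IF EXISTS avoids an error if the partition was already removed.
-- For an external table this only removes metadata, not files.
ALTER TABLE test_tab DROP IF EXISTS PARTITION (p='2015_01_01_01');

-- With the stale partition gone, the aggregate should see no input
-- for this partition instead of failing on the missing directory.
SELECT count(*) FROM test_tab WHERE p='2015_01_01_01';
```

Since the partitions are hourly and directories are deleted regularly, the same ALTER TABLE statement could be issued by whatever job deletes the directories, so that the metastore and HDFS stay in step.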