Hive count(*) query is not invoking mapreduce
Problem description
I have external tables in Hive. When I run select count(*) from table_name, the query returns instantly with a result that I think is already stored somewhere, and that result is not correct. Is there a way to force a MapReduce job so that the query is actually executed each time?
Note: This happens for only some of the external tables, not all of them.
Versions used: Hive 0.14.0.2.2.6.0-2800, Hadoop 2.6.0.2.2.6.0-2800 (Hortonworks)
After some digging, I found a method that kicks off an MR job to count the number of records in an ORC table:
ANALYZE TABLE <table_name> PARTITION(<partition_columns>) COMPUTE STATISTICS;
-- OR
ANALYZE TABLE <table_name> COMPUTE STATISTICS;
This is not a direct replacement for count(*), but it recomputes the table statistics and so gives the latest record count for the table.
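If the instant answer really is coming from statistics stored in the metastore (an assumption; the question does not confirm it), one thing to try is turning off stats-based answering for the session, so that count(*) has to scan the data. The sketch below uses the Hive setting hive.compute.query.using.stats. The table name my_external_table is hypothetical.

```sql
-- Assumption: the stale count is served from stored table statistics.
-- Disable stats-based answering for this session only.
SET hive.compute.query.using.stats=false;

-- my_external_table is a hypothetical name; substitute your own table.
-- With the setting off, this should launch a job instead of returning instantly.
SELECT COUNT(*) FROM my_external_table;

-- Alternatively, leave the setting on and refresh the stored statistics,
-- as in the ANALYZE approach above.
ANALYZE TABLE my_external_table COMPUTE STATISTICS;

-- The stored row count (numRows) appears under the table parameters.
DESCRIBE FORMATTED my_external_table;
```

The SET only lasts for the current session. The other route is to re-run ANALYZE whenever files are added to the external table's location outside of Hive, so the stored count stays in step with the data.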