Hive count(*) query is not invoking MapReduce


Problem description


I have external tables in Hive. When I run select count(*) from table_name, the query returns instantly with a result that I think is already stored somewhere rather than computed. The returned count is not correct. Is there a way to force a MapReduce job so that the query is actually executed each time?

Note: Not all external tables behave this way, only some of them.

Versions used: Hive 0.14.0.2.2.6.0-2800, Hadoop 2.6.0.2.2.6.0-2800 (Hortonworks)

Solution

After some digging, I found a method that kicks off a MapReduce job to count the records in an ORC table:

ANALYZE TABLE table_name PARTITION (partition_column) COMPUTE STATISTICS;
-- OR
ANALYZE TABLE table_name COMPUTE STATISTICS;

This is not a direct alternative to count(*), but it recomputes the statistics stored for the table and so provides the latest record count.
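The instant result described in the question is consistent with Hive answering the query from table statistics kept in the metastore instead of scanning the data. Below is a minimal sketch of how this could be checked and worked around within a session. It assumes the cause is the hive.compute.query.using.stats setting, which the original post does not confirm, and it uses a hypothetical table web_logs partitioned by a hypothetical column ds:

```sql
-- Assumption: the instant (and stale) count is served from metastore statistics.
-- Print the current value of the setting for this session.
SET hive.compute.query.using.stats;

-- Turn off stats-based answers for this session so that count(*) scans the data.
SET hive.compute.query.using.stats=false;
SELECT COUNT(*) FROM web_logs;   -- web_logs is a hypothetical table name

-- Alternatively, refresh the stored statistics, as in the answer.
ANALYZE TABLE web_logs PARTITION (ds) COMPUTE STATISTICS;   -- ds is a hypothetical partition column

-- Inspect the stored row count (numRows) among the partition parameters.
DESCRIBE FORMATTED web_logs PARTITION (ds='2015-06-01');   -- hypothetical partition value
```

Changing the setting this way affects only the current session. Files placed in an external table's location outside of Hive are generally not reflected in the stored statistics until ANALYZE is run again, which may be why only some tables show the stale count.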
