Hive count(*) query is not invoking MapReduce

Views: 932 | Published: 2018/5/31 20:22:52 | Tags: hadoop, hive


Problem description

I have external tables in Hive. When I run select count(*) from table_name, the query returns instantaneously with a result that I think is already stored somewhere. The result returned by the query is not correct. Is there a way to force a MapReduce job so that the query is actually executed each time?

Note: Not all external tables behave this way, only some of them.

Versions used: Hive 0.14.0.2.2.6.0-2800, Hadoop 2.6.0.2.2.6.0-2800 (Hortonworks)
Solution
After some digging, I found a method that kicks off an MR job to count the number of records in an ORC table:

-- table_name and partition_column are placeholders
ANALYZE TABLE table_name PARTITION (partition_column) COMPUTE STATISTICS;
-- or, for a table without partitions:
ANALYZE TABLE table_name COMPUTE STATISTICS;

This is not a direct replacement for count(*), but it refreshes the table statistics and so provides the latest record count for the table.
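If the instantaneous result is being served from statistics stored in the metastore (an assumption here, though it matches the symptoms described in the question), another option worth trying is to turn off `hive.compute.query.using.stats` for the session, so that the count is computed from the data rather than from stored statistics. A minimal sketch, where `table_name` is a placeholder:

```sql
-- Assumption: the stale count comes from statistics stored in the metastore.
-- Inspect the stored statistics (look for numRows under Table Parameters).
DESCRIBE FORMATTED table_name;

-- Stop Hive from answering simple aggregates from stored statistics in this session.
SET hive.compute.query.using.stats=false;

-- With the setting off, Hive should launch a job and count the rows from the data itself.
SELECT COUNT(*) FROM table_name;
```

The setting only affects the current session. Running ANALYZE TABLE is still what brings the stored statistics themselves up to date.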
