HIVE选择计数（）非空返回的值高于选择计数（） [英] HIVE select count() non null returns higher value than select count()

查看：873 发布时间：2018/6/12 13:33:56 hive hiveql

本文介绍了HIVE选择计数（*）非空返回的值高于选择计数（*）的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我目前正在使用Hive进行一些数据探索，无法解释以下行为。假设我有一个带有字段master_id的表（名为mytable）。

当我计算我得到的行数时

  select count（*）as c from mytable 
c 
 1129563

如果我要计算具有非null master_id的行数，我会得到一个更高的数字

  select count（*）as mytable from mytable其中master_id不为空
c 
 1134041

  select count（*）as mytable from mytable其中master_id是空的
c 
 0

我无法解释如何添加where语句可以增加最终的行数。有没有人有任何解释这种行为的提示？

谢谢 >很可能你的查询没有使用统计数据，因为这个参数被设置：
set hive.compute.query.using。统计= TRUE;
尝试将其设置为false并再次执行。

或者，您可以在表格上计算统计信息。
请参阅 ANALYZE TABLE SYNTAX a>

也可以在INSERT OVERWRITE自动收集统计信息：

set hive.stats.autogather = true;

I am currently doing some data exploration with Hive and cannot explain the following behavior. Say I have a table (named mytable) with a field master_id.

When I count the number of row I get
select count(*) as c from mytable c 1129563
If I want to count the number of row with a non null master_id, I get a higher number
select count(*) as c from mytable where master_id is not null c 1134041
Additionally, the master_id seems to be never null.
select count(*) as c from mytable where master_id is null c 0
I cannot explain how adding a where statement can increase the number of rows eventually. Does anyone have any hint to explain this behavior ?

Thanks
解决方案
Most probably your query without where is using statistics because of this parameter is set:
set hive.compute.query.using.stats=true;
Try to set it false and execute again.
Alternatively you can compute statistics on the table. See ANALYZE TABLE SYNTAX

Also it's possible to gather statistics during INSERT OVERWRITE automatically:
set hive.stats.autogather=true;

这篇关于HIVE选择计数（*）非空返回的值高于选择计数（*）的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

HIVE选择计数（）非空返回的值高于选择计数（） [英] HIVE select count() non null returns higher value than select count()

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

HIVE选择计数（*）非空返回的值高于选择计数（*） [英] HIVE select count(*) non null returns higher value than select count(*)

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

HIVE选择计数（）非空返回的值高于选择计数（） [英] HIVE select count() non null returns higher value than select count()

登录关闭