hive 表中 count(*) 结果错误 [英] Wrong result for count(*) in hive table
问题描述
我在 HIVE 中创建了一个表
I have created a table in HIVE
CREATE TABLE IF NOT EXISTS daily_firstseen_analysis (
firstSeen STRING,
category STRING,
circle STRING,
specId STRING,
language STRING,
osType STRING,
count INT)
PARTITIONED BY (day STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
STORED AS orc;
count(*) 没有给我正确的结果
count(*) is not giving me correct result for this table
hive> select count(*) from daily_firstseen_analysis;
OK
75
Time taken: 0.922 seconds, Fetched: 1 row(s)
虽然这个表的行数是 959 行
While the number of rows in this table is 959 rows
hive> select * from daily_firstseen_analysis;
....
Time taken: 0.966 seconds, Fetched: 959 row(s)
它给出了 959 行的数据
it gives data with 959 rows
hive> ANALYZE TABLE daily_firstseen_analysis PARTITION(day) COMPUTE STATISTICS noscan;
Partition logdata.daily_firstseen_analysis{day=20140521} stats: [numFiles=6, numRows=70, totalSize=4433, rawDataSize=37202]
Partition logdata.daily_firstseen_analysis{day=20140525} stats: [numFiles=6, numRows=257, totalSize=4937, rawDataSize=136385]
Partition logdata.daily_firstseen_analysis{day=20140523} stats: [numFiles=6, numRows=211, totalSize=5059, rawDataSize=112140]
Partition logdata.daily_firstseen_analysis{day=20140524} stats: [numFiles=6, numRows=280, totalSize=5257, rawDataSize=148808]
Partition logdata.daily_firstseen_analysis{day=20140522} stats: [numFiles=6, numRows=141, totalSize=4848, rawDataSize=74938]
OK
Time taken: 5.098 seconds
我使用的 Hive 版本为 Hive 0.13.0.2.1.2.0-402
I am using hive with version Hive 0.13.0.2.1.2.0-402
注意:如果我们多次插入一个表,我在 count(*) 中发现了这个问题.单次插入创建的表没有这个问题
NOTE: I found this issue in count(*) if We are inserting into a table in more than one time. Tables created with single insert do not have this issue
推荐答案
我遇到了同样的问题,使用 ANALYZE 修复了它.按顺序运行这些命令应该会给你正确的计数:
I had the same problem, and using ANALYZE fixed it. Running these commands in order should give you the correct count:
hive> ANALYZE TABLE daily_firstseen_analysis PARTITION(day) COMPUTE STATISTICS;
hive> SELECT COUNT(*) FROM daily_firstseen_analysis;
即您必须在计数之前使用分析命令.你的问题有一半的答案.
i.e. you have to use the analyze command before the count. You have half the answer within your question.
这篇关于hive 表中 count(*) 结果错误的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!