奇怪的计数(*)实现? [英] Strange count(*) implementation?

查看:61
本文介绍了奇怪的计数(*)实现?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

嗨Posgres用户/开发人员,


任何人都可以解释为什么PosgreSQL(Linux上的7.4.5版本)执行完整的

表扫描来计算在真空分析后在基表上计数(*)

已经完成,没有任何可能已经过时的任何

统计数据的更新。正如人们所预料的那样,直接解释命令确实会立即从目录中提供正确数量的
元组。还是

优化器认为它需要全表扫描来计算。


见下面的例子:


------ 8< ------------------------------------------ -------------------------------------------------- -


TestDB =#\d test_tbl;

表" public.test_tbl"

列|输入|修饰符

-------- + --------- + -----------

pre |整数|不是null

name |文字| not null

索引:

" test_tbl_pkey"主键,btree(pre)

" test_tbl_pre_index" unique,btree(pre)

" test_tbl_name_index" btree(名字)


TestDB =#解释从test_tbl选择计数(*);

查询计划

---- -------------------------------------------------- ----------------------

汇总(成本= 34293.60..34293.60行= 1宽度= 0)

- > seq扫描test_tbl(费用= 0.00..34293.60行= 166558宽度= 0)

(2行)


时间:25.188毫秒

TestDB =#select count(*)from test_tbl;

count

--------

166558

(1行)


时间:1024.803 ms

TestDB =#


------ 8< ------------------------------------------ -------------------------------------------------- -


这个看似奇怪的计数实现的结果是非常慢的数量。


问候,

Henk Ernst Blok


-

地址:DB集团,计算机科学,EEMCS部门,特温特大学, br />
PO Box 217,7500 AE,ENSCHEDE,THE NETHERLANDS

phone:++ 31(0)53 489 3754(如果没有回复:3690)

电子邮件: h.******@utwente.nl

WWW : http:// www .cs.utwente.nl / ~blokh

---------------------------(结束广播)---------------------------

提示4:不要杀人-9 ''postmaster

Hi Posgres users/developers,

Can anyone explain why PosgreSQL (version 7.4.5 on Linux) does a full
table scan to compute a count(*) on a base table after a vacuum analyze
has been done with no following updates that might have outdated any
statistics. Strangly the explain command does give the correct number of
tuples instantaniously from the catalog, as one would expect. Still the
optimizer thinks it needs a full table scan to do count.

See example below:

------8<---------------------------------------------------------------------------------------------

TestDB=# \d test_tbl;
Table "public.test_tbl"
Column | Type | Modifiers
--------+---------+-----------
pre | integer | not null
name | text | not null
Indexes:
"test_tbl_pkey" primary key, btree (pre)
"test_tbl_pre_index" unique, btree (pre)
"test_tbl_name_index" btree (name)

TestDB=# explain select count(*) from test_tbl;
QUERY PLAN
----------------------------------------------------------------------------
Aggregate (cost=34293.60..34293.60 rows=1 width=0)
-> Seq Scan on test_tbl (cost=0.00..34293.60 rows=166558 width=0)
(2 rows)

Time: 25.188 ms
TestDB=# select count(*) from test_tbl;
count
--------
166558
(1 row)

Time: 1024.803 ms
TestDB=#

------8<---------------------------------------------------------------------------------------------

The consequence of this seemingly odd count implementation is a very
very slow count.

Regards,
Henk Ernst Blok

--
address: DB group, Computer Science, EEMCS Dept., University of Twente,
PO Box 217, 7500 AE, ENSCHEDE, THE NETHERLANDS
phone: ++31 (0)53 489 3754 (if no response: 3690)
email: h.******@utwente.nl
WWW: http://www.cs.utwente.nl/~blokh
---------------------------(end of broadcast)---------------------------
TIP 4: Don''t ''kill -9'' the postmaster

推荐答案

Henk Ernst Blok写道:
Henk Ernst Blok wrote:
嗨Posgres用户/开发者,

任何人都可以解释为什么PosgreSQL(Linux上的7.4.5版本)进行完整的表扫描,以便在真空分析完成后计算基表上的计数(*)
没有任何可能已过时的任何统计数据的更新。正如人们所预料的那样,直接解释命令确实会立即从目录中提供正确数量的元组。
优化器仍然认为需要全表扫描才能进行计数。
这个看似奇怪的计数实现的结果是非常缓慢的计数。
Hi Posgres users/developers,

Can anyone explain why PosgreSQL (version 7.4.5 on Linux) does a full
table scan to compute a count(*) on a base table after a vacuum analyze
has been done with no following updates that might have outdated any
statistics. Strangly the explain command does give the correct number of
tuples instantaniously from the catalog, as one would expect. Still the
optimizer thinks it needs a full table scan to do count. The consequence of this seemingly odd count implementation is a very
very slow count.




简单来说,count()不会''看看统计数据,因为大部分时间都是过时的。在任何情况下,它们对于带有WHERE子句的任何

查询都没有用。


下一个最明显的选择是使用主键索引而不是扫描表格的
。但是,MVCC意味着我们可以拥有多个视图

表,其中一些后端看到的行数与

其他行不同。所以 - 我们需要将多个索引条目版本存储为

以及多行版本,或者您需要在这些情况下检查

中的实际行。 PostgreSQL做了第二次,这导致了你看到的完整扫描




有很多讨论(以及max()/ min ()汇总

函数)在邮件列表档案中。


HTH

-

Richard Huxton

Archonet Ltd


-------------------------- - (广播结束)---------------------------

提示3:如果通过Usenet发布/阅读,请发送适当的

subscribe-nomail命令到 ma ******* @ postgresql。组织以便您的

消息可以干净地通过邮件列表



To put it simply, count() doesn''t look at the statistics because most of
the time they are out of date. In any case, they aren''t useful for any
query with a WHERE clause.

The next most obvious choice is to use the primary-key index rather than
scanning the table. However, MVCC means that we can have multiple views
of the table, with some backends seeing a different number of rows than
others. So - either we need to store multiple index-entry versions as
well as multiple row versions or you need to check the actual row in
these cases. PostgreSQL does the second, which results in the full scan
which you see.

There is plenty of discussion of this (and also max()/min() aggregate
functions) in the mailing list archives.

HTH
--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to ma*******@postgresql.org so that your
message can get through to the mailing list cleanly


hi,


On Tue,2004-10-26 at 10:16,Henk Ernst Blok写道:


On Tue, 2004-10-26 at 10:16, Henk Ernst Blok wrote:
嗨Posgres用户/开发者,

任何人都可以解释为什么PosgreSQL(Linux上的7.4.5版本)执行完整的表扫描,以便在真空分析后计算基表上的计数(*) lyze
已经完成,没有任何可能已经过时的任何统计数据的更新。正如人们所预料的那样,直接解释命令确实会立即从目录中提供正确数量的元组。仍然
优化器认为它需要全表扫描来计算。

....这个看似奇怪的计数实现的结果非常缓慢。
Hi Posgres users/developers,

Can anyone explain why PosgreSQL (version 7.4.5 on Linux) does a full
table scan to compute a count(*) on a base table after a vacuum analyze
has been done with no following updates that might have outdated any
statistics. Strangly the explain command does give the correct number of
tuples instantaniously from the catalog, as one would expect. Still the
optimizer thinks it needs a full table scan to do count.
.... The consequence of this seemingly odd count implementation is a very
very slow count.




查询规划器应该如何知道真空度已经足够了,那么

是不是对表格进行了修改?


如果您对粗略数字感兴趣,可以阅读系统表

以获取最后的真空统计数据。如果你需要快速计算并且可以在插入上花费一些周期,只需在插入后制作一个带有计数结果的缓冲表




不合格的计数,例如没有WHERE子句不需要

可以使用很多。


问候

Tino

---------------------------(广播结束)----------------- ----------

提示9:如果你的

加入列的数据类型没有,那么规划人员会忽略你选择索引扫描的愿望匹配



How should the query planner know the vacuum was recent enough and there
were no modifications to the table since?

If you are interested in rough numbers you could read the system tables
for the last vacuum statistics. If you need fast count and can spend
some cycles on inserts, just make a buffer table with count results
after insert.

Unqualified count e.g. without a WHERE clause should not need to
be used a lot.

Regards
Tino
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column''s datatypes do not match





我的问题更多的是基本性质,因为扫描计数

似乎与关于如何优化它的理论相矛盾。


我假设(d)更昂贵的统计数据(例如,价值分配

info)仅在过时或根据要求时更新(手动

vacuum)。通常,其他/便宜的统计数据可以很容易地以增量方式保持
,从而反映每次更新后的实际表状态。

课程,MVCC原则似乎让事情变得更加复杂我现在明白了。但无论如何,跟踪统计数据是否是脏的必须在系统中。它如何决定何时自行完成

统计更新?因此,如果解释可以得到最新的计数,为什么

如果你知道统计数据仍然是

acurate那么也不会在计数中使用它?


那么,如果

你正在处理OLAP加载,那么没有任何地方的计数(*)会经常发生,这在我的设置中就是这种情况。所以,

我确实已经通过预先计算所有

计数来解决性能问题,我可以预测它是有用的。


无论如何,我理解这个问题在我列出之前已经被讨论了(b / b
在列表中搜索(搜索档案/网站是/不是很有效)b $ b有效,所以我不知道,直到有人告诉我,对不起)。所以,我将b $ b留给开发人员如何处理这个话题。


问候,


Henk Ernst

Tino Wildenhain写道:
Hi,

My question was more of a fundamental nature as this count by scan
seemed to contradict the theory about how to optimize it.

I assume(d) the more expensive statistics (e.g., value distribution
info) are updated only when outdated too much or on request (manual
vacuum). Usually, other/cheap statistics can easily be maintained
incrementally and thus reflect actual table state after each update. Of
course, the MVCC principle seems to make things a bit more complicated I
understand now. But tracking whether statistics are dirty has to be in
the system anyway. How does it otherwise decide when to do its own
statistics updates? So if explain can get the most recent count, why
not use it in the count as well if you know the statistics are still
acurate?

By the way, a count(*) without any where does occur very frequently if
you are dealing with an OLAP load, which is the case in my setting. So,
I indeed already ''fixed'' the performance problem by precomputing all
counts I can predict to be of any use.

Anyway, I understood this issue has been subject to discusion before I
was on the list (searching the archive/website was/is not very
effective, so I didn''t know until someone told me so, sorry). So, I
leave it to the developers what to do with this topic.

Regards,

Henk Ernst
Tino Wildenhain wrote:


在星期二,2004-10-26 10:16,Henk Ernst Blok写道:



On Tue, 2004-10-26 at 10:16, Henk Ernst Blok wrote:

嗨Posgres用户/开发人员,

任何人都可以解释为什么PosgreSQL(Linux上的7.4.5版本)完整
表扫描计算真空分析后基表上的计数(*)
已完成,没有任何可能已过时的统计数据的更新。正如人们所预料的那样,直接解释命令确实会立即从目录中提供正确数量的元组。
优化器仍然认为它需要全表扫描来计算。
Hi Posgres users/developers,

Can anyone explain why PosgreSQL (version 7.4.5 on Linux) does a full
table scan to compute a count(*) on a base table after a vacuum analyze
has been done with no following updates that might have outdated any
statistics. Strangly the explain command does give the correct number of
tuples instantaniously from the catalog, as one would expect. Still the
optimizer thinks it needs a full table scan to do count.


...


...

这个看似奇怪的计数实现的结果是一个非常缓慢的计数。
The consequence of this seemingly odd count implementation is a very
very slow count.



查询规划器应该如何知道真空是最近的,并且
从那以后没有对表格进行修改?

如果您对粗略数字感兴趣,可以阅读系统表格
以获取最后的真空统计数据。如果你需要快速计数并且可以在插入上花费一些周期,只需在插入后制作一个带有计数结果的缓冲表。

不合格的计数,例如没有WHERE子句不需要经常使用。

关于
Tino



How should the query planner know the vacuum was recent enough and there
were no modifications to the table since?

If you are interested in rough numbers you could read the system tables
for the last vacuum statistics. If you need fast count and can spend
some cycles on inserts, just make a buffer table with count results
after insert.

Unqualified count e.g. without a WHERE clause should not need to
be used a lot.

Regards
Tino




-

地址:数据库组,计算机科学,EEMCS部门,特温特大学,

PO Box 217,7500 AE,ENSCHEDE,THE NETHERLANDS

电话:++ 31(0)53 489 3754(如果没有回复:3690)

电子邮件: h。****** @ utwente.nl

WWW: http://www.cs.utwente.nl/~blokh


这篇关于奇怪的计数(*)实现?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆