Fast way to discover the row count of a table in PostgreSQL


Question

I need to know the number of rows in a table to calculate a percentage. If the total count is greater than some predefined constant, I will use the constant value. Otherwise, I will use the actual number of rows.

I can use SELECT count(*) FROM table. But if my constant value is 500,000 and I have 5,000,000,000 rows in my table, counting all rows will waste a lot of time.

Is it possible to stop counting as soon as my constant value is surpassed?

I need the exact number of rows only as long as it is below the given limit. Otherwise, if the count is above the limit, I use the limit value instead and want the answer as fast as possible.

Something like this:

SELECT text,count(*), percentual_calculus()  
FROM token  
GROUP BY text  
ORDER BY count DESC;


Recommended answer

Counting rows in big tables is known to be slow in PostgreSQL. To get a precise number, it has to do a full count of rows due to the nature of MVCC. There is a way to speed this up dramatically if the count does not have to be exact, as seems to be the case for you.

Instead of getting the exact count (slow with big tables):

SELECT count(*) AS exact_count FROM myschema.mytable;

You get a close estimate like this (extremely fast):

SELECT reltuples::bigint AS estimate FROM pg_class WHERE relname = 'mytable';

How close the estimate is depends on whether ANALYZE runs often enough. It is usually very close.
See the PostgreSQL Wiki FAQ.
Or the dedicated wiki page for count(*) performance.
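If the estimate looks stale, it may help to check when statistics were last gathered and to refresh them by hand. A minimal sketch, assuming the placeholder names myschema and mytable; pg_stat_user_tables is the standard statistics view:

```sql
-- When were statistics last refreshed, manually or by autovacuum?
SELECT last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  schemaname = 'myschema'
AND    relname    = 'mytable';

-- Refresh statistics by hand; this also updates pg_class.reltuples.
ANALYZE myschema.mytable;
```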

The article in the PostgreSQL Wiki was a bit sloppy. It ignored the possibility that there can be multiple tables of the same name in one database, in different schemas. To account for that:

SELECT c.reltuples::bigint AS estimate
FROM   pg_class c
JOIN   pg_namespace n ON n.oid = c.relnamespace
WHERE  c.relname = 'mytable'
AND    n.nspname = 'myschema';



Or better still:

SELECT reltuples::bigint AS estimate
FROM   pg_class
WHERE  oid = 'myschema.mytable'::regclass;

Faster, simpler, safer, more elegant. See the manual on Object Identifier Types.

In Postgres 9.4+, use to_regclass('myschema.mytable') instead of the regclass cast to avoid exceptions for invalid table names.
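The query for that variant might look like this. It is a sketch with the same placeholder names; to_regclass() returns NULL instead of raising an error when the table does not exist, so the query then simply returns no row:

```sql
SELECT reltuples::bigint AS estimate
FROM   pg_class
WHERE  oid = to_regclass('myschema.mytable');
```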


In Postgres 9.5+, the TABLESAMPLE clause offers another way to estimate the count:

SELECT 100 * count(*) AS estimate FROM mytable TABLESAMPLE SYSTEM (1);

Like @a_horse commented, the newly added clause for the SELECT command might be useful if statistics in pg_class are not current enough for some reason. For example:


  • No autovacuum running.
  • Immediately after a big INSERT or DELETE.
  • TEMPORARY tables (which are not covered by autovacuum).

This only looks at a random n % (1 in the example) selection of blocks and counts the rows in them. A bigger sample increases the cost and reduces the error; your pick. Accuracy depends on more factors:


  • Distribution of row size. If a given block happens to hold wider than usual rows, the count is lower than usual, etc.
  • Dead tuples or a FILLFACTOR occupy space per block. If unevenly distributed across the table, the estimate may be off.
  • General rounding errors.
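To trade cost against error, the sample percentage and the scaling factor have to change together. A sketch assuming a 5 % sample of the same placeholder table mytable; the factor 100.0 / 5 scales the sampled count back up to the whole table:

```sql
-- 5 % block sample: reads more blocks than a 1 % sample,
-- but the scaled-up count is typically less noisy.
SELECT (100.0 / 5 * count(*))::bigint AS estimate
FROM   mytable TABLESAMPLE SYSTEM (5);
```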

In most cases, the estimate from pg_class will be faster and more accurate.


"First, I need to know the number of rows in that table, if the total count is greater than some predefined constant,"

And whether ...

"... is possible at the moment the count pass my constant value, it will stop the counting (and not wait to finish the counting to inform the row count is greater)."

Yes. You can use a subquery with LIMIT:

SELECT count(*) FROM (SELECT 1 FROM token LIMIT 500000) t;

Postgres actually stops counting beyond the given limit: you get an exact and current count for up to n rows (500000 in the example), and n otherwise. It is not nearly as fast as the estimate from pg_class, though.
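One way the capped count could feed the percentage from the question is sketched below. It assumes the token table and text column from the question, and it replaces the unspecified percentual_calculus() with a plain ratio, which is only a guess at the intent; the names total, total_ct, row_ct and pct are made up for illustration:

```sql
WITH total AS (
   -- Exact count for up to 500000 rows, 500000 otherwise
   SELECT count(*) AS total_ct
   FROM  (SELECT 1 FROM token LIMIT 500000) t
)
SELECT k.text
     , count(*) AS row_ct
     , round(100.0 * count(*) / total.total_ct, 2) AS pct  -- assumed formula
FROM   token k
CROSS  JOIN total
GROUP  BY k.text, total.total_ct
ORDER  BY row_ct DESC;
```

When the table holds more than 500000 rows, the ratio is computed against the cap rather than the true total, as the question describes.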
