Postgresl选择计数(*)耗时 [英] Postgresl select count(*) time-consuming

查看:168
本文介绍了Postgresl选择计数(*)耗时的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用 spring-data-jpa & postgresql-9.4

I am using spring-data-jpa & postgresql-9.4.

有一个表: tbl_oplog 。该表大约有700万行数据,并且需要在前端显示数据。

There is a table: tbl_oplog. This table has about seven million rows of data, and data is need to be displayed on the front end.(paged).

我使用 Spring#PagingAndSortingRepository ,然后发现数据查询非常慢。从日志中,我发现发出了两个SQL查询:

I use Spring#PagingAndSortingRepository , and then I found that the data query was very slow. From the logs, I found that two SQL queries were issued:

select
    oplog0_.id as id1_8_,
    oplog0_.deleted as deleted2_8_,
    oplog0_.result_desc as result_d3_8_,
    oplog0_.extra as extra4_8_,
    oplog0_.info as info5_8_,
    oplog0_.login_ipaddr as login_ip6_8_,
    oplog0_.level as level7_8_,
    oplog0_.op_type as op_type8_8_,
    oplog0_.user_name as user_nam9_8_,
    oplog0_.op_obj as op_obj10_8_,
    oplog0_.op as op11_8_,
    oplog0_.result as result12_8_,
    oplog0_.op_time as op_time13_8_,
    oplog0_.login_name as login_n14_8_ 
from
    tbl_oplog oplog0_ 
where
    oplog0_.deleted=false
order by
    oplog0_.op_time desc limit 10

并且: / p>

And:

 select
    count(oplog0_.id) as col_0_0_ 
from
    tbl_oplog oplog0_ 
where
    oplog0_.deleted=?

(第二条SQL语句用于填充页面对象,这是必需的)

(The second SQL statement is used to populate the page object,which is necessary)

我发现第二条语句非常耗时。为什么要花这么长时间?
如何优化? Mysql 会发生这种情况吗?

I found the second statement to be very time-consuming. Why does it take so long? How to optimize? Does this happen with Mysql?

还是有其他方法可以优化此要求? (看来选择计数是不可避免的。)

Or is there any other way I can optimize this requirement? (It seems that select count is inevitable).

编辑:
我将使用另一个表进行演示(相同):
表:

I'll use another table for the demonstration(same): Table:

select count(*) from tbl_gather_log;  // count is 6300931.cost 5.408S

从tbl_gather_log中选择count(*):

EXPLAIN select count(*) from tbl_gather_log:

Aggregate  (cost=246566.58..246566.59 rows=1 width=0)
  ->  Index Only Scan using tbl_gather_log_pkey on tbl_gather_log  (cost=0.43..230814.70 rows=6300751 width=0)

解释分析从tbl_gather_log中选择count(*):

EXPLAIN ANALYSE select count(*) from tbl_gather_log:

Aggregate  (cost=246566.58..246566.59 rows=1 width=0) (actual time=6697.102..6697.102 rows=1 loops=1)
  ->  Index Only Scan using tbl_gather_log_pkey on tbl_gather_log  (cost=0.43..230814.70 rows=6300751 width=0) (actual time=0.173..4622.674 rows=6300936 loops=1)
        Heap Fetches: 298
Planning time: 0.312 ms
Execution time: 6697.267 ms

EDIT2:

TABLE:

create table tbl_gather_log (
    id bigserial not null primary key,
    event_level int,
    event_time timestamp,
    event_type int,
    event_dis_type int,
    event_childtype int,
    event_name varchar(64),
    dev_name varchar(32),
    dev_ip varchar(32),
    sys_type varchar(16),
    event_content jsonb,
    extra jsonb
);

并且:

可能支持很多过滤条件,所以我不能简单地对已删除内容进行特殊操作。例如,可能从tbl_oplog中发出 select *,其中名称如xxx,类型= xxx限制为10 ,因此将有一个查询:从tbl_oplog中选择count *,其中名称如xxx,类型= xxx 。此外,我必须知道确切的计数。因为我需要显示前端有多少页。

There are probably many filtering criteria supported, so i can't simply do special operations on deleted.For example, a query might be issued select * from tbl_oplog where name like xxx and type = xxx limit 10,so, there will be a query:select count * from tbl_oplog where name like xxx and type = xxx . Futhermore, i have to know exact counts. because I need to show how many pages there are on the front end.

推荐答案

第二条语句需要很长时间,因为它具有扫描整个表以便对行进行计数。

The second statement takes a long time because it has to scan the whole table in order to count the rows.

您可以做的一件事是使用索引:

One thing you can do is use an index:

CREATE INDEX ON tbl_oplog (deleted) INCLUDE (id);
VACUUM tbl_oplog;  -- so you get an index only scan

假设 id 是主键,最好使用 count(*)并省略 INCLUDE

Assuming that id is the primary key, it would be much better to use count(*) and omit the INCLUDE clause from the index.

但最好的方法是使用估算值:

But the best is probably to use an estimate:

SELECT t.reltuples * freq.f AS estimated_rows
FROM pg_stats AS s
   JOIN pg_namespace AS n
      ON s.schemaname = n.nspname
   JOIN pg_class AS t
      ON s.tablename = t.relname
         AND n.oid = t.relnamespace
   CROSS JOIN LATERAL
      unnest(s.most_common_vals::text::boolean[]) WITH ORDINALITY AS val(v,id)
   JOIN LATERAL
      unnest(s.most_common_freqs) WITH ORDINALITY AS freq(f,id)
         USING (id)
WHERE s.tablename = 'tbl_oplog'
  AND s.attname = 'deleted'
  AND val.v = ?;

这将使用分布统计信息来估计所需的计数。

This uses the distribution statistics to estimate the desired count.

如果只是关于分页,则不需要确切的计数。

If it is just about pagination, you don't need exact counts.

阅读我的博客以获取更多有关PostgreSQL中计数的主题。

Read my blog for more on the topic of counting in PostgreSQL.

这篇关于Postgresl选择计数(*)耗时的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆