Postgres: slow GROUP BY query with max()

Question

I am using Postgres 9.1 and have a table with about 3.5M rows containing eventtype (varchar) and eventtime (timestamp), plus some other fields. There are only about 20 distinct eventtype values, and the event times span about 4 years.

I want to get the last timestamp of each event type. If I run a query like:

select eventtype, max(eventtime)
from allevents
group by eventtype

it takes around 20 seconds. Selecting the distinct eventtype values is equally slow. The query plan shows a full sequential scan of the table, so it is not surprising that it is slow.

EXPLAIN ANALYZE for the above query gives:

HashAggregate  (cost=84591.47..84591.68 rows=21 width=21) (actual time=20918.131..20918.141 rows=21 loops=1)
  ->  Seq Scan on allevents  (cost=0.00..66117.98 rows=3694698 width=21) (actual time=0.021..4831.793 rows=3694392 loops=1)
Total runtime: 20918.204 ms

If I add a where clause to select a specific eventtype, it takes anywhere from 40ms to 150ms which is at least decent.
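The filtered query itself is not shown in the question. Judging from the plan (a GroupAggregate over rows matching 'TEST_EVENT'), it was presumably something like the following sketch; the exact text is an assumption:

```sql
-- Presumed form of the single-eventtype query (reconstructed from the plan,
-- not copied from the original post).
select eventtype, max(eventtime)
from allevents
where eventtype = 'TEST_EVENT'
group by eventtype;
```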

Query plan when selecting specific eventtype:

GroupAggregate  (cost=343.87..24942.71 rows=1 width=21) (actual time=98.397..98.397 rows=1 loops=1)
  ->  Bitmap Heap Scan on allevents  (cost=343.87..24871.07 rows=14325 width=21) (actual time=6.820..89.610 rows=19736 loops=1)
        Recheck Cond: ((eventtype)::text = 'TEST_EVENT'::text)
        ->  Bitmap Index Scan on allevents_idx2  (cost=0.00..340.28 rows=14325 width=0) (actual time=6.121..6.121 rows=19736 loops=1)
              Index Cond: ((eventtype)::text = 'TEST_EVENT'::text)
Total runtime: 98.482 ms

Primary key is (eventtype, eventtime). I also have the following indexes:

allevents_idx (eventtime desc, eventtype)
allevents_idx2 (eventtype).

How can I speed up the query?

The query plan for the correlated subquery suggested by @denis below, with 14 manually entered values, is:

Function Scan on unnest val  (cost=0.00..185.40 rows=100 width=32) (actual time=0.121..8983.134 rows=14 loops=1)
   SubPlan 2
     ->  Result  (cost=1.83..1.84 rows=1 width=0) (actual time=641.644..641.645 rows=1 loops=14)
          InitPlan 1 (returns $1)
             ->  Limit  (cost=0.00..1.83 rows=1 width=8) (actual time=641.640..641.641 rows=1 loops=14)
                  ->  Index Scan using allevents_idx on allevents  (cost=0.00..322672.36 rows=175938 width=8) (actual time=641.638..641.638 rows=1 loops=14)
                         Index Cond: ((eventtime IS NOT NULL) AND ((eventtype)::text = val.val))
Total runtime: 8983.203 ms

Using the recursive query suggested by @jjanes, the query runs between 4 and 5 seconds with the following plan:

CTE Scan on t  (cost=260.32..448.63 rows=101 width=32) (actual time=0.146..4325.598 rows=22 loops=1)
  CTE t
    ->  Recursive Union  (cost=2.52..260.32 rows=101 width=32) (actual time=0.075..1.449 rows=22 loops=1)
          ->  Result  (cost=2.52..2.53 rows=1 width=0) (actual time=0.074..0.074 rows=1 loops=1)
            InitPlan 1 (returns $1)
                  ->  Limit  (cost=0.00..2.52 rows=1 width=13) (actual time=0.070..0.071 rows=1 loops=1)
                        ->  Index Scan using allevents_idx2 on allevents  (cost=0.00..9315751.37 rows=3696851 width=13) (actual time=0.070..0.070 rows=1 loops=1)
                              Index Cond: ((eventtype)::text IS NOT NULL)
          ->  WorkTable Scan on t  (cost=0.00..25.58 rows=10 width=32) (actual time=0.059..0.060 rows=1 loops=22)
                Filter: (eventtype IS NOT NULL)
                SubPlan 3
                  ->  Result  (cost=2.53..2.54 rows=1 width=0) (actual time=0.059..0.059 rows=1 loops=21)
                        InitPlan 2 (returns $3)
                          ->  Limit  (cost=0.00..2.53 rows=1 width=13) (actual time=0.057..0.057 rows=1 loops=21)
                                ->  Index Scan using allevents_idx2 on allevents  (cost=0.00..3114852.66 rows=1232284 width=13) (actual time=0.055..0.055 rows=1 loops=21)
                                      Index Cond: (((eventtype)::text IS NOT NULL) AND ((eventtype)::text > t.eventtype))
  SubPlan 6
    ->  Result  (cost=1.83..1.84 rows=1 width=0) (actual time=196.549..196.549 rows=1 loops=22)
          InitPlan 5 (returns $6)
            ->  Limit  (cost=0.00..1.83 rows=1 width=8) (actual time=196.546..196.546 rows=1 loops=22)
                  ->  Index Scan using allevents_idx on allevents  (cost=0.00..322946.21 rows=176041 width=8) (actual time=196.544..196.544 rows=1 loops=22)
                        Index Cond: ((eventtime IS NOT NULL) AND ((eventtype)::text = t.eventtype))
Total runtime: 4325.694 ms


Recommended answer

What you need is a "skip scan" or "loose index scan". PostgreSQL's planner does not yet implement those automatically, but you can trick it into using one by using a recursive query.

WITH RECURSIVE t AS (
    SELECT min(eventtype) AS eventtype FROM allevents
    UNION ALL
    SELECT (SELECT min(eventtype) AS eventtype FROM allevents WHERE eventtype > t.eventtype)
    FROM t WHERE t.eventtype IS NOT NULL
)
SELECT eventtype, (SELECT max(eventtime) FROM allevents WHERE eventtype = t.eventtype) FROM t;

There may be a way to collapse the max(eventtime) into the recursive query rather than doing it outside that query, but if so I have not hit upon it.
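One possible way to fold the timestamp lookup into the recursion is sketched below. It is not from the original answer, and it relies on LATERAL, which needs PostgreSQL 9.3 or later, so it would not run on the 9.1 server from the question. It walks the eventtype values from highest to lowest and picks the newest row for each one, assuming an index on (eventtype, eventtime) is available to scan backward:

```sql
-- Sketch only: requires PostgreSQL 9.3+ (LATERAL); not part of the original answer.
WITH RECURSIVE t AS (
    -- Anchor: newest row of the highest eventtype.
    (SELECT eventtype, eventtime
     FROM allevents
     ORDER BY eventtype DESC, eventtime DESC
     LIMIT 1)
    UNION ALL
    -- Step: newest row of the next lower eventtype.
    -- When no lower eventtype exists, the lateral subquery returns no row
    -- and the recursion stops.
    SELECT l.eventtype, l.eventtime
    FROM t
    CROSS JOIN LATERAL (
        SELECT eventtype, eventtime
        FROM allevents
        WHERE eventtype < t.eventtype
        ORDER BY eventtype DESC, eventtime DESC
        LIMIT 1
    ) l
)
SELECT eventtype, eventtime FROM t;
```

Because the recursion ends when the lateral subquery comes back empty, this form should not emit a trailing NULL row for the step past the last eventtype.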

This needs an index on (eventtype, eventtime) in order to be efficient. You can make it DESC on eventtime, but that is not necessary. The approach is efficient only if eventtype has just a few distinct values (21 of them, in your case).
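A minimal sketch of such an index, and of a way to check whether the per-type lookup uses it, might look like this. The index name is hypothetical. The question states that the primary key is already (eventtype, eventtime), so a separate index may turn out to be redundant; the posted plans, however, show the max() lookups going through allevents_idx instead, which is why it seems worth checking:

```sql
-- Hypothetical index name. Column order matters: eventtype first, then eventtime.
CREATE INDEX allevents_type_time_idx ON allevents (eventtype, eventtime);

-- Refresh planner statistics for the table.
ANALYZE allevents;

-- Inspect which index the single-type max() lookup ends up using.
EXPLAIN ANALYZE
SELECT max(eventtime) FROM allevents WHERE eventtype = 'TEST_EVENT';
```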
