计划者不使用索引顺序使用CTE对记录进行排序 [英] Planner not using index order to sort the records using CTE

查看:87
本文介绍了计划者不使用索引顺序使用CTE对记录进行排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图按条件以相同的顺序将一些id传递到排序索引的子句中,但是查询计划程序在执行索引搜索后显式对数据进行排序.以下是我的查询.

I am trying to pass some ids into an in-clause on a sorted index with the same order by condition but the query planner is explicitly sorting the data after performing index search. below are my queries.

  1. 生成一个临时表.

SELECT a.n/20 as n, md5(a.n::TEXT) as b INTO temp_table 
From generate_series(1, 100000) as a(n);

  • 创建索引

    CREATE INDEX idx_temp_table ON temp_table(n ASC, b ASC);
    

  • 在下面的查询中,计划程序使用索引排序,并且未明确对数据进行排序.(预期)

  • In below query, planner uses index ordering and doesn't explicitly sorts the data.(expected)

    EXPLAIN ANALYSE
    SELECT * from 
    temp_table WHERE n = 10
    ORDER BY  n, b
    limit 5;
    

  • 查询计划

    QUERY PLAN Limit  (cost=0.42..16.07 rows=5 width=36) (actual time=0.098..0.101 rows=5 loops=1)   
              ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..1565.17 rows=500 width=36) (actual time=0.095..0.098 rows=5 loops=1)
                Index Cond: (n = 10)
                Heap Fetches: 5 Planning time: 0.551 ms Execution time: 0.128 ms
    

    1. 但是当我使用cte中的一个或多个id并将其传递给子句时,计划程序仅使用index来获取值,但之后会对其进行显式排序(不期望).

    1. but when i use one or more ids from a cte and pass them in clause then planner only uses index to fetch the values but explicitly sorts them afterwards (not expected).

    EXPLAIN ANALYSE
    WITH cte(x) AS (VALUES (10))
    SELECT * from temp_table 
    WHERE n IN ( SELECT x from cte)
    ORDER BY  n, b
    limit 5;
    

    然后计划者使用以下查询计划

    then planner uses below query plan

    查询计划

    QUERY PLAN
    Limit  (cost=85.18..85.20 rows=5 width=37) (actual time=0.073..0.075 rows=5 loops=1)
      CTE cte
        ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4) (actual time=0.001..0.002 rows=2 loops=1)
      ->  Sort  (cost=85.16..85.26 rows=40 width=37) (actual time=0.072..0.073 rows=5 loops=1)
            Sort Key: temp_table.n, temp_table.b
            Sort Method: top-N heapsort  Memory: 25kB
            ->  Nested Loop  (cost=0.47..84.50 rows=40 width=37) (actual time=0.037..0.056 rows=40 loops=1)
                  ->  Unique  (cost=0.05..0.06 rows=2 width=4) (actual time=0.009..0.010 rows=2 loops=1)
                        ->  Sort  (cost=0.05..0.06 rows=2 width=4) (actual time=0.009..0.010 rows=2 loops=1)
                              Sort Key: cte.x
                              Sort Method: quicksort  Memory: 25kB
                              ->  CTE Scan on cte  (cost=0.00..0.04 rows=2 width=4) (actual time=0.004..0.005 rows=2 loops=1)
                  ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..42.02 rows=20 width=37) (actual time=0.012..0.018 rows=20 loops=2)
                        Index Cond: (n = cte.x)
                        Heap Fetches: 40
    Planning time: 0.166 ms
    Execution time: 0.101 ms
    

    1. 我试图在传递where子句中的id时进行显式排序,以便保持id中的排序顺序,但规划器仍显式排序

    1. I tried putting an explicit sorting while passing the ids in where clause so that sorted order in ids is maintained but still planner sorted explicitly

    EXPLAIN ANALYSE
    WITH cte(x) AS (VALUES (10))
    SELECT * from temp_table 
    WHERE n IN ( SELECT x from cte)
    ORDER BY  n, b
    limit 5;
    

    查询计划

    QUERY PLAN
    Limit  (cost=42.62..42.63 rows=5 width=37) (actual time=0.042..0.044 rows=5 loops=1)
      CTE cte
        ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=1)
      ->  Sort  (cost=42.61..42.66 rows=20 width=37) (actual time=0.042..0.042 rows=5 loops=1)
            Sort Key: temp_table.n, temp_table.b
            Sort Method: top-N heapsort  Memory: 25kB
            ->  Nested Loop  (cost=0.46..42.28 rows=20 width=37) (actual time=0.025..0.033 rows=20 loops=1)
                  ->  HashAggregate  (cost=0.05..0.06 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1)
                        Group Key: cte.x
                        ->  Sort  (cost=0.03..0.04 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=1)
                              Sort Key: cte.x
                              Sort Method: quicksort  Memory: 25kB
                              ->  CTE Scan on cte  (cost=0.00..0.02 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1)
                  ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..42.02 rows=20 width=37) (actual time=0.014..0.020 rows=20 loops=1)
                        Index Cond: (n = cte.x)
                        Heap Fetches: 20
    Planning time: 0.167 ms
    Execution time: 0.074 ms
    

    任何人都可以解释为什么计划者对数据使用显式排序吗?有没有一种方法可以使计划程序使用索引排序顺序,以便可以保存记录上的其他排序.在生产中,我们有类似的案例,但是我们选择的规模太大,但是只有少量记录需要分页获取.谢谢大家的期待!

    Can anyone explain why planner is using an explicit sort on the data? Is there a way to by pass this and make planner use the index sorting order so additional sorting on the records can be saved. In production, we have similar case but size of our selection is too big but only a handful of records needs to fetched with pagination. Thanks in anticipation!

    推荐答案

    这实际上是计划者的决定,使用更大的values()集,Postgres会切换到更智能的计划,而在合并.

    It is actually a decision made by the planner, with a larger set of values(), Postgres will switch to a smarter plan, with the sort done before the merge.

    select version();
    
    \echo +++++ Original
    
    EXPLAIN ANALYSE
    WITH cte(x) AS (VALUES (10))
    SELECT * from temp_table
    WHERE n IN ( SELECT x from cte)
    ORDER BY  n, b
    limit 5;
    
    \echo +++++ TEN Values
    EXPLAIN ANALYSE
    WITH cte(x) AS (VALUES (10),(11),(12),(13),(14),(15),(16),(17),(18),(19)
            )
    SELECT * from temp_table
    WHERE n IN ( SELECT x from cte)
    ORDER BY  n, b
    limit 5;
    
    \echo ++++++++ one row from table
    
    EXPLAIN ANALYSE
    WITH cte(x) AS (SELECT n FROM temp_table WHERE n = 10)
    SELECT * from temp_table
    WHERE n IN ( SELECT x from cte)
    ORDER BY  n, b
    limit 5;
    
    \echo ++++++++ one row from table TWO ctes
    
    EXPLAIN ANALYSE
    WITH val(x) AS (VALUES (10))
    ,  cte(x) AS (
            SELECT n FROM temp_table WHERE n IN (select x from val)
            )
    SELECT * from temp_table
    WHERE n IN ( SELECT x from cte)
    ORDER BY  n, b
    limit 5;
    


    产生的计划


    Resulting plans:

                                                    version                                                
    -------------------------------------------------------------------------------------------------------
     PostgreSQL 11.3 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
    (1 row)
    
    +++++ Original
                                                                          QUERY PLAN                                                                      
    ------------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=13.72..13.73 rows=5 width=37) (actual time=0.197..0.200 rows=5 loops=1)
       CTE cte
         ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
       ->  Sort  (cost=13.71..13.76 rows=20 width=37) (actual time=0.194..0.194 rows=5 loops=1)
             Sort Key: temp_table.n, temp_table.b
             Sort Method: top-N heapsort  Memory: 25kB
             ->  Nested Loop  (cost=0.44..13.37 rows=20 width=37) (actual time=0.083..0.097 rows=20 loops=1)
                   ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4) (actual time=0.018..0.018 rows=1 loops=1)
                         Group Key: cte.x
                         ->  CTE Scan on cte  (cost=0.00..0.02 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=1)
                   ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..13.14 rows=20 width=37) (actual time=0.058..0.068 rows=20 loops=1)
                         Index Cond: (n = cte.x)
                         Heap Fetches: 20
     Planning Time: 1.328 ms
     Execution Time: 0.360 ms
    (15 rows)
    
    +++++ TEN Values
                                                                          QUERY PLAN                                                                       
    -------------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=0.91..89.11 rows=5 width=37) (actual time=0.179..0.183 rows=5 loops=1)
       CTE cte
         ->  Values Scan on "*VALUES*"  (cost=0.00..0.12 rows=10 width=4) (actual time=0.001..0.007 rows=10 loops=1)
       ->  Merge Semi Join  (cost=0.78..3528.72 rows=200 width=37) (actual time=0.178..0.181 rows=5 loops=1)
             Merge Cond: (temp_table.n = cte.x)
             ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..3276.30 rows=100000 width=37) (actual time=0.030..0.123 rows=204 loops=1)
                   Heap Fetches: 204
             ->  Sort  (cost=0.37..0.39 rows=10 width=4) (actual time=0.023..0.023 rows=1 loops=1)
                   Sort Key: cte.x
                   Sort Method: quicksort  Memory: 25kB
                   ->  CTE Scan on cte  (cost=0.00..0.20 rows=10 width=4) (actual time=0.003..0.013 rows=10 loops=1)
     Planning Time: 0.197 ms
     Execution Time: 0.226 ms
    (13 rows)
    ++++++++ one row from table
                                                                           QUERY PLAN                                                                       
    --------------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=14.39..58.52 rows=5 width=37) (actual time=0.168..0.173 rows=5 loops=1)
       CTE cte
         ->  Index Only Scan using idx_temp_table on temp_table temp_table_1  (cost=0.42..13.14 rows=20 width=4) (actual time=0.010..0.020 rows=20 loops=1)
               Index Cond: (n = 10)
               Heap Fetches: 20
       ->  Merge Semi Join  (cost=1.25..3531.24 rows=400 width=37) (actual time=0.167..0.170 rows=5 loops=1)
             Merge Cond: (temp_table.n = cte.x)
             ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..3276.30 rows=100000 width=37) (actual time=0.025..0.101 rows=204 loops=1)
                   Heap Fetches: 204
             ->  Sort  (cost=0.83..0.88 rows=20 width=4) (actual time=0.039..0.039 rows=1 loops=1)
                   Sort Key: cte.x
                   Sort Method: quicksort  Memory: 25kB
                   ->  CTE Scan on cte  (cost=0.00..0.40 rows=20 width=4) (actual time=0.012..0.031 rows=20 loops=1)
     Planning Time: 0.243 ms
     Execution Time: 0.211 ms
    (15 rows)
    
    ++++++++ one row from table TWO ctes
                                                                              QUERY PLAN                                                                          
    --------------------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=14.63..58.76 rows=5 width=37) (actual time=0.224..0.229 rows=5 loops=1)
       CTE val
         ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
       CTE cte
         ->  Nested Loop  (cost=0.44..13.37 rows=20 width=4) (actual time=0.038..0.052 rows=20 loops=1)
               ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=1)
                     Group Key: val.x
                     ->  CTE Scan on val  (cost=0.00..0.02 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1)
               ->  Index Only Scan using idx_temp_table on temp_table temp_table_1  (cost=0.42..13.14 rows=20 width=4) (actual time=0.029..0.038 rows=20 loops=1)
                     Index Cond: (n = val.x)
                     Heap Fetches: 20
       ->  Merge Semi Join  (cost=1.25..3531.24 rows=400 width=37) (actual time=0.223..0.226 rows=5 loops=1)
             Merge Cond: (temp_table.n = cte.x)
             ->  Index Only Scan using idx_temp_table on temp_table  (cost=0.42..3276.30 rows=100000 width=37) (actual time=0.038..0.114 rows=204 loops=1)
                   Heap Fetches: 204
             ->  Sort  (cost=0.83..0.88 rows=20 width=4) (actual time=0.082..0.082 rows=1 loops=1)
                   Sort Key: cte.x
                   Sort Method: quicksort  Memory: 25kB
                   ->  CTE Scan on cte  (cost=0.00..0.40 rows=20 width=4) (actual time=0.040..0.062 rows=20 loops=1)
     Planning Time: 0.362 ms
     Execution Time: 0.313 ms
    (21 rows)
    


    当心CTE!

    对于计划者来说,CTE或多或少是黑匣子,而对预期的行数,统计信息分布或内部排序知之甚少.


    Beware of CTEs!.

    For the planner, CTEs are more or less black boxes, and very little is known about expected number of rows, statistics distribution, or ordering inside.

    在CTE导致计划不正确的情况下(最初的问题不是这种情况),CTE通常可以由(临时)视图替换,计划者在视图中看到它全裸的荣耀.

    In cases where CTEs result in a bad plan (the original question is not such a case), a CTE can often be replaced by a (temp) view, which is seen by the planner in its full naked glory.

    这篇关于计划者不使用索引顺序使用CTE对记录进行排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆