Postgres的CTE与子查询的性能差异.为什么? [英] Postgres' CTE vs Subquery Performance difference. Why?

查看:111
本文介绍了Postgres的CTE与子查询的性能差异.为什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个等效的查询,它们提取特定区域(ace)和城市(pro_com)中的建筑物(表a)和最近的高速公路(表v中的高速公路)之间的平均距离.

I have two equivalent queries which extracts the average distance between buildings (table a) and the nearest highway (highways in table v) in a specific district (ace) and city (pro_com).

这是CTE版本

WITH subq AS (
SELECT a.n, a.geom as g1, unnest(ARRAY(SELECT v.geom as g2 
  FROM atlas_sezioni2 as v
  where v.code = '12230' and a.pro_com = v.pro_com and a.code <> v.code  
  ORDER BY a.geom <-> v.geom LIMIT 15)) as g2
FROM atlas_sezioni2 a
where a.pro_com = 15146 and a.ace = 1 and a.code IN('11100', '11210', '11220', '11230', '11240', '11300', '12100', '14200')  
)

select avg(dist) from (
select distinct on(n) n, dist
from (
SELECT n, ST_Distance_Sphere(g1, g2) as dist FROM subq
) disttable
order by n, dist asc
) final;

为了使用GIST索引,我在CTE中提取了15条最近的高速公路并计算了距离( http://workshops.boundlessgeo.com/postgis-intro/knn.html ). 而CTE的解释是:

where in the CTE I extract the 15 nearest highways and calculate the distance, in order to use GIST indexes (http://workshops.boundlessgeo.com/postgis-intro/knn.html). And the CTE's explain:

Aggregate  (cost=37342.10..37342.11 rows=1 width=8)
   CTE subq
     ->  Index Scan using atlas_sezioni2_code_ace_pro_com_n_idx on atlas_sezioni2 a  (cost=0.29..29987.90 rows=20900 width=236211)
       Index Cond: (((code)::text = ANY ('{11100,11210,11220,11230,11240,11300,12100,14200}'::text[])) AND (ace = 1) AND (pro_com = 15146::numeric))
       SubPlan 1
         ->  Limit  (cost=141.04..141.08 rows=15 width=236190)
               ->  Sort  (cost=141.04..141.21 rows=69 width=236190)
                     Sort Key: ((a.geom <-> v.geom))
                     ->  Index Scan using atlas_sezioni2_code_ace_pro_com_n_idx on atlas_sezioni2 v  (cost=0.28..139.35 rows=69 width=236190)
                           Index Cond: (((code)::text = '12230'::text) AND (a.pro_com = pro_com))
                           Filter: ((a.code)::text <> (code)::text)
   ->  Unique  (cost=7247.20..7351.70 rows=200 width=72)
     ->  Sort  (cost=7247.20..7299.45 rows=20900 width=72)
           Sort Key: subq.n, (_st_distance(geography(subq.g1), geography(subq.g2), 0::double precision, false))
           ->  CTE Scan on subq  (cost=0.00..5747.50 rows=20900 width=72)
(15 rows)

这与子查询等效:

select avg(dist) from (
select distinct on(n) n, dist
from (
SELECT n, ST_Distance_Sphere(g1, g2) as dist FROM (
SELECT a.n, a.geom as g1, unnest(ARRAY(SELECT v.geom as g2 
  FROM atlas_sezioni2 as v
  where v.code = '12230' and a.pro_com = v.pro_com and a.code <> v.code  
  ORDER BY a.geom <-> v.geom LIMIT 15)) as g2
FROM atlas_sezioni2 a
where a.pro_com = 15146 and a.ace = 1 and a.code IN('11100', '11210', '11220', '11230', '11240', '11300', '12100', '14200')  
) subq
) disttable
order by n, dist asc
) final

及其说明

Aggregate  (cost=6366298.35..6366298.36 rows=1 width=8)
   ->  Unique  (cost=6365932.60..6366037.10 rows=20900 width=236230)
     ->  Sort  (cost=6365932.60..6365984.85 rows=20900 width=236230)
           Sort Key: subq.n, (_st_distance(geography(subq.g1), geography(subq.g2), 0::double precision, false))
           ->  Subquery Scan on subq  (cost=0.29..35526.40 rows=20900 width=236230)
                 ->  Index Scan using atlas_sezioni2_code_ace_pro_com_n_idx on atlas_sezioni2 a  (cost=0.29..29987.90 rows=20900 width=236211)
                       Index Cond: (((code)::text = ANY ('{11100,11210,11220,11230,11240,11300,12100,14200}'::text[])) AND (ace = 1) AND (pro_com = 15146::numeric))
                       SubPlan 1
                         ->  Limit  (cost=141.04..141.08 rows=15 width=236190)
                               ->  Sort  (cost=141.04..141.21 rows=69 width=236190)
                                     Sort Key: ((a.geom <-> v.geom))
                                     ->  Index Scan using atlas_sezioni2_code_ace_pro_com_n_idx on atlas_sezioni2 v  (cost=0.28..139.35 rows=69 width=236190)
                                           Index Cond: (((code)::text = '12230'::text) AND (a.pro_com = pro_com))
                                           Filter: ((a.code)::text <> (code)::text)
(14 rows)

我知道CTE是优化的边界(Postgres不在CTE和它们外部的查询之间进行优化),但这很奇怪.为什么性能会以这种方式爆发?

I know CTEs are boundary fences for optimization (Postgres doesn't optmize between CTEs and queries that are outside them), but this is strange. Why is the performance blowing out in this way?

推荐答案

正如@CraigRinger所说,我也应该检查分析.实际上,从解释分析"中我们可以看到第一个是:

As @CraigRinger said, I should have checked also the analyze. In fact, from "explain analyze" we see that the first one is:

Aggregate  (cost=58406.66..58406.67 rows=1 width=8) (actual time=138191.294..138191.295 rows=1 loops=1)
 CTE subq
   ->  Bitmap Heap Scan on atlas_sezioni2 a  (cost=9.93..51052.46 rows=20900 width=236211) (actual time=2.814..308.667 rows=3705 loops=1)
         Recheck Cond: (ace = 1)
         Filter: ((pro_com = 15146::numeric) AND ((code)::text = ANY ('{11100,11210,11220,11230,11240,11300,12100,14200}'::text[])))
         Rows Removed by Filter: 4
         Heap Blocks: exact=42
         ->  Bitmap Index Scan on atlas_sezioni2_ace_idx  (cost=0.00..9.88 rows=251 width=0) (actual time=0.110..0.110 rows=251 loops=1)
               Index Cond: (ace = 1)
         SubPlan 1
           ->  Limit  (cost=240.70..240.74 rows=15 width=236190) (actual time=0.630..0.636 rows=15 loops=247)
                 ->  Sort  (cost=240.70..240.87 rows=69 width=236190) (actual time=0.627..0.630 rows=15 loops=247)
                       Sort Key: ((a.geom <-> v.geom))
                       Sort Method: top-N heapsort  Memory: 26kB
                       ->  Bitmap Heap Scan on atlas_sezioni2 v  (cost=4.56..239.01 rows=69 width=236190) (actual time=0.045..0.518 rows=73 loops=247)
                             Recheck Cond: ((code)::text = '12230'::text)
                             Filter: (((a.code)::text <> (code)::text) AND (a.pro_com = pro_com))
                             Heap Blocks: exact=6916
                             ->  Bitmap Index Scan on atlas_sezioni2_code_idx  (cost=0.00..4.55 rows=73 width=0) (actual time=0.030..0.030 rows=73 loops=247)
                                   Index Cond: ((code)::text = '12230'::text)
 ->  Unique  (cost=7247.20..7351.70 rows=200 width=72) (actual time=138190.527..138191.243 rows=247 loops=1)
       ->  Sort  (cost=7247.20..7299.45 rows=20900 width=72) (actual time=138190.526..138190.800 rows=3705 loops=1)
             Sort Key: subq.n, (_st_distance(geography(subq.g1), geography(subq.g2), 0::double precision, false))
             Sort Method: quicksort  Memory: 270kB
             ->  CTE Scan on subq  (cost=0.00..5747.50 rows=20900 width=72) (actual time=159.739..138182.891 rows=3705 loops=1)
 Planning time: 2.623 ms
 Execution time: 138217.574 ms
(27 rows)

而子查询之一是:

Aggregate  (cost=6387362.91..6387362.92 rows=1 width=8) (actual time=140208.005..140208.005 rows=1 loops=1)
 ->  Unique  (cost=6386997.16..6387101.66 rows=20900 width=236230) (actual time=140207.243..140207.947 rows=247 loops=1)
       ->  Sort  (cost=6386997.16..6387049.41 rows=20900 width=236230) (actual time=140207.241..140207.514 rows=3705 loops=1)
             Sort Key: subq.n, (_st_distance(geography(subq.g1), geography(subq.g2), 0::double precision, false))
             Sort Method: quicksort  Memory: 270kB
             ->  Subquery Scan on subq  (cost=9.93..56590.96 rows=20900 width=236230) (actual time=160.784..140199.364 rows=3705 loops=1)
                   ->  Bitmap Heap Scan on atlas_sezioni2 a  (cost=9.93..51052.46 rows=20900 width=236211) (actual time=2.384..308.517 rows=3705 loops=1)
                         Recheck Cond: (ace = 1)
                         Filter: ((pro_com = 15146::numeric) AND ((code)::text = ANY ('{11100,11210,11220,11230,11240,11300,12100,14200}'::text[])))
                         Rows Removed by Filter: 4
                         Heap Blocks: exact=42
                         ->  Bitmap Index Scan on atlas_sezioni2_ace_idx  (cost=0.00..9.88 rows=251 width=0) (actual time=0.150..0.150 rows=251 loops=1)
                               Index Cond: (ace = 1)
                         SubPlan 1
                           ->  Limit  (cost=240.70..240.74 rows=15 width=236190) (actual time=0.640..0.646 rows=15 loops=247)
                                 ->  Sort  (cost=240.70..240.87 rows=69 width=236190) (actual time=0.637..0.640 rows=15 loops=247)
                                       Sort Key: ((a.geom <-> v.geom))
                                       Sort Method: top-N heapsort  Memory: 26kB
                                       ->  Bitmap Heap Scan on atlas_sezioni2 v  (cost=4.56..239.01 rows=69 width=236190) (actual time=0.045..0.527 rows=73 loops=247)
                                             Recheck Cond: ((code)::text = '12230'::text)
                                             Filter: (((a.code)::text <> (code)::text) AND (a.pro_com = pro_com))
                                             Heap Blocks: exact=6916
                                             ->  Bitmap Index Scan on atlas_sezioni2_code_idx  (cost=0.00..4.55 rows=73 width=0) (actual time=0.031..0.031 rows=73 loops=247)
                                                   Index Cond: ((code)::text = '12230'::text)
 Planning time: 1.117 ms
 Execution time: 140208.187 ms

因此,仅在explain :)中,它的性能更好.实际性能不会改变.

So it does perform better only in the explain :). Real performance doesn't change.

这篇关于Postgres的CTE与子查询的性能差异.为什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆