Why is the query planner unable to transform a correlated subquery?


Question

In "How does PostgreSQL approach a 1 + n query?", I learned that a correlated subquery can be rewritten as a left join:

select   film_id, title,
         (
           select     array_agg(first_name)
           from       actor
           inner join film_actor using(actor_id)
           where      film_actor.film_id = film.film_id
         ) as actors
from     film
order by title;

select   f.film_id, f.title, array_agg(a.first_name)
from     film f
   left join film_actor fa using(film_id)
   left join actor      a  using(actor_id)
group by f.film_id
order by f.title;

Both queries return the same results, but the second query performs better.

This makes me wonder: why is the query planner unable to do such transformations by itself?

I can see why not all correlated subqueries can be transformed into a join, but I don't see any issue with this particular query.
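As one illustration of why the transformation cannot be applied blindly: if the subquery depends on the outer row in a way plain join semantics cannot express, such as a per-row `LIMIT`, the planner cannot flatten it into an ordinary join. A hypothetical sketch against the same schema:

```sql
-- Hypothetical sketch: this correlated subquery picks one actor
-- per film (per-row ORDER BY ... LIMIT 1). An ordinary
-- join + GROUP BY cannot express "first row per outer row";
-- flattening it would require LATERAL or a window function,
-- not a plain left join.
select f.film_id, f.title,
       (select a.first_name
        from   actor a
        join   film_actor fa using (actor_id)
        where  fa.film_id = f.film_id
        order by a.last_name
        limit  1) as first_actor
from   film f
order by f.title;
```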

Update: performance

I tried to compare the performance as follows: I executed 2 consecutive loops of 100 runs of the first query, followed by 2 consecutive loops of 100 runs of the second query. In both cases I ignored the first loop, treating it as a warm-up.

I measured 16 seconds for 100 runs of the first query and 11 seconds for 100 runs of the second query.

The EXPLAIN ANALYZE output is as follows.

Correlated subquery:

 Index Scan using idx_title on film  (cost=0.28..24949.50 rows=1000 width=51) (actual time=0.690..74.828 rows=1000 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=24.84..24.85 rows=1 width=32) (actual time=0.068..0.068 rows=1 loops=1000)
       ->  Hash Join  (cost=10.82..24.82 rows=5 width=6) (actual time=0.034..0.055 rows=5 loops=1000)
         Hash Cond: (film_actor.actor_id = actor.actor_id)
         ->  Bitmap Heap Scan on film_actor  (cost=4.32..18.26 rows=5 width=2) (actual time=0.025..0.040 rows=5 loops=1000)
               Recheck Cond: (film_id = film.film_id)
               Heap Blocks: exact=5075
               ->  Bitmap Index Scan on idx_fk_film_id  (cost=0.00..4.32 rows=5 width=0) (actual time=0.015..0.015 rows=5 loops=1000)
                 Index Cond: (film_id = film.film_id)
         ->  Hash  (cost=4.00..4.00 rows=200 width=10) (actual time=0.338..0.338 rows=200 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 17kB
               ->  Seq Scan on actor  (cost=0.00..4.00 rows=200 width=10) (actual time=0.021..0.133 rows=200 loops=1)
 Planning time: 1.277 ms
 Execution time: 75.525 ms

Join:

 Sort  (cost=748.60..751.10 rows=1000 width=51) (actual time=35.865..36.060 rows=1000 loops=1)
   Sort Key: f.title
   Sort Method: quicksort  Memory: 199kB
   ->  GroupAggregate  (cost=645.31..698.78 rows=1000 width=51) (actual time=23.953..34.204 rows=1000 loops=1)
     Group Key: f.film_id
     ->  Sort  (cost=645.31..658.97 rows=5462 width=25) (actual time=23.910..25.210 rows=5465 loops=1)
           Sort Key: f.film_id
           Sort Method: quicksort  Memory: 619kB
           ->  Hash Left Join  (cost=84.00..306.25 rows=5462 width=25) (actual time=2.098..16.237 rows=5465 loops=1)
             Hash Cond: (fa.actor_id = a.actor_id)
             ->  Hash Right Join  (cost=77.50..231.03 rows=5462 width=21) (actual time=1.786..10.636 rows=5465 loops=1)
               Hash Cond: (fa.film_id = f.film_id)
               ->  Seq Scan on film_actor fa  (cost=0.00..84.62 rows=5462 width=4) (actual time=0.018..2.221 rows=5462 loops=1)
               ->  Hash  (cost=65.00..65.00 rows=1000 width=19) (actual time=1.753..1.753 rows=1000 loops=1)
                 Buckets: 1024  Batches: 1  Memory Usage: 59kB
                 ->  Seq Scan on film f  (cost=0.00..65.00 rows=1000 width=19) (actual time=0.029..0.819 rows=1000 loops=1)
             ->  Hash  (cost=4.00..4.00 rows=200 width=10) (actual time=0.286..0.286 rows=200 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 17kB
               ->  Seq Scan on actor a  (cost=0.00..4.00 rows=200 width=10) (actual time=0.016..0.114 rows=200 loops=1)
 Planning time: 1.648 ms
 Execution time: 36.599 ms

Answer

Too long for a comment.

The rewrite of your correlated subquery should be like this:

select film_id, title, a.actors
from   film
left join
  (         
           select     film_actor.film_id, array_agg(first_name) as actors
           from       actor
           inner join film_actor using(actor_id)
           group by   film_actor.film_id
  ) as a
on a.film_id = film.film_id
order by title;
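For completeness: PostgreSQL (9.3+) also supports an explicit LATERAL join, which keeps the per-film shape of the original subquery while spelling it as a join. A sketch against the same schema:

```sql
-- A LATERAL subquery may reference columns of the outer table
-- (f.film_id here), so the correlation is stated explicitly
-- rather than hidden inside a scalar subquery in the SELECT list.
select f.film_id, f.title, a.actors
from   film f
left join lateral (
    select array_agg(ac.first_name) as actors
    from   actor ac
    join   film_actor fa using (actor_id)
    where  fa.film_id = f.film_id
) a on true
order by f.title;
```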

Regarding performance: scalar correlated subqueries simply seem to be hard for optimizers, and I wouldn't expect them to perform the same as or better than a manual rewrite.
