Postgres: Why is the performance so bad on subselects with Offset/Limit

Problem description

Can you please help me understand the reason for the performance drop between these statements?

For me it seems like in the case of D & E, Postgres first joins the address to all subscribers and only applies Offset & Limit at the end. Why on earth would it do that?

Am I missing something about how subselects and Offset work together? Shouldn't Postgres first find the right offset and then start executing the subselects?

user_id and address_id are primary keys
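
For reference, a minimal sketch of the schema these statements appear to assume. The column types, the reference from address.user_id to subscribers, and the index name are assumptions for this stripped-down version; the execution plan further down shows a comparable index (ix_pcv_user_address_user_id) on the real table.

CREATE TABLE subscribers (
    user_id integer PRIMARY KEY  -- primary key, as stated above
    -- further columns omitted in this stripped-down example
);

CREATE TABLE address (
    address_id integer PRIMARY KEY,                      -- primary key, as stated above
    user_id    integer NOT NULL REFERENCES subscribers   -- assumed reference to subscribers.user_id
    -- further columns omitted
);

-- Assumed index supporting the correlated lookups on address.user_id
-- (name chosen here for illustration only):
CREATE INDEX ix_address_user_id ON address (user_id);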

Select A: 15 ms (OK): select first 200 subscribers

SELECT s.user_id
FROM subscribers s
ORDER BY s.user_id
OFFSET 0 LIMIT 200

Select B: 45 ms (OK): Select last 200 subscribers

SELECT s.user_id
FROM subscribers s
ORDER BY s.user_id
OFFSET 100000 LIMIT 200

Select C: 15 ms (OK): Select first 200 subscribers together with first available address

SELECT s.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
FROM subscribers s
ORDER BY s.user_id
OFFSET 0 LIMIT 200

Select D: 500 ms (Not OK): Select last 200 subscribers together with first available address

SELECT s.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
FROM subscribers s
ORDER BY s.user_id
OFFSET 100000 LIMIT 200

Select E: 1000 ms (Even worse): Select last 200 subscribers together with first 2 available addresses

SELECT s.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id_1,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 1 LIMIT 2) AS a_id_2
FROM subscribers s
ORDER BY s.user_id
OFFSET 100000 LIMIT 200

Select F: 15 ms (Nice): Select last 200 subscribers together with first 2 available addresses without offset but WHERE s.user_id > 100385 instead

SELECT s.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id_1,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 1 LIMIT 2) AS a_id_2
FROM subscribers s
WHERE s.user_id > 100385 --same as OFFSET 100000 in my data
ORDER BY s.user_id
LIMIT 200

Execution plan for E:

'Limit  (cost=1677635.30..1677635.80 rows=200 width=4) (actual time=2251.503..2251.816 rows=200 loops=1)'
'  Output: s.user_id, ((SubPlan 1)), ((SubPlan 2))'
'  Buffers: shared hit=607074'
'  ->  Sort  (cost=1677385.30..1677636.08 rows=100312 width=4) (actual time=2146.867..2200.704 rows=100200 loops=1)'
'        Output: s.user_id, ((SubPlan 1)), ((SubPlan 2))'
'        Sort Key: s.user_id'
'        Sort Method:  quicksort  Memory: 7775kB'
'        Buffers: shared hit=607074'
'        ->  Seq Scan on public.pcv_subscriber s  (cost=0.00..1669052.31 rows=100312 width=4) (actual time=0.040..2046.926 rows=100312 loops=1)'
'              Output: s.user_id, (SubPlan 1), (SubPlan 2)'
'              Buffers: shared hit=607074'
'              SubPlan 1'
'                ->  Limit  (cost=8.29..8.29 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=100312)'
'                      Output: ua.user_address_id'
'                      Buffers: shared hit=301458'
'                      ->  Sort  (cost=8.29..8.29 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=100312)'
'                            Output: ua.user_address_id'
'                            Sort Key: ua.user_address_id'
'                            Sort Method:  quicksort  Memory: 25kB'
'                            Buffers: shared hit=301458'
'                            ->  Index Scan using ix_pcv_user_address_user_id on public.pcv_user_address ua  (cost=0.00..8.28 rows=1 width=4) (actual time=0.003..0.004 rows=1 loops=100312)'
'                                  Output: ua.user_address_id'
'                                  Index Cond: (ua.user_id = $0)'
'                                  Buffers: shared hit=301458'
'              SubPlan 2'
'                ->  Limit  (cost=8.29..8.29 rows=1 width=4) (actual time=0.009..0.009 rows=0 loops=100312)'
'                      Output: ua.user_address_id'
'                      Buffers: shared hit=301458'
'                      ->  Sort  (cost=8.29..8.29 rows=1 width=4) (actual time=0.006..0.007 rows=1 loops=100312)'
'                            Output: ua.user_address_id'
'                            Sort Key: ua.user_address_id'
'                            Sort Method:  quicksort  Memory: 25kB'
'                            Buffers: shared hit=301458'
'                            ->  Index Scan using ix_pcv_user_address_user_id on public.pcv_user_address ua  (cost=0.00..8.28 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=100312)'
'                                  Output: ua.user_address_id'
'                                  Index Cond: (ua.user_id = $0)'
'                                  Buffers: shared hit=301458'
'Total runtime: 2251.968 ms'
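
The plan above, with actual times, Output: lines, and shared-buffer hit counts, matches the output of EXPLAIN's extended form available since 9.0. A sketch of how such a plan could be collected for Select E, assuming the simplified table names from above:

EXPLAIN (ANALYZE, VERBOSE, BUFFERS)  -- ANALYZE: actual times, VERBOSE: Output lines, BUFFERS: shared hit counts
SELECT s.user_id,
       (SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id_1,
       (SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 1 LIMIT 2) AS a_id_2
FROM subscribers s
ORDER BY s.user_id
OFFSET 100000 LIMIT 200;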

Disclaimer: This is a stripped-down example of a much larger and more complex statement that enables a GUI table to sort/page/filter subscribers with a lot of additional accumulated data across several tables. So I know this example can be done in better ways. Instead, please help me understand why this solution is so slow, or at best suggest minimal changes.

Update 1:

This was produced using Postgres 9.0.3

Update 2:

Currently the best solution to my problem I can come up with seems to be this stupid statement:

Select G: 73 ms (OK)

SELECT s.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id_1,
(SELECT address_id FROM address a WHERE a.user_id = s.user_id ORDER BY address_id OFFSET 1 LIMIT 2) AS a_id_2
FROM subscribers s
WHERE s.user_id >= (SELECT user_id from subscribers ORDER BY user_id OFFSET 100000 LIMIT 1)
ORDER BY s.user_id
LIMIT 200

Update 3:

Best select so far, from David (same performance as G, but more intuitive).

Select H: 73 ms (OK)

SELECT s2.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s2.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
FROM (SELECT s.user_id
      FROM  subscribers s
      ORDER BY s.user_id
      OFFSET 100000 LIMIT 200) s2

Execution plan for H:

This is how I imagined it to be for E as well in the first place.

Answer

I think that the join expressed in the SELECT clause is being executed even for the 100000 rows you are not including in the final data set.

How about this:

SELECT s2.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s2.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
FROM (select *
      from   subscribers s
      ORDER BY s.user_id
      OFFSET 100000 LIMIT 200) s2

Failing that, try a common table expression:

With s2 as (
  select *
  from   subscribers s
  ORDER BY s.user_id
  OFFSET 100000 LIMIT 200)
SELECT s2.user_id,
(SELECT address_id FROM address a WHERE a.user_id = s2.user_id ORDER BY address_id OFFSET 0 LIMIT 1) AS a_id
FROM s2
