带有“ order by”的PostgreSQL CURSOR条款 [英] PostgreSQL CURSOR with "order by" clause

查看:98
本文介绍了带有“ order by”的PostgreSQL CURSOR条款的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设有一个名为 A 的查询,该查询需要2秒钟的时间。

  SELECT ... FROM ... ORDER BY users_device。 id#查询A 
#它包含连接子句。
#需要2秒

但是,当我运行 A 并声明 CURSOR ,需要8秒。

  DECLARE cursor没有滚动光标,可以选择... ... ... ...由 users_device进行排序。 id 
#需要8秒

我试图比较它们之间的查询计划,然后发现 A CURSOR 似乎是在尝试避免排序操作。



以下是实际的查询计划。

 #A不带光标
+ ------------------------------ -------------------------------------------------- -------------------------------------------------- ----------------------------
|查询计划
| ------------------------------------------- -------------------------------------------------- -------------------------------------------------- ---------------
|排序(cost = 433944.70..434050.91行= 42485宽度= 147)(实际时间= 2664.192..2669.168行= 35949循环= 1)
|排序键:users_device.id DESC
|排序方法:外部合并磁盘:5536kB
| ->嵌套循环(成本= 239036.59..427483.24行= 42485宽度= 147)(实际时间= 1956.219..2631.077行= 35949循环= 1)
| ->嵌套循环(成本= 239036.16..404723.17行= 43069宽度= 151)(实际时间= 1956.209..2502.529行= 39556循环= 1)
| ->哈希联接(成本= 239035.73..367340.95行= 51402宽度= 12)(实际时间= 1956.192..2249.085行= 63844循环= 1)
|哈希值:(users_serviceuser_favorites.from_serviceuser_id = users_serviceuser.id)
| -> users_serviceuser_favorites上的位图堆扫描(成本= 1988.30..119110.71行= 72756宽度= 4)(实际时间= 22.182..74.569行= 66736 lo
|重新检查条件:(to_serviceuser_id = 773433)
|堆块:确切= 43597
|-> users_serviceuser_favorites_011e5c87上的位图索引扫描(成本= 0.00..1970.11行= 72756宽度= 0)(实际时间= 13.108..13.108 ro
|索引条件:( to_serviceuser_id = 773433)
|->哈希(cost = 196162.09..196162.09 rows = 2492028 width = 8)(实际时间= 1932.025..1932.025行= 2503575循环= 1)
|存储桶:131072批次:64内存使用量:2564kB
|-> users_serviceuser上的Seq扫描(成本= 0.00..196162.09行= 2492028宽度= 8)(实际时间= 0.184..1517.721行= 2503575循环= 1)
|过滤器:(状态= 1)
|通过过滤器删除的行:1016611
| ->使用users_device_e8701ad4在users_device上进行索引扫描(成本= 0.43..0.72行= 1宽度= 151)(实际时间= 0.003..0.004行= 1循环= 63844)索引条件:(user_id = users_serviceuser.id)
|过滤器:(状态= 0)
|筛选器删除的行:1
| ->在users_pushsetting上使用users_pushsetting_pkey进行索引扫描(成本= 0.43..0.52行= 1宽度= 4)(实际时间= 0.003..0.003行= 1循环= 39556)索引条件:(id = users_serviceuser.push_settings_id)
|过滤条件:live
|筛选器删除的行:0
|计划时间:2.537毫秒
|执行时间:2671.895 ms
+ ---------------------------------------- -------------------------------------------------- -------------------------------------------------- ------------------

#A与CURSOR(您可以看到没有其他排序操作
+ ----- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- ---
|查询计划
| ----------------------------------- -------------------------------------------------- -------------------------------------------------- -----------------------
|嵌套循环(成本= 1.85..3204962.54行= 42484宽度= 147)(实际时间= 0.324。 .8345.880行= 35945循环= 1)
|->嵌套循环(成本= 1.42..3182203.00行= 43068宽度= 151)(实际时间= 0.314..8206.619行= 39552循环= 1)
|->嵌套循环(成本= 0.99..3124567.60行= 84403宽度= 155)(实际时间= 0.302..8035.916行= 43584循环= 1) ->使用users_device上的users_device_pkey向后扫描索引(成本= 0.43..423536.24行= 2955421宽度= 151)(实际时间= 0.017..2659.130行
|过滤器:(状态= 0)
|行删除者筛选器:692112
|->仅索引在users_serviceuser_favorites上使用users_serviceuser_favorites_from_serviceuser_id_ac0a7b1d_uniq(cost = 0.56..0.89行= 2宽度
|索引条件:((from_serviceuser_id = users_device.user433_id)和(to_service ))
|堆访存:38956
|->在users_serviceuser上使用users_serviceuser_pkey进行索引扫描(成本= 0.43..0.67行= 1宽度= 8)(实际时间= 0.003..0.004行= 1) loops = 43584)
|索引条件:(id = users_device.user_id)
|过滤器:(status = 1)
|被过滤器删除的行:0
|->在users_pushsetting上使用users_pushsetting_pkey进行索引扫描(成本= 0.43..0.52行= 1宽度= 4)(实际时间= 0.003..0.003行= 1循环= 39552)
|索引条件:(id = users_serviceuser.push_settings_id)
|过滤条件:live
|筛选器删除的行:0
|计划时间:4.056毫秒
|执行时间:8348.095 ms
+ ---------------------------------------- -------------------------------------------------- -------------------------------------------------- ------------------

在声明 CURSOR ,Optimizer选择 users_device。 id(users_device_pkey)索引扫描,以便对所有联接表进行排序 users_device。 id 无需其他排序操作。结果,即使此选择导致查询性能较差,以后也不需要通过 users_device。 id 进行排序。



为什么优化器通过 CURSOR 选择其他计划?



它避免吗排序?



如果是这样,为什么?

解决方案

我找不到这在文档中,但我推测,当您使用游标时,数据库看起来更多的是估计的启动成本(到达输出第一行的时间),而不是估计的总成本(到达输出最后一行的时间)。 / p>

在您的示例中,估计慢速计划以1.85个成本单位输出第一行,而快速计划以433944.70个成本单位输出。因此,当您使用游标以便能够尽快提供部分结果时,数据库似乎更喜欢慢速计划。



这似乎是合理的-您改为使用游标普通查询,可能是因为您希望尽快开始处理数据。



我认为您可以使其快速运行,并且仍然可以使用通过显式创建临时表来获取:

 创建临时表t为select ... / *跳过* /; 
声明c光标并保持按住状态,从id的t顺序中选择*;






@mastaBlasta 在评论中指出,有一个选项可控制对于游标,第一个结果优先于整个结果的优先级: cursor_tuple_fraction


Let's suppose there is a Query called A and it takes 2sec.

SELECT ... FROM ... ORDER BY "users_device"."id"  # Query A
# It contains join clause.
# It takes 2sec

However, When I run A with declaring CURSOR, it takes 8sec.

DECLARE "cursor" NO SCROLL CURSOR WITH HOLD FOR SELECT ... FROM ... ORDER BY "users_device"."id"
# It takes 8sec

I have tried to compare Query Plan between them and then I found A with CURSOR seems to try to avoid sorting operation.

The below is actual query plan.

# A without CURSOR
+--------------------------------------------------------------------------------------------------------------------------------------------------------------
| QUERY PLAN
|--------------------------------------------------------------------------------------------------------------------------------------------------------------
| Sort  (cost=433944.70..434050.91 rows=42485 width=147) (actual time=2664.192..2669.168 rows=35949 loops=1)
|   Sort Key: users_device.id DESC
|   Sort Method: external merge  Disk: 5536kB
|   ->  Nested Loop  (cost=239036.59..427483.24 rows=42485 width=147) (actual time=1956.219..2631.077 rows=35949 loops=1)
|         ->  Nested Loop  (cost=239036.16..404723.17 rows=43069 width=151) (actual time=1956.209..2502.529 rows=39556 loops=1)
|               ->  Hash Join  (cost=239035.73..367340.95 rows=51402 width=12) (actual time=1956.192..2249.085 rows=63844 loops=1)
|                     Hash Cond: (users_serviceuser_favorites.from_serviceuser_id = users_serviceuser.id)
|                     ->  Bitmap Heap Scan on users_serviceuser_favorites  (cost=1988.30..119110.71 rows=72756 width=4) (actual time=22.182..74.569 rows=66736 lo
|                           Recheck Cond: (to_serviceuser_id = 773433)
|                           Heap Blocks: exact=43597
|                           ->  Bitmap Index Scan on users_serviceuser_favorites_011e5c87  (cost=0.00..1970.11 rows=72756 width=0) (actual time=13.108..13.108 ro
|                                 Index Cond: (to_serviceuser_id = 773433)
|                     ->  Hash  (cost=196162.09..196162.09 rows=2492028 width=8) (actual time=1932.025..1932.025 rows=2503575 loops=1)
|                           Buckets: 131072  Batches: 64  Memory Usage: 2564kB
|                           ->  Seq Scan on users_serviceuser  (cost=0.00..196162.09 rows=2492028 width=8) (actual time=0.184..1517.721 rows=2503575 loops=1)
|                                 Filter: (status = 1)
|                                 Rows Removed by Filter: 1016611
|               ->  Index Scan using users_device_e8701ad4 on users_device  (cost=0.43..0.72 rows=1 width=151) (actual time=0.003..0.004 rows=1 loops=63844)
|                     Index Cond: (user_id = users_serviceuser.id)
|                     Filter: (status = 0)
|                     Rows Removed by Filter: 1
|         ->  Index Scan using users_pushsetting_pkey on users_pushsetting  (cost=0.43..0.52 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=39556)
|               Index Cond: (id = users_serviceuser.push_settings_id)
|               Filter: live
|               Rows Removed by Filter: 0
| Planning time: 2.537 ms
| Execution time: 2671.895 ms
+--------------------------------------------------------------------------------------------------------------------------------------------------------------

# A with CURSOR (You can see there is no another sorting operation
+--------------------------------------------------------------------------------------------------------------------------------------------------------------
| QUERY PLAN
|--------------------------------------------------------------------------------------------------------------------------------------------------------------
| Nested Loop  (cost=1.85..3204962.54 rows=42484 width=147) (actual time=0.324..8345.880 rows=35945 loops=1)
|   ->  Nested Loop  (cost=1.42..3182203.00 rows=43068 width=151) (actual time=0.314..8206.619 rows=39552 loops=1)
|         ->  Nested Loop  (cost=0.99..3124567.60 rows=84403 width=155) (actual time=0.302..8035.916 rows=43584 loops=1)
|               ->  Index Scan Backward using users_device_pkey on users_device  (cost=0.43..423536.24 rows=2955421 width=151) (actual time=0.017..2659.130 row
|                     Filter: (status = 0)
|                     Rows Removed by Filter: 692112
|               ->  Index Only Scan using users_serviceuser_favorites_from_serviceuser_id_ac0a7b1d_uniq on users_serviceuser_favorites  (cost=0.56..0.89 rows=2 width
|                     Index Cond: ((from_serviceuser_id = users_device.user_id) AND (to_serviceuser_id = 773433))
|                     Heap Fetches: 38956
|         ->  Index Scan using users_serviceuser_pkey on users_serviceuser  (cost=0.43..0.67 rows=1 width=8) (actual time=0.003..0.004 rows=1 loops=43584)
|               Index Cond: (id = users_device.user_id)
|               Filter: (status = 1)
|               Rows Removed by Filter: 0
|   ->  Index Scan using users_pushsetting_pkey on users_pushsetting  (cost=0.43..0.52 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=39552)
|         Index Cond: (id = users_serviceuser.push_settings_id)
|         Filter: live
|         Rows Removed by Filter: 0
| Planning time: 4.056 ms
| Execution time: 8348.095 ms
+--------------------------------------------------------------------------------------------------------------------------------------------------------------

When declaring CURSOR, Optimizer chooses "users_device"."id"(users_device_pkey) index scan first so that all joined table is sorted by "users_device"."id" without another sorting operation. As the result, it doesn't need to do order by "users_device"."id" later even though this choice results in poor Query performance.

Why does optimizer choose a different plan by CURSOR?

Does it avoid sorting?

if so, why?

解决方案

I can't find this in documentation but I'd speculate, that when you use cursor the database looks more at estimated start-up cost (the time to the first row of output) than estimated total cost (the time to the last row of output).

In your example the slow plan is estimated to output the first row in 1.85 cost-units, and the fast plan in 433944.70 cost-units. So it looks like the database prefers the slow plan when you used cursor to be able to provide partial results as soon as possible.

This seems reasonable - you used a cursor instead of an ordinary query, probably because you prefer to start working on your data as soon as possible.

I think you can make it work fast and still retrieve data in chunks with fetch by explicitly creating the temporary table:

create temporary table t as select ... /* skip order by */;
declare c cursor with hold for select * from t order by id;


As @mastaBlasta pointed out in a comment there's an option that controls for how much is first result preferred over a whole result for cursors: cursor_tuple_fraction.

这篇关于带有“ order by”的PostgreSQL CURSOR条款的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆