Keep PostgreSQL from sometimes choosing a bad query plan

Problem description

I have a strange problem with PostgreSQL performance for a query, using PostgreSQL 8.4.9. This query is selecting a set of points within a 3D volume, using a LEFT OUTER JOIN to add a related ID column where that related ID exists. Small changes in the x range can cause PostgreSQL to choose a different query plan, which takes the execution time from 0.01 seconds to 50 seconds. This is the query in question:

SELECT treenode.id AS id,
       treenode.parent_id AS parentid,
       (treenode.location).x AS x,
       (treenode.location).y AS y,
       (treenode.location).z AS z,
       treenode.confidence AS confidence,
       treenode.user_id AS user_id,
       treenode.radius AS radius,
       ((treenode.location).z - 50) AS z_diff,
       treenode_class_instance.class_instance_id AS skeleton_id
  FROM treenode LEFT OUTER JOIN
         (treenode_class_instance INNER JOIN
          class_instance ON treenode_class_instance.class_instance_id
                                                  = class_instance.id
                            AND class_instance.class_id = 7828307)
       ON (treenode_class_instance.treenode_id = treenode.id
           AND treenode_class_instance.relation_id = 7828321)
  WHERE treenode.project_id = 4
    AND (treenode.location).x >= 8000
    AND (treenode.location).x <= (8000 + 4736)
    AND (treenode.location).y >= 22244
    AND (treenode.location).y <= (22244 + 3248)
    AND (treenode.location).z >= 0
    AND (treenode.location).z <= 100
  ORDER BY parentid DESC, id, z_diff
  LIMIT 400;

That query takes nearly a minute, and, if I add EXPLAIN to the front of that query, seems to be using the following query plan:

 Limit  (cost=56185.16..56185.17 rows=1 width=89)
   ->  Sort  (cost=56185.16..56185.17 rows=1 width=89)
         Sort Key: treenode.parent_id, treenode.id, (((treenode.location).z - 50::double precision))
         ->  Nested Loop Left Join  (cost=6715.16..56185.15 rows=1 width=89)
               Join Filter: (treenode_class_instance.treenode_id = treenode.id)
               ->  Bitmap Heap Scan on treenode  (cost=148.55..184.16 rows=1 width=81)
                     Recheck Cond: (((location).x >= 8000::double precision) AND ((location).x <= 12736::double precision) AND ((location).z >= 0::double precision) AND ((location).z <= 100::double precision))
                     Filter: (((location).y >= 22244::double precision) AND ((location).y <= 25492::double precision) AND (project_id = 4))
                     ->  BitmapAnd  (cost=148.55..148.55 rows=9 width=0)
                           ->  Bitmap Index Scan on location_x_index  (cost=0.00..67.38 rows=2700 width=0)
                                 Index Cond: (((location).x >= 8000::double precision) AND ((location).x <= 12736::double precision))
                           ->  Bitmap Index Scan on location_z_index  (cost=0.00..80.91 rows=3253 width=0)
                                 Index Cond: (((location).z >= 0::double precision) AND ((location).z <= 100::double precision))
               ->  Hash Join  (cost=6566.61..53361.69 rows=211144 width=16)
                     Hash Cond: (treenode_class_instance.class_instance_id = class_instance.id)
                     ->  Seq Scan on treenode_class_instance  (cost=0.00..25323.79 rows=969285 width=16)
                           Filter: (relation_id = 7828321)
                     ->  Hash  (cost=5723.54..5723.54 rows=51366 width=8)
                           ->  Seq Scan on class_instance  (cost=0.00..5723.54 rows=51366 width=8)
                                 Filter: (class_id = 7828307)
(20 rows)

However, if I replace the 8000 in the x range condition with 10644, the query is performed in a fraction of a second and uses this query plan:

 Limit  (cost=58378.94..58378.95 rows=2 width=89)
   ->  Sort  (cost=58378.94..58378.95 rows=2 width=89)
         Sort Key: treenode.parent_id, treenode.id, (((treenode.location).z - 50::double precision))
         ->  Hash Left Join  (cost=57263.11..58378.93 rows=2 width=89)
               Hash Cond: (treenode.id = treenode_class_instance.treenode_id)
               ->  Bitmap Heap Scan on treenode  (cost=231.12..313.44 rows=2 width=81)
                     Recheck Cond: (((location).z >= 0::double precision) AND ((location).z <= 100::double precision) AND ((location).x >= 10644::double precision) AND ((location).x <= 15380::double precision))
                     Filter: (((location).y >= 22244::double precision) AND ((location).y <= 25492::double precision) AND (project_id = 4))
                     ->  BitmapAnd  (cost=231.12..231.12 rows=21 width=0)
                           ->  Bitmap Index Scan on location_z_index  (cost=0.00..80.91 rows=3253 width=0)
                                 Index Cond: (((location).z >= 0::double precision) AND ((location).z <= 100::double precision))
                           ->  Bitmap Index Scan on location_x_index  (cost=0.00..149.95 rows=6157 width=0)
                                 Index Cond: (((location).x >= 10644::double precision) AND ((location).x <= 15380::double precision))
               ->  Hash  (cost=53361.69..53361.69 rows=211144 width=16)
                     ->  Hash Join  (cost=6566.61..53361.69 rows=211144 width=16)
                           Hash Cond: (treenode_class_instance.class_instance_id = class_instance.id)
                           ->  Seq Scan on treenode_class_instance  (cost=0.00..25323.79 rows=969285 width=16)
                                 Filter: (relation_id = 7828321)
                           ->  Hash  (cost=5723.54..5723.54 rows=51366 width=8)
                                 ->  Seq Scan on class_instance  (cost=0.00..5723.54 rows=51366 width=8)
                                       Filter: (class_id = 7828307)
(21 rows)

I'm far from an expert in parsing these query plans, but the clear difference seems to be that with one x range it uses a Hash Left Join for the LEFT OUTER JOIN (which is very fast), while with the other range it uses a Nested Loop Left Join (which seems to be very slow). In both cases the queries return about 90 rows. If I do SET ENABLE_NESTLOOP TO FALSE before the slow version of the query, it goes very fast, but I understand that using that setting in general is a bad idea.
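
If the setting is used as a stopgap anyway, it can at least be confined to a single transaction with SET LOCAL, so that other queries in the session keep the default planner behaviour. A minimal sketch, using an abbreviated form of the query above (the shortened select list and WHERE clause are only for illustration):

```sql
BEGIN;

-- SET LOCAL lasts only until the end of this transaction,
-- so the planner default is restored at COMMIT or ROLLBACK.
SET LOCAL enable_nestloop = off;

SELECT treenode.id AS id,
       treenode_class_instance.class_instance_id AS skeleton_id
  FROM treenode LEFT OUTER JOIN
         (treenode_class_instance INNER JOIN
          class_instance ON treenode_class_instance.class_instance_id
                                                  = class_instance.id
                            AND class_instance.class_id = 7828307)
       ON (treenode_class_instance.treenode_id = treenode.id
           AND treenode_class_instance.relation_id = 7828321)
  WHERE treenode.project_id = 4
    AND (treenode.location).x >= 8000
    AND (treenode.location).x <= (8000 + 4736)
  LIMIT 400;

COMMIT;
```

This only works around the plan choice; it does not address why the planner's estimates lead it to the nested loop in the first place.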

Can I, for example, create a particular index in order to make it more likely that the query planner will choose the clearly more efficient strategy? Could anyone suggest why PostgreSQL's query planner should be choosing such a poor strategy for one of these queries? Below I have included details of the schema that may be helpful.


The treenode table has 900,000 rows, and is defined as follows:

                                     Table "public.treenode"
    Column     |           Type           |                      Modifiers                       
---------------+--------------------------+------------------------------------------------------
 id            | bigint                   | not null default nextval('concept_id_seq'::regclass)
 user_id       | bigint                   | not null
 creation_time | timestamp with time zone | not null default now()
 edition_time  | timestamp with time zone | not null default now()
 project_id    | bigint                   | not null
 location      | double3d                 | not null
 parent_id     | bigint                   | 
 radius        | double precision         | not null default 0
 confidence    | integer                  | not null default 5
Indexes:
    "treenode_pkey" PRIMARY KEY, btree (id)
    "treenode_id_key" UNIQUE, btree (id)
    "location_x_index" btree (((location).x))
    "location_y_index" btree (((location).y))
    "location_z_index" btree (((location).z))
Foreign-key constraints:
    "treenode_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES treenode(id)
Referenced by:
    TABLE "treenode_class_instance" CONSTRAINT "treenode_class_instance_treenode_id_fkey" FOREIGN KEY (treenode_id) REFERENCES treenode(id) ON DELETE CASCADE
    TABLE "treenode" CONSTRAINT "treenode_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES treenode(id)
Triggers:
    on_edit_treenode BEFORE UPDATE ON treenode FOR EACH ROW EXECUTE PROCEDURE on_edit()
Inherits: location

The double3d composite type is defined as follows:

Composite type "public.double3d"
 Column |       Type       
--------+------------------
 x      | double precision
 y      | double precision
 z      | double precision

The other two tables involved in the join are treenode_class_instance:

                               Table "public.treenode_class_instance"
      Column       |           Type           |                      Modifiers                       
-------------------+--------------------------+------------------------------------------------------
 id                | bigint                   | not null default nextval('concept_id_seq'::regclass)
 user_id           | bigint                   | not null
 creation_time     | timestamp with time zone | not null default now()
 edition_time      | timestamp with time zone | not null default now()
 project_id        | bigint                   | not null
 relation_id       | bigint                   | not null
 treenode_id       | bigint                   | not null
 class_instance_id | bigint                   | not null
Indexes:
    "treenode_class_instance_pkey" PRIMARY KEY, btree (id)
    "treenode_class_instance_id_key" UNIQUE, btree (id)
    "idx_class_instance_id" btree (class_instance_id)
Foreign-key constraints:
    "treenode_class_instance_class_instance_id_fkey" FOREIGN KEY (class_instance_id) REFERENCES class_instance(id) ON DELETE CASCADE
    "treenode_class_instance_relation_id_fkey" FOREIGN KEY (relation_id) REFERENCES relation(id)
    "treenode_class_instance_treenode_id_fkey" FOREIGN KEY (treenode_id) REFERENCES treenode(id) ON DELETE CASCADE
    "treenode_class_instance_user_id_fkey" FOREIGN KEY (user_id) REFERENCES "user"(id)
Triggers:
    on_edit_treenode_class_instance BEFORE UPDATE ON treenode_class_instance FOR EACH ROW EXECUTE PROCEDURE on_edit()
Inherits: relation_instance

... and class_instance:

                                  Table "public.class_instance"
    Column     |           Type           |                      Modifiers                       
---------------+--------------------------+------------------------------------------------------
 id            | bigint                   | not null default nextval('concept_id_seq'::regclass)
 user_id       | bigint                   | not null
 creation_time | timestamp with time zone | not null default now()
 edition_time  | timestamp with time zone | not null default now()
 project_id    | bigint                   | not null
 class_id      | bigint                   | not null
 name          | character varying(255)   | not null
Indexes:
    "class_instance_pkey" PRIMARY KEY, btree (id)
    "class_instance_id_key" UNIQUE, btree (id)
Foreign-key constraints:
    "class_instance_class_id_fkey" FOREIGN KEY (class_id) REFERENCES class(id)
    "class_instance_user_id_fkey" FOREIGN KEY (user_id) REFERENCES "user"(id)
Referenced by:
    TABLE "class_instance_class_instance" CONSTRAINT "class_instance_class_instance_class_instance_a_fkey" FOREIGN KEY (class_instance_a) REFERENCES class_instance(id) ON DELETE CASCADE
    TABLE "class_instance_class_instance" CONSTRAINT "class_instance_class_instance_class_instance_b_fkey" FOREIGN KEY (class_instance_b) REFERENCES class_instance(id) ON DELETE CASCADE
    TABLE "connector_class_instance" CONSTRAINT "connector_class_instance_class_instance_id_fkey" FOREIGN KEY (class_instance_id) REFERENCES class_instance(id)
    TABLE "treenode_class_instance" CONSTRAINT "treenode_class_instance_class_instance_id_fkey" FOREIGN KEY (class_instance_id) REFERENCES class_instance(id) ON DELETE CASCADE
Triggers:
    on_edit_class_instance BEFORE UPDATE ON class_instance FOR EACH ROW EXECUTE PROCEDURE on_edit()
Inherits: concept

Solution

If the query planner makes bad decisions it's mostly one of two things:

  • The statistics are off. Meaning "inaccurate", not "turned off".

Do you run ANALYZE often enough? It is also popular in its combined form, VACUUM ANALYZE. If autovacuum is on (which is the default in modern-day Postgres), ANALYZE is run automatically. But consider:

(Top two answers still apply for Postgres 9.6.)
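
As a rough check of how fresh the statistics are, the statistics views record when each table was last analyzed, manually or by autovacuum. A sketch using the three tables from the question:

```sql
-- When were statistics last gathered for the tables in the join?
-- NULL in both columns means the table has not been analyzed
-- since the statistics counters were last reset.
SELECT relname, last_analyze, last_autoanalyze
  FROM pg_stat_user_tables
 WHERE relname IN ('treenode', 'treenode_class_instance', 'class_instance');

-- Refresh the statistics by hand if they look stale.
ANALYZE treenode;
ANALYZE treenode_class_instance;
ANALYZE class_instance;
```

Whether stale statistics are actually the cause here is an assumption to verify; comparing estimated and actual row counts with EXPLAIN ANALYZE before and after is one way to tell.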

If your table is big and data distribution is uneven, raising the default_statistics_target may help. Or rather, just set the statistics target for relevant columns (those in WHERE or JOIN clauses of your queries, basically):

ALTER TABLE ... ALTER COLUMN ... SET STATISTICS 1234;  -- calibrate number

The target can be set in the range 0 to 10000.

Run ANALYZE again after that (on relevant tables).
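
Applied to the tables in the question, this might look like the sketch below. The choice of columns (the ones used in the join and filter conditions) and the value 1000 are assumptions for illustration, not tuned recommendations:

```sql
-- Collect more detailed statistics for the columns used in the join conditions.
-- 1000 is an arbitrary example value; calibrate it for your data.
ALTER TABLE treenode_class_instance ALTER COLUMN relation_id SET STATISTICS 1000;
ALTER TABLE treenode_class_instance ALTER COLUMN treenode_id SET STATISTICS 1000;
ALTER TABLE class_instance ALTER COLUMN class_id SET STATISTICS 1000;

-- The new targets only take effect once the tables are analyzed again.
ANALYZE treenode_class_instance;
ANALYZE class_instance;
```

Higher targets make ANALYZE slower and planning slightly more expensive, which is why raising them per column is usually preferable to raising the global default.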

  • The cost settings for planner estimates are off.

Read the chapter Planner Cost Constants in the manual.

Look at the sections on default_statistics_target and random_page_cost on the generally helpful PostgreSQL Wiki page "Tuning Your PostgreSQL Server".
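
Cost settings can be tried out per session before changing postgresql.conf. A sketch, where the values 2.0 and '4GB' are placeholders and not recommendations for any particular machine:

```sql
-- Inspect the current settings.
SHOW random_page_cost;
SHOW effective_cache_size;

-- Try different values for this session only.
SET random_page_cost = 2.0;        -- placeholder: lower if most data is cached
SET effective_cache_size = '4GB';  -- placeholder: roughly the memory available for caching

-- Re-check which plan the planner now picks.
EXPLAIN
SELECT id
  FROM treenode
 WHERE project_id = 4
   AND (location).x >= 8000
   AND (location).x <= 12736;

-- Return to the configured defaults.
RESET random_page_cost;
RESET effective_cache_size;
```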

Of course, there can be many other reasons, but these are the most common.
