Keep PostgreSQL from sometimes choosing a bad query plan


Question


I have a strange problem with PostgreSQL performance for a query, using PostgreSQL 8.4.9. This query is selecting a set of points within a 3D volume, using a LEFT OUTER JOIN to add a related ID column where that related ID exists. Small changes in the x range can cause PostgreSQL to choose a different query plan, which takes the execution time from 0.01 seconds to 50 seconds. This is the query in question:

SELECT treenode.id AS id,
       treenode.parent_id AS parentid,
       (treenode.location).x AS x,
       (treenode.location).y AS y,
       (treenode.location).z AS z,
       treenode.confidence AS confidence,
       treenode.user_id AS user_id,
       treenode.radius AS radius,
       ((treenode.location).z - 50) AS z_diff,
       treenode_class_instance.class_instance_id AS skeleton_id
  FROM treenode LEFT OUTER JOIN
         (treenode_class_instance INNER JOIN
          class_instance ON treenode_class_instance.class_instance_id
                                                  = class_instance.id
                            AND class_instance.class_id = 7828307)
       ON (treenode_class_instance.treenode_id = treenode.id
           AND treenode_class_instance.relation_id = 7828321)
  WHERE treenode.project_id = 4
    AND (treenode.location).x >= 8000
    AND (treenode.location).x <= (8000 + 4736)
    AND (treenode.location).y >= 22244
    AND (treenode.location).y <= (22244 + 3248)
    AND (treenode.location).z >= 0
    AND (treenode.location).z <= 100
  ORDER BY parentid DESC, id, z_diff
  LIMIT 400;

That query takes nearly a minute, and, if I add EXPLAIN to the front of that query, seems to be using the following query plan:

 Limit  (cost=56185.16..56185.17 rows=1 width=89)
   ->  Sort  (cost=56185.16..56185.17 rows=1 width=89)
         Sort Key: treenode.parent_id, treenode.id, (((treenode.location).z - 50::double precision))
         ->  Nested Loop Left Join  (cost=6715.16..56185.15 rows=1 width=89)
               Join Filter: (treenode_class_instance.treenode_id = treenode.id)
               ->  Bitmap Heap Scan on treenode  (cost=148.55..184.16 rows=1 width=81)
                     Recheck Cond: (((location).x >= 8000::double precision) AND ((location).x <= 12736::double precision) AND ((location).z >= 0::double precision) AND ((location).z <= 100::double precision))
                     Filter: (((location).y >= 22244::double precision) AND ((location).y <= 25492::double precision) AND (project_id = 4))
                     ->  BitmapAnd  (cost=148.55..148.55 rows=9 width=0)
                           ->  Bitmap Index Scan on location_x_index  (cost=0.00..67.38 rows=2700 width=0)
                                 Index Cond: (((location).x >= 8000::double precision) AND ((location).x <= 12736::double precision))
                           ->  Bitmap Index Scan on location_z_index  (cost=0.00..80.91 rows=3253 width=0)
                                 Index Cond: (((location).z >= 0::double precision) AND ((location).z <= 100::double precision))
               ->  Hash Join  (cost=6566.61..53361.69 rows=211144 width=16)
                     Hash Cond: (treenode_class_instance.class_instance_id = class_instance.id)
                     ->  Seq Scan on treenode_class_instance  (cost=0.00..25323.79 rows=969285 width=16)
                           Filter: (relation_id = 7828321)
                     ->  Hash  (cost=5723.54..5723.54 rows=51366 width=8)
                           ->  Seq Scan on class_instance  (cost=0.00..5723.54 rows=51366 width=8)
                                 Filter: (class_id = 7828307)
(20 rows)

However, if I replace the 8000 in the x range condition with 10644, the query is performed in a fraction of a second and uses this query plan:

 Limit  (cost=58378.94..58378.95 rows=2 width=89)
   ->  Sort  (cost=58378.94..58378.95 rows=2 width=89)
         Sort Key: treenode.parent_id, treenode.id, (((treenode.location).z - 50::double precision))
         ->  Hash Left Join  (cost=57263.11..58378.93 rows=2 width=89)
               Hash Cond: (treenode.id = treenode_class_instance.treenode_id)
               ->  Bitmap Heap Scan on treenode  (cost=231.12..313.44 rows=2 width=81)
                     Recheck Cond: (((location).z >= 0::double precision) AND ((location).z <= 100::double precision) AND ((location).x >= 10644::double precision) AND ((location).x <= 15380::double precision))
                     Filter: (((location).y >= 22244::double precision) AND ((location).y <= 25492::double precision) AND (project_id = 4))
                     ->  BitmapAnd  (cost=231.12..231.12 rows=21 width=0)
                           ->  Bitmap Index Scan on location_z_index  (cost=0.00..80.91 rows=3253 width=0)
                                 Index Cond: (((location).z >= 0::double precision) AND ((location).z <= 100::double precision))
                           ->  Bitmap Index Scan on location_x_index  (cost=0.00..149.95 rows=6157 width=0)
                                 Index Cond: (((location).x >= 10644::double precision) AND ((location).x <= 15380::double precision))
               ->  Hash  (cost=53361.69..53361.69 rows=211144 width=16)
                     ->  Hash Join  (cost=6566.61..53361.69 rows=211144 width=16)
                           Hash Cond: (treenode_class_instance.class_instance_id = class_instance.id)
                           ->  Seq Scan on treenode_class_instance  (cost=0.00..25323.79 rows=969285 width=16)
                                 Filter: (relation_id = 7828321)
                           ->  Hash  (cost=5723.54..5723.54 rows=51366 width=8)
                                 ->  Seq Scan on class_instance  (cost=0.00..5723.54 rows=51366 width=8)
                                       Filter: (class_id = 7828307)
(21 rows)

I'm far from an expert in parsing these query plans, but the clear difference seems to be that with one x range it uses a Hash Left Join for the LEFT OUTER JOIN (which is very fast), while with the other range it uses a Nested Loop Left Join (which seems to be very slow). In both cases the queries return about 90 rows. If I do SET ENABLE_NESTLOOP TO FALSE before the slow version of the query, it goes very fast, but I understand that using that setting in general is a bad idea.
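Incidentally, the effect of that workaround can at least be confined to a single transaction with SET LOCAL, so other queries on the same connection keep the default planner behavior (a sketch of the stopgap only, not a fix for the underlying estimates):

BEGIN;
SET LOCAL enable_nestloop = off;  -- reverts automatically at COMMIT or ROLLBACK
-- ... run the slow query from above here ...
COMMIT;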

Can I, for example, create a particular index in order to make it more likely that the query planner will choose the clearly more efficient strategy? Could anyone suggest why PostgreSQL's query planner should be choosing such a poor strategy for one of these queries? Below I have included details of the schema that may be helpful.


The treenode table has 900,000 rows, and is defined as follows:

                                     Table "public.treenode"
    Column     |           Type           |                      Modifiers                       
---------------+--------------------------+------------------------------------------------------
 id            | bigint                   | not null default nextval('concept_id_seq'::regclass)
 user_id       | bigint                   | not null
 creation_time | timestamp with time zone | not null default now()
 edition_time  | timestamp with time zone | not null default now()
 project_id    | bigint                   | not null
 location      | double3d                 | not null
 parent_id     | bigint                   | 
 radius        | double precision         | not null default 0
 confidence    | integer                  | not null default 5
Indexes:
    "treenode_pkey" PRIMARY KEY, btree (id)
    "treenode_id_key" UNIQUE, btree (id)
    "location_x_index" btree (((location).x))
    "location_y_index" btree (((location).y))
    "location_z_index" btree (((location).z))
Foreign-key constraints:
    "treenode_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES treenode(id)
Referenced by:
    TABLE "treenode_class_instance" CONSTRAINT "treenode_class_instance_treenode_id_fkey" FOREIGN KEY (treenode_id) REFERENCES treenode(id) ON DELETE CASCADE
    TABLE "treenode" CONSTRAINT "treenode_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES treenode(id)
Triggers:
    on_edit_treenode BEFORE UPDATE ON treenode FOR EACH ROW EXECUTE PROCEDURE on_edit()
Inherits: location
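(For reference, expression indexes like these on fields of a composite-type column would be declared along the following lines; the original DDL isn't shown, so this is a reconstruction from the index listing above. Note the extra parentheses required around the expression:)

CREATE INDEX location_x_index ON treenode (((location).x));
CREATE INDEX location_y_index ON treenode (((location).y));
CREATE INDEX location_z_index ON treenode (((location).z));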

The double3d composite type is defined as follows:

Composite type "public.double3d"
 Column |       Type       
--------+------------------
 x      | double precision
 y      | double precision
 z      | double precision

The other two tables involved in the join are treenode_class_instance:

                               Table "public.treenode_class_instance"
      Column       |           Type           |                      Modifiers                       
-------------------+--------------------------+------------------------------------------------------
 id                | bigint                   | not null default nextval('concept_id_seq'::regclass)
 user_id           | bigint                   | not null
 creation_time     | timestamp with time zone | not null default now()
 edition_time      | timestamp with time zone | not null default now()
 project_id        | bigint                   | not null
 relation_id       | bigint                   | not null
 treenode_id       | bigint                   | not null
 class_instance_id | bigint                   | not null
Indexes:
    "treenode_class_instance_pkey" PRIMARY KEY, btree (id)
    "treenode_class_instance_id_key" UNIQUE, btree (id)
    "idx_class_instance_id" btree (class_instance_id)
Foreign-key constraints:
    "treenode_class_instance_class_instance_id_fkey" FOREIGN KEY (class_instance_id) REFERENCES class_instance(id) ON DELETE CASCADE
    "treenode_class_instance_relation_id_fkey" FOREIGN KEY (relation_id) REFERENCES relation(id)
    "treenode_class_instance_treenode_id_fkey" FOREIGN KEY (treenode_id) REFERENCES treenode(id) ON DELETE CASCADE
    "treenode_class_instance_user_id_fkey" FOREIGN KEY (user_id) REFERENCES "user"(id)
Triggers:
    on_edit_treenode_class_instance BEFORE UPDATE ON treenode_class_instance FOR EACH ROW EXECUTE PROCEDURE on_edit()
Inherits: relation_instance

... and class_instance:

                                  Table "public.class_instance"
    Column     |           Type           |                      Modifiers                       
---------------+--------------------------+------------------------------------------------------
 id            | bigint                   | not null default nextval('concept_id_seq'::regclass)
 user_id       | bigint                   | not null
 creation_time | timestamp with time zone | not null default now()
 edition_time  | timestamp with time zone | not null default now()
 project_id    | bigint                   | not null
 class_id      | bigint                   | not null
 name          | character varying(255)   | not null
Indexes:
    "class_instance_pkey" PRIMARY KEY, btree (id)
    "class_instance_id_key" UNIQUE, btree (id)
Foreign-key constraints:
    "class_instance_class_id_fkey" FOREIGN KEY (class_id) REFERENCES class(id)
    "class_instance_user_id_fkey" FOREIGN KEY (user_id) REFERENCES "user"(id)
Referenced by:
    TABLE "class_instance_class_instance" CONSTRAINT "class_instance_class_instance_class_instance_a_fkey" FOREIGN KEY (class_instance_a) REFERENCES class_instance(id) ON DELETE CASCADE
    TABLE "class_instance_class_instance" CONSTRAINT "class_instance_class_instance_class_instance_b_fkey" FOREIGN KEY (class_instance_b) REFERENCES class_instance(id) ON DELETE CASCADE
    TABLE "connector_class_instance" CONSTRAINT "connector_class_instance_class_instance_id_fkey" FOREIGN KEY (class_instance_id) REFERENCES class_instance(id)
    TABLE "treenode_class_instance" CONSTRAINT "treenode_class_instance_class_instance_id_fkey" FOREIGN KEY (class_instance_id) REFERENCES class_instance(id) ON DELETE CASCADE
Triggers:
    on_edit_class_instance BEFORE UPDATE ON class_instance FOR EACH ROW EXECUTE PROCEDURE on_edit()
Inherits: concept

Solution

If the query planner makes bad decisions, it's mostly one of two things:

1. The statistics are inaccurate.

Do you run ANALYZE often enough? It is also popular in its combined form, VACUUM ANALYZE. If autovacuum is on (which is the default in modern-day Postgres), ANALYZE is run automatically. But consider that autovacuum's thresholds can let big, heavily modified tables go a long time between automatic analyzes. (This advice still applies for Postgres 12.)
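A useful first check: the slow plan above estimates rows=1 for the Bitmap Heap Scan on treenode, while the query actually returns about 90 rows, and EXPLAIN ANALYZE (rather than plain EXPLAIN) prints estimated and actual row counts side by side. A minimal sketch for refreshing and inspecting the statistics on the tables involved:

VACUUM ANALYZE treenode;
VACUUM ANALYZE treenode_class_instance;
VACUUM ANALYZE class_instance;

-- When were these tables last analyzed, manually or by autovacuum?
SELECT relname, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname IN ('treenode', 'treenode_class_instance', 'class_instance');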

If your table is big and data distribution is irregular, raising the default_statistics_target may help. Or rather, just set the statistics target for relevant columns (those in WHERE or JOIN clauses of your queries, basically):

ALTER TABLE ... ALTER COLUMN ... SET STATISTICS 400;  -- calibrate number

The target can be set in the range 0 to 10000.

Run ANALYZE again after that (on relevant tables).
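A concrete sketch for this schema (the target of 400 and the column choice are starting points to calibrate, not recommendations): per-column targets cover the plain filter and join columns, while the statistics for the expression indexes on (location).x/y/z are, on 8.4, governed by default_statistics_target, so raising that setting is the lever for those.

ALTER TABLE treenode ALTER COLUMN project_id SET STATISTICS 400;
ALTER TABLE treenode_class_instance ALTER COLUMN relation_id SET STATISTICS 400;
ALTER TABLE class_instance ALTER COLUMN class_id SET STATISTICS 400;

ANALYZE treenode;
ANALYZE treenode_class_instance;
ANALYZE class_instance;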

2. The cost settings for planner estimates are off.

Read the chapter Planner Cost Constants in the manual.

Also look at the sections on default_statistics_target and random_page_cost on the generally helpful "Tuning Your PostgreSQL Server" page of the PostgreSQL Wiki.
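A sketch of experimenting with the cost constant most often implicated in bad join choices (the value and the database name are illustrative assumptions, not recommendations):

SET random_page_cost = 2.0;  -- session only; the default is 4.0, lower favors index access
-- re-run EXPLAIN on the slow query and see whether the plan changes

ALTER DATABASE mydb SET random_page_cost = 2.0;  -- persist per database once satisfied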

There are many other possible reasons, but these are the most common ones by far.
