分区表查询仍在扫描所有分区 [英] Partitioned table query still scanning all partitions

查看:118
本文介绍了分区表查询仍在扫描所有分区的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一张桌子,上面有十亿条记录。为了提高性能,我将其划分为30个分区。最常见的查询在where子句中有(id = ...),因此我决定在 id 列。

I have a table with over a billion records. In order to improve performance, I partitioned it to 30 partitions. The most frequent queries have (id = ...) in their where clause, so I decided to partition the table on the id column.

基本上,分区是通过以下方式创建的:

Basically, the partitions were created in this way:

CREATE TABLE foo_0 (CHECK (id % 30 = 0)) INHERITS (foo);
CREATE TABLE foo_1 (CHECK (id % 30 = 1)) INHERITS (foo);
CREATE TABLE foo_2 (CHECK (id % 30 = 2)) INHERITS (foo);
CREATE TABLE foo_3 (CHECK (id % 30 = 3)) INHERITS (foo);
.
.
.

我为整个数据库运行了 ANALYZE 特别是,我通过运行以下命令使其收集了该表的 id 列的额外统计信息:

I ran ANALYZE for the entire database and in particular, I made it collect extra statistics for this table's id column by running:

ALTER TABLE foo ALTER COLUMN id SET STATISTICS 10000;

但是当我运行对 id 列表明计划程序仍在扫描所有分区。 constraint_exclusion 设置为 partition ,所以这不是问题。

However when I run queries that filter on the id column the planner shows that it's still scanning all the partitions. constraint_exclusion is set to partition, so that's not the problem.

EXPLAIN ANALYZE SELECT * FROM foo WHERE (id = 2);


                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..8106617.40 rows=3620981 width=54) (actual time=30.544..215.540 rows=171477 loops=1)
   ->  Append  (cost=0.00..8106617.40 rows=3620981 width=54) (actual time=30.539..106.446 rows=171477 loops=1)
         ->  Seq Scan on foo  (cost=0.00..0.00 rows=1 width=203) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (id = 2)
         ->  Bitmap Heap Scan on foo_0 foo  (cost=3293.44..281055.75 rows=122479 width=52) (actual time=0.020..0.020 rows=0 loops=1)
               Recheck Cond: (id = 2)
               ->  Bitmap Index Scan on foo_0_idx_1  (cost=0.00..3262.82 rows=122479 width=0) (actual time=0.018..0.018 rows=0 loops=1)
                     Index Cond: (id = 2)
         ->  Bitmap Heap Scan on foo_1 foo  (cost=3312.59..274769.09 rows=122968 width=56) (actual time=0.012..0.012 rows=0 loops=1)
               Recheck Cond: (id = 2)
               ->  Bitmap Index Scan on foo_1_idx_1  (cost=0.00..3281.85 rows=122968 width=0) (actual time=0.010..0.010 rows=0 loops=1)
                     Index Cond: (id = 2)
         ->  Bitmap Heap Scan on foo_2 foo  (cost=3280.30..272541.10 rows=121903 width=56) (actual time=30.504..77.033 rows=171477 loops=1)
               Recheck Cond: (id = 2)
               ->  Bitmap Index Scan on foo_2_idx_1  (cost=0.00..3249.82 rows=121903 width=0) (actual time=29.825..29.825 rows=171477 loops=1)
                     Index Cond: (id = 2)
.
.
.

我怎样做才能使刨床有更好的计划?我是否还需要对所有分区运行 ALTER TABLE foo ALTER COLUMN ID SET STATISTICS 10000;

What could I do to make the planer have a better plan? Do I need to run ALTER TABLE foo ALTER COLUMN id SET STATISTICS 10000; for all the partitions as well?

编辑

在使用了Erwin建议的查询更改后,计划程序仅扫描正确的分区,但是执行时间实际上比完整的分区差。

After using Erwin's suggested change to the query, the planner only scans the correct partition, however the execution time is actually worse then a full scan (at least of the index).

EXPLAIN ANALYZE select * from foo where (id % 30 = 2) and (id = 2);
                                                                         QUERY PLAN
                                                                             QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..8106617.40 rows=3620981 width=54) (actual time=32.611..224.934 rows=171477 loops=1)
   ->  Append  (cost=0.00..8106617.40 rows=3620981 width=54) (actual time=32.606..116.565 rows=171477 loops=1)
         ->  Seq Scan on foo  (cost=0.00..0.00 rows=1 width=203) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (id = 2)
         ->  Bitmap Heap Scan on foo_0 foo  (cost=3293.44..281055.75 rows=122479 width=52) (actual time=0.046..0.046 rows=0 loops=1)
               Recheck Cond: (id = 2)
               ->  Bitmap Index Scan on foo_0_idx_1  (cost=0.00..3262.82 rows=122479 width=0) (actual time=0.044..0.044 rows=0 loops=1)
                     Index Cond: (id = 2)
         ->  Bitmap Heap Scan on foo_1 foo  (cost=3312.59..274769.09 rows=122968 width=56) (actual time=0.021..0.021 rows=0 loops=1)
               Recheck Cond: (id = 2)
               ->  Bitmap Index Scan on foo_1_idx_1  (cost=0.00..3281.85 rows=122968 width=0) (actual time=0.020..0.020 rows=0 loops=1)
                     Index Cond: (id = 2)
         ->  Bitmap Heap Scan on foo_2 foo  (cost=3280.30..272541.10 rows=121903 width=56) (actual time=32.536..86.730 rows=171477 loops=1)
               Recheck Cond: (id = 2)
               ->  Bitmap Index Scan on foo_2_idx_1  (cost=0.00..3249.82 rows=121903 width=0) (actual time=31.842..31.842 rows=171477 loops=1)
                     Index Cond: (id = 2)
         ->  Bitmap Heap Scan on foo_3 foo  (cost=3475.87..285574.05 rows=129032 width=52) (actual time=0.035..0.035 rows=0 loops=1)
               Recheck Cond: (id = 2)
               ->  Bitmap Index Scan on foo_3_idx_1  (cost=0.00..3443.61 rows=129032 width=0) (actual time=0.031..0.031 rows=0 loops=1)
.
.
.
         ->  Bitmap Heap Scan on foo_29 foo  (cost=3401.84..276569.90 rows=126245 width=56) (actual time=0.019..0.019 rows=0 loops=1)
               Recheck Cond: (id = 2)
               ->  Bitmap Index Scan on foo_29_idx_1  (cost=0.00..3370.28 rows=126245 width=0) (actual time=0.018..0.018 rows=0 loops=1)
                     Index Cond: (id = 2)
 Total runtime: 238.790 ms

对:

EXPLAIN ANALYZE select * from foo where (id % 30 = 2) and (id = 2);
                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..273120.30 rows=611 width=56) (actual time=31.519..257.051 rows=171477 loops=1)
   ->  Append  (cost=0.00..273120.30 rows=611 width=56) (actual time=31.516..153.356 rows=171477 loops=1)
         ->  Seq Scan on foo  (cost=0.00..0.00 rows=1 width=203) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: ((id = 2) AND ((id % 30) = 2))
         ->  Bitmap Heap Scan on foo_2 foo  (cost=3249.97..273120.30 rows=610 width=56) (actual time=31.512..124.177 rows=171477 loops=1)
               Recheck Cond: (id = 2)
               Filter: ((id % 30) = 2)
               ->  Bitmap Index Scan on foo_2_idx_1  (cost=0.00..3249.82 rows=121903 width=0) (actual time=30.816..30.816 rows=171477 loops=1)
                     Index Cond: (id = 2)
 Total runtime: 270.384 ms


推荐答案

对于非平凡表达式,您必须在查询中重复或多或少的逐字条件,以使Postgres查询计划者理解它可以依靠 CHECK 约束。即使看起来似乎多余!

For non-trivial expressions you have to repeat the more or less verbatim condition in queries to make the Postgres query planner understand it can rely on the CHECK constraint. Even if it seems redundant!

每个文档


启用约束排除后,计划者将检查以下项的
约束每个分区,并尝试证明该分区需要
不被扫描,因为它不能包含满足
查询的 WHERE 子句的任何行。 计划者可以证明这一点时,它将在查询计划中排除
个分区。

With constraint exclusion enabled, the planner will examine the constraints of each partition and try to prove that the partition need not be scanned because it could not contain any rows meeting the query's WHERE clause. When the planner can prove this, it excludes the partition from the query plan.

大胆强调我的。计划者不理解复杂的表达式。
当然,这也必须满足:

Bold emphasis mine. The planner does not understand complex expressions. Of course, this has to be met, too:


确保 constraint_exclusion 配置参数未在<$ c中禁用
$ c> postgresql.conf
。如果是这样,查询将不会根据需要进行优化。

Ensure that the constraint_exclusion configuration parameter is not disabled in postgresql.conf. If it is, queries will not be optimized as desired.

而不是

SELECT * FROM foo WHERE (id = 2);

尝试:

SELECT * FROM foo WHERE id % 30 = 2 AND id = 2;

并且:


constraint_exclusion 实际上是
,实际上既不是上也不是 off ,而是一个中间设置
partition ,这导致该技术仅应用于可能在分区表上运行的查询
。设置
会使计划者检查所有查询中的 CHECK 约束,甚至
不太可能受益的简单约束。

The default (and recommended) setting of constraint_exclusion is actually neither on nor off, but an intermediate setting called partition, which causes the technique to be applied only to queries that are likely to be working on partitioned tables. The on setting causes the planner to examine CHECK constraints in all queries, even simple ones that are unlikely to benefit.

您可以尝试使用 constraint_exclusion = on 来查看计划者是否在没有多余逐字条件的情况下继续前进。但是,您必须权衡此设置的成本和收益。

You can experiment with the constraint_exclusion = on to see if the planner catches on without redundant verbatim condition. But you have to weigh cost and benefit of this setting.

替代方法是为分区设置更简单的条件,因为以@harmic 概述。

The alternative would be simpler conditions for your partitions as already outlined by @harmic.

不,增加统计信息在这种情况下将无济于事。在查询中只有 CHECK 约束和您的 WHERE 条件。

An no, increasing the number for STATISTICS will not help in this case. Only the CHECK constraints and your WHERE conditions in the query matter.

这篇关于分区表查询仍在扫描所有分区的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆