Why is Postgres scanning a huge table instead of using my index?
Question
I noticed one of my SQL queries is much slower than I expected it to be, and it turns out that the query planner is coming up with a plan that seems really bad to me. My query looks like this:
select A.style, count(B.x is null) as missing, count(*) as total
from A left join B using (id, type)
where A.country_code in ('US', 'DE', 'ES')
group by A.country_code, A.style
order by A.country_code, total
B has a (type, id) index, and A has a (country_code, style) index. A is much smaller than B: 250K rows in A vs 100M in B.
So, I expected the query plan to look something like:
- Use the index on A to select just those rows with the appropriate country_code
- Left join with B, to find the matching row (if any) based on its (type, id) index
- Group things according to country_code and style
- Add up the counts
But the query planner decides the best way to do this is a sequential scan on B, and then a right join against A. I can't fathom why that is; does anyone have an idea? Here's the actual query plan it generated:
Sort (cost=14283513.27..14283513.70 rows=171 width=595)
  Sort Key: a.country_code, (count(*))
  -> HashAggregate (cost=14283505.22..14283506.93 rows=171 width=595)
        -> Hash Right Join (cost=8973.71..14282810.03 rows=55615 width=595)
              Hash Cond: ((b.type = a.type) AND (b.id = a.id))
              -> Seq Scan on b (cost=0.00..9076222.44 rows=129937844 width=579)
              -> Hash (cost=8139.49..8139.49 rows=55615 width=28)
                    -> Bitmap Heap Scan on a (cost=1798.67..8139.49 rows=55615 width=28)
                          Recheck Cond: ((country_code = ANY ('{US,DE,ES}'::bpchar[])))
                          -> Bitmap Index Scan on a_country_code_type_idx (cost=0.00..1784.76 rows=55615 width=0)
                                Index Cond: ((country_code = ANY ('{US,DE,ES}'::bpchar[])))
Edit: following a clue from the comments on another question, I tried it with SET ENABLE_SEQSCAN TO OFF; and the query runs ten times as fast. Obviously I don't want to permanently disable sequential scans, but this helps confirm my otherwise-baseless guess that the sequential scan is not the best plan available.
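For anyone repeating this experiment, here is a minimal sketch of one way to keep the setting from leaking into the rest of the session: wrap it in a transaction and use SET LOCAL, which reverts when the transaction ends. The query is the one from the question. Using EXPLAIN (ANALYZE, BUFFERS) to compare estimated and actual row counts is a suggestion, not something the original post did.

```sql
BEGIN;

-- SET LOCAL only lasts until the end of the current transaction,
-- so sequential scans are not disabled for anything else.
SET LOCAL enable_seqscan = off;

-- ANALYZE actually executes the query and reports real timings and row counts
-- next to the planner's estimates; BUFFERS adds cache hit/read information.
EXPLAIN (ANALYZE, BUFFERS)
select A.style, count(B.x is null) as missing, count(*) as total
from A left join B using (id, type)
where A.country_code in ('US', 'DE', 'ES')
group by A.country_code, A.style
order by A.country_code, total;

-- Nothing was modified; ROLLBACK simply ends the transaction and discards the setting.
ROLLBACK;
```

Running the same EXPLAIN once more without the SET LOCAL line gives the sequential-scan plan to compare against.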
Recommended answer
If the query is actually faster with an index scan as your added test proves, then it's typically one or both of these:
- Your statistics are off or not precise enough to cover irregular data distribution.
- Your cost settings are off, which Postgres uses to base its cost estimation on.
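As a rough sketch of what checking both points might look like for the tables in the question: the statements below are standard Postgres commands, but every numeric value is an illustrative assumption rather than a recommendation, and the right values depend on the hardware and data.

```sql
-- 1. Statistics: refresh them so the planner sees current row counts and distributions.
ANALYZE A;
ANALYZE B;

-- Optionally collect more detailed statistics for the join columns of the big table.
-- 1000 is an assumed example value (the default target is 100).
ALTER TABLE B ALTER COLUMN id SET STATISTICS 1000;
ALTER TABLE B ALTER COLUMN "type" SET STATISTICS 1000;
ANALYZE B;

-- 2. Cost settings: inspect the current values first.
SHOW random_page_cost;
SHOW effective_cache_size;

-- Try different values for the current session only, before touching postgresql.conf.
-- Both values are assumptions for illustration: a lower random_page_cost
-- (default 4) makes index access look cheaper to the planner, and
-- effective_cache_size should roughly reflect the memory available for caching.
SET random_page_cost = 2;
SET effective_cache_size = '8GB';
```

After each change, re-running EXPLAIN on the query shows whether the planner now picks the index-based plan on its own, without disabling sequential scans.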
Details for both in this closely related answer:
- Keep PostgreSQL from sometimes choosing a bad query plan