PostgreSQL query taking too long
Question
I have a database with a few hundred million rows. I'm running the following query:
SELECT *
FROM "Payments" AS p
INNER JOIN "PaymentOrders" AS po ON po."Id" = p."PaymentOrderId"
INNER JOIN "Users" AS u ON u."Id" = po."UserId"
INNER JOIN "Roles" AS r ON u."RoleId" = r."Id"
WHERE r."Name" = 'Moses'
LIMIT 1000
When the WHERE clause finds a match in the database, I get the result in several milliseconds, but if I modify the query and specify a non-existent r."Name" in the WHERE clause, it takes too much time to complete. I guess that PostgreSQL is doing a sequential scan on the "Payments" table (which contains the most rows), comparing each row one by one.
Isn't PostgreSQL smart enough to check first whether the "Roles" table contains any row with "Name" = 'Moses'?
The "Roles" table contains only 15 rows, while "Payments" contains ~350 million.
I'm running PostgreSQL 9.2.1.
BTW, this same query on the same schema/data takes 0.024ms to complete on MS SQL Server.
I'll update the question and post EXPLAIN ANALYZE data in a few hours.
Here is the EXPLAIN ANALYZE result: http://explain.depesz.com/s/7e7
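For reference, a plan like the one linked above is typically captured along these lines (sketch only; 'NonExistentRole' is a placeholder for any role name that has no match):

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM "Payments" AS p
INNER JOIN "PaymentOrders" AS po ON po."Id" = p."PaymentOrderId"
INNER JOIN "Users" AS u ON u."Id" = po."UserId"
INNER JOIN "Roles" AS r ON u."RoleId" = r."Id"
WHERE r."Name" = 'NonExistentRole'  -- placeholder: any name with no match
LIMIT 1000;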
Here is the server configuration:
version PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit
client_encoding UNICODE
effective_cache_size 4500MB
fsync on
lc_collate English_United States.1252
lc_ctype English_United States.1252
listen_addresses *
log_destination stderr
log_line_prefix %t
logging_collector on
max_connections 100
max_stack_depth 2MB
port 5432
search_path dbo, "$user", public
server_encoding UTF8
shared_buffers 1500MB
TimeZone Asia/Tbilisi
wal_buffers 16MB
work_mem 10MB
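Such an overview can also be pulled from the server itself; a minimal sketch (the list of setting names below is only a selection for illustration, not anything prescribed by the question):

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'effective_cache_size', 'work_mem',
               'max_connections', 'wal_buffers');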
I'm running PostgreSQL on an i5 CPU (4 cores, 3.3 GHz), 8 GB of RAM and a Crucial m4 128 GB SSD.
Update
This looks like a bug in the query planner. Following Erwin Brandstetter's recommendation, I reported it to the PostgreSQL bugs mailing list.
Accepted answer
Finally successful attempt
Yet another idea - as per the comments: What happens if you remove the LIMIT clause for the case where no role is found? I suspect it would result in a fast plan - making LIMIT the culprit.
You may be able to solve your problem by pushing down your query into a subquery and applying the LIMIT
only to the outer query (untested):
SELECT *
FROM (
    SELECT *
    FROM "Roles" AS r
    JOIN "Users" AS u ON u."RoleId" = r."Id"
    JOIN "PaymentOrders" AS po ON po."UserId" = u."Id"
    JOIN "Payments" AS p ON p."PaymentOrderId" = po."Id"
    WHERE r."Name" = 'Moses'
) x
LIMIT 1000;
As per comment: @Davita tested and ruled out this workaround. @Kevin's answer later clarified why the workaround failed: use a CTE instead of the subquery.
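For illustration, the CTE variant might look roughly like this (untested sketch; in PostgreSQL 9.2 a CTE is an optimization fence, so the outer LIMIT cannot be pushed down into it):

WITH x AS (
    SELECT *
    FROM "Roles" AS r
    JOIN "Users" AS u ON u."RoleId" = r."Id"
    JOIN "PaymentOrders" AS po ON po."UserId" = u."Id"
    JOIN "Payments" AS p ON p."PaymentOrderId" = po."Id"
    WHERE r."Name" = 'Moses'
)
SELECT *
FROM x
LIMIT 1000;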
Or check for the existence of the role before you employ the big query, to eliminate the bad case.
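A minimal sketch of such a pre-check (the application would only run the big query when this returns true):

SELECT EXISTS (
    SELECT 1
    FROM "Roles"
    WHERE "Name" = 'Moses'
) AS role_exists;  -- false means the big query can be skipped entirely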
This leaves open questions for PostgreSQL concerning the optimization of queries with LIMIT.
There have been a number of recent bug reports concerning query plans with LIMIT. I quote Simon Riggs commenting on one of these reports here:
Very bad plans with LIMIT are frequent. This is bad for us because adding LIMIT usually/is supposed to make queries faster, not slower.
We need to do something about it.
First attempt with no success
I missed that @Craig already mentioned join_collapse_limit
in the comments. So that was of limited use:
Does reordering the JOIN clauses have any effect?
SELECT *
FROM "Roles" AS r
JOIN "Users" AS u ON u."RoleId" = r."Id"
JOIN "PaymentOrders" AS po ON po."UserId" = u."Id"
JOIN "Payments" AS p ON p."PaymentOrderId" = po."Id"
WHERE r."Name" = 'Moses'
LIMIT 1000
Related: you did not by chance mess with the setting of join_collapse_limit or geqo_threshold? A very low setting might prevent the planner from reordering your JOIN clauses, which might explain your problem.
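To rule that out, both settings can be checked and reset to their built-in defaults (8 and 12, respectively, in 9.2) for the current session - a quick sketch:

SHOW join_collapse_limit;  -- default: 8
SHOW geqo_threshold;       -- default: 12

RESET join_collapse_limit;
RESET geqo_threshold;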
If that does not solve the case, I would try to create an index on "Roles"("Name"). Not that this makes any sense with only 15 rows, but I would try to eliminate the suspicion that invalid statistics or cost parameters (or even a bug) make the planner believe the sequential scan on "Roles" to be more expensive than it is.
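A sketch of that index plus a statistics refresh (the index name is arbitrary):

CREATE INDEX roles_name_idx ON "Roles" ("Name");
ANALYZE "Roles";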