PostgreSQL query taking too long


Question

I have a database with a few hundred million rows. I'm running the following query:

SELECT * FROM "Payments" AS p
INNER JOIN "PaymentOrders" AS po
        ON po."Id" = p."PaymentOrderId"
INNER JOIN "Users" AS u
        ON u."Id" = po."UserId"
INNER JOIN "Roles" AS r
        ON u."RoleId" = r."Id"
WHERE r."Name" = 'Moses'
LIMIT 1000;

When the WHERE clause finds a match in the database, I get the result in several milliseconds, but if I modify the query and specify a non-existent r."Name" in the WHERE clause, it takes far too long to complete. I guess that PostgreSQL is doing a sequential scan on the "Payments" table (which contains the most rows), comparing each row one by one.
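A minimal way to verify that guess, assuming the query can be run from psql: EXPLAIN (ANALYZE, BUFFERS) executes the query and prints each plan node with the row count it actually processed:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM "Payments" AS p
INNER JOIN "PaymentOrders" AS po ON po."Id" = p."PaymentOrderId"
INNER JOIN "Users" AS u ON u."Id" = po."UserId"
INNER JOIN "Roles" AS r ON u."RoleId" = r."Id"
WHERE r."Name" = 'Moses'
LIMIT 1000;
-- A "Seq Scan on Payments" node with a very large actual row count
-- would confirm the sequential-scan theory.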

Isn't PostgreSQL smart enough to check first whether the "Roles" table contains any row with Name 'Moses'?

The "Roles" table contains only 15 rows, while "Payments" contains ~350 million.

I'm running PostgreSQL 9.2.1.

BTW, this same query on the same schema/data takes 0.024 ms to complete on MS SQL Server.

I'll update the question and post EXPLAIN ANALYZE data in a few hours.

Here are the EXPLAIN ANALYZE results: http://explain.depesz.com/s/7e7

And here is the server configuration:

version PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit
client_encoding UNICODE
effective_cache_size    4500MB
fsync   on
lc_collate  English_United States.1252
lc_ctype    English_United States.1252
listen_addresses    *
log_destination stderr
log_line_prefix %t 
logging_collector   on
max_connections 100
max_stack_depth 2MB
port    5432
search_path dbo, "$user", public
server_encoding UTF8
shared_buffers  1500MB
TimeZone    Asia/Tbilisi
wal_buffers 16MB
work_mem    10MB

I'm running PostgreSQL on an i5 CPU (4 cores, 3.3 GHz), 8 GB of RAM, and a Crucial m4 128 GB SSD.

Update

This looks like a bug in the query planner. On Erwin Brandstetter's recommendation, I reported it to the PostgreSQL bugs mailing list.

Answer

Final attempt with success

My other idea, as per the comments:

What happens if you remove the LIMIT clause when no role is found? I suspect it would result in a fast plan, making LIMIT the culprit.

You may be able to solve your problem by pushing down your query into a subquery and applying the LIMIT only to the outer query (untested):

SELECT *
FROM  (
   SELECT *
   FROM   "Roles"         AS r  
   JOIN   "Users"         AS u  ON u."RoleId" = r."Id"
   JOIN   "PaymentOrders" AS po ON po."UserId" = u."Id"
   JOIN   "Payments"      AS p  ON p."PaymentOrderId" = po."Id"
   WHERE  r."Name" = 'Moses'
  ) x
LIMIT  1000;


As per the comments: @Davita tested and ruled out this workaround, and @Kevin's answer later clarified why it failed: use a CTE instead of the subquery.
Alternatively, check for the existence of the role before you run the big query, to eliminate the bad case.
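A minimal sketch of both suggestions (untested here; in PostgreSQL 9.2 a CTE acts as an optimization fence, so the planner cannot push the outer LIMIT down into it):

WITH x AS (   -- CTE: an optimization fence in 9.2
   SELECT *
   FROM   "Roles"         AS r
   JOIN   "Users"         AS u  ON u."RoleId" = r."Id"
   JOIN   "PaymentOrders" AS po ON po."UserId" = u."Id"
   JOIN   "Payments"      AS p  ON p."PaymentOrderId" = po."Id"
   WHERE  r."Name" = 'Moses'
)
SELECT * FROM x LIMIT 1000;

-- Or bail out cheaply from the application before running the big query:
SELECT 1 FROM "Roles" WHERE "Name" = 'Moses';
-- run the big join only if this returns a row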

This leaves questions for PostgreSQL concerning the optimization of queries with LIMIT.

There have been a number of recent bug reports concerning query plans with LIMIT. I quote Simon Riggs commenting on one of these reports here:


Very bad plans with LIMIT are frequent. This is bad for us because adding LIMIT usually makes, and is supposed to make, queries faster, not slower.

We need to do something about it.

First attempt with no success

I missed that @Craig had already mentioned join_collapse_limit in the comments. So this was of limited use:

Does reordering the JOIN clauses have any effect?

SELECT *
FROM   "Roles"         AS r  
JOIN   "Users"         AS u  ON u."RoleId" = r."Id"
JOIN   "PaymentOrders" AS po ON po."UserId" = u."Id"
JOIN   "Payments"      AS p  ON p."PaymentOrderId" = po."Id"
WHERE  r."Name" = 'Moses'
LIMIT  1000

Related: you did not by chance mess with the settings of join_collapse_limit or geqo_threshold? A very low setting might prevent the planner from reordering your JOIN clauses, which could explain your problem.
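A quick sanity check, assuming 8 and 12 are still the built-in defaults (as they are in 9.2):

SHOW join_collapse_limit;   -- default: 8
SHOW geqo_threshold;        -- default: 12

-- Restore the defaults for the current session if they were lowered:
SET join_collapse_limit = 8;
SET geqo_threshold = 12;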

If that does not solve the case, I would try to create an index on "Roles"("Name"). Not that this makes any sense with only 15 rows, but I would try to eliminate the suspicion that invalid statistics or cost parameters (or even a bug) make the planner believe the sequential scan on "Roles" to be more expensive than it is.
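A minimal sketch; the index name is illustrative, and ANALYZE refreshes the statistics the planner bases its cost estimates on:

CREATE INDEX "Roles_Name_idx" ON "Roles" ("Name");  -- index name is hypothetical
ANALYZE "Roles";  -- refresh planner statistics for the table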
