PostgreSQL查询非常慢,限制为1 [英] PostgreSQL query very slow with limit 1

查看:206
本文介绍了PostgreSQL查询非常慢,限制为1的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

添加limit 1时,查询速度非常慢.

My queries get very slow when I add a limit 1.

我有一张表object_values,该表带有带有时间戳记的对象值:

I have a table object_values with timestamped values for objects:

 timestamp |  objectID |  value
--------------------------------
 2014-01-27|       234 | ksghdf

每个对象我想获取最新值:

Per object I want to get the latest value:

SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC LIMIT 1;

(超过10分钟后,我取消了查询)

(I cancelled the query after more than 10 minutes)

当没有给定objectID的值时,此查询非常慢(如果有结果,则查询很快). 如果我取消限制,它将几乎立即告诉我没有结果:

This query is very slow when there are no values for a given objectID (it is fast if there are results). If I remove the limit it tells me nearly instantaneous that there are no results:

SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC;  
...  
Time: 0.463 ms

一个解释告诉我,无限制查询使用索引,而使用limit 1的查询不使用索引:

An explain shows me that the query without limit uses the index, where as the query with limit 1 does not make use of the index:

慢查询:

explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC limit 1;  
QUERY PLAN`
----------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.00..2350.44 rows=1 width=126)
->  Index Scan Backward using object_values_timestamp on object_values  (cost=0.00..3995743.59 rows=1700 width=126)
     Filter: (objectID = 53708)`

快速查询:

explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Sort  (cost=6540.86..6545.11 rows=1700 width=126)
   Sort Key: timestamp
   ->  Index Scan using object_values_objectID on working_hours_t  (cost=0.00..6449.65 rows=1700 width=126)
         Index Cond: (objectID = 53708)

该表包含44,884,559行和66,762个不同的objectID.
我在两个字段上都有单独的索引:timestampobjectID.
我已经在桌子上做了vacuum analyze,并且已经重新索引了桌子.

The table contains 44,884,559 rows and 66,762 distinct objectIDs.
I have separate indexes on both fields: timestamp and objectID.
I have done a vacuum analyze on the table and I have reindexed the table.

另外,当我将限制设置为3或更高时,慢查询将变得很快:

Additionally the slow query becomes fast when I set the limit to 3 or higher:

explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC limit 3;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=6471.62..6471.63 rows=3 width=126)
   ->  Sort  (cost=6471.62..6475.87 rows=1700 width=126)
         Sort Key: timestamp
         ->  Index Scan using object_values_objectID on object_values  (cost=0.00..6449.65 rows=1700 width=126)
               Index Cond: (objectID = 53708)

通常,我认为这与计划者对执行成本的错误假设有关,因此选择了较慢的执行计划.

In general I assume it has to do with the planner making wrong assumptions about the exectution costs and therefore chooses for a slower execution plan.

这是真正的原因吗?有解决方案吗?

Is this the real reason? Is there a solution for this?

推荐答案

我遇到了一个问题,我认为这与缺乏行相关性统计信息有关.如果使用的是最新版本的Postgres,请考虑将其报告给pg-bugs.

You're running into an issue which relates, I think, to the lack of statistics on row correlations. Consider reporting it to pg-bugs for reference if this is using the latest version Postgres.

我对您的计划建议的解释是:

The interpretation I'd suggest for your plans is:

  • limit 1使Postgres查找单个行,并以此为前提,假设您的object_id足够通用,可以在索引扫描中迅速显示出来.

  • limit 1 makes Postgres look for a single row, and in doing so it assumes that your object_id is common enough that it'll show up reasonably quickly in an index scan.

根据统计数据,您可能认为它平均需要阅读约70行才能找到适合的行;只是没有意识到object_id和timestamp与它实际上将要读取表的很大一部分相关.

Based on the stats you gave its thinking probably is that it'll need to read ~70 rows on average to find one row that fits; it just doesn't realize that object_id and timestamp correlate to the point where it's actually going to read a large portion of the table.

limit 3意识到它并不常见,因此它认真考虑(并最终……)top-n用所需的object_id对预期的1700行进行排序,理由是这样做可能会更便宜.

limit 3, in contrast, makes it realize that it's uncommon enough, so it seriously considers (and ends up…) top-n sorting an expected 1700 rows with the object_id you want, on grounds that doing so is likely cheaper.

例如,它可能知道这些行的分布是这样的,以便它们全部打包在磁盘上的同一区域.

For instance, it might know that the distribution of these rows is so that they're all packed in the same area on the disk.

limit子句意味着它无论如何都将获取1700,因此它直接用于object_id上的索引.

no limit clause means it'll fetch the 1700 anyways, so it goes straight for the index on object_id.

解决方案,顺便说一句:在(object_id, timestamp)(object_id, timestamp desc)上添加索引.

Solution, btw: add an index on (object_id, timestamp) or (object_id, timestamp desc).

这篇关于PostgreSQL查询非常慢,限制为1的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆