PostgreSQL 查询速度非常慢,限制为 1 [英] PostgreSQL query very slow with limit 1

查看:25
本文介绍了PostgreSQL 查询速度非常慢,限制为 1的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当我添加 limit 1 时,我的查询变得很慢.

我有一个表 object_values 带有对象的时间戳值:

 时间戳 |对象ID |价值--------------------------------2014-01-27|第234话知乎

我想获取每个对象的最新值:

SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC LIMIT 1;

(10多分钟后我取消了查询)

当给定的 objectID 没有值时,此查询非常慢(如果有结果则很快).如果我删除限制,它几乎立即告诉我没有结果:

SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC;...时间:0.463 毫秒

一个解释告诉我没有限制的查询使用索引,而limit 1的查询没有使用索引:

慢查询:

explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC limit 1;查询计划`----------------------------------------------------------------------------------------------------------------------------限制(成本=0.00..2350.44 行=1 宽度=126)->在 object_values 上使用 object_values_timestamp 向后索引扫描(成本 = 0.00..3995743.59 行 = 1700 宽度 = 126)过滤器:(objectID = 53708)`

快速查询:

explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC;查询计划--------------------------------------------------------------------------------------------------------------排序(成本=6540.86..6545.11 行=1700 宽度=126)排序键:时间戳->在working_hours_t 上使用object_values_objectID 进行索引扫描(成本=0.00..6449.65 行=1700 宽度=126)索引条件:(objectID = 53708)

该表包含 44,884,559 行和 66,762 个不同的对象 ID.
我在两个字段上都有单独的索引:timestampobjectID.
我对表格进行了真空分析,并重新索引了表格.

此外,当我将限制设置为 3 或更高时,慢查询会变快:

explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC limit 3;查询计划--------------------------------------------------------------------------------------------------------------------限制(成本=6471.62..6471.63 行=3 宽度=126)->排序(成本=6471.62..6475.87 行=1700 宽度=126)排序键:时间戳->在 object_values 上使用 object_values_objectID 进行索引扫描(成本 = 0.00..6449.65 行 = 1700 宽度 = 126)索引条件:(objectID = 53708)

总的来说,我认为这与规划人员对执行成本做出错误假设有关,因此选择了较慢的执行计划.

这是真正的原因吗?有解决办法吗?

解决方案

我认为,您遇到的问题与缺乏行相关性统计数据有关.如果这是使用最新版本的 Postgres,请考虑将其报告给 pg-bugs 以供参考.

我对您的计划建议的解释是:

  • limit 1 使 Postgres 查找单行,这样做时它假设您的 object_id 足够常见,它会在索引扫描中相当快地显示.>

    根据统计数据,您可能认为它平均需要读取约 70 行才能找到合适的行;它只是没有意识到 object_id 和时间戳与它实际要读取表的大部分内容的点相关.

  • 相比之下,
  • limit 3 让它意识到它很不常见,所以它认真考虑(并最终......)用 对预期的 1700 行进行排序object_id 你想要,因为这样做可能更便宜.

    例如,它可能知道这些行的分布是为了它们都打包在磁盘的同一区域中.

  • 没有 limit 子句意味着它无论如何都会获取 1700,所以它会直接获取 object_id 上的索引.

解决方案,顺便说一句:在 (object_id, timestamp)(object_id, timestamp desc) 上添加索引.

My queries get very slow when I add a limit 1.

I have a table object_values with timestamped values for objects:

 timestamp |  objectID |  value
--------------------------------
 2014-01-27|       234 | ksghdf

Per object I want to get the latest value:

SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC LIMIT 1;

(I cancelled the query after more than 10 minutes)

This query is very slow when there are no values for a given objectID (it is fast if there are results). If I remove the limit it tells me nearly instantaneous that there are no results:

SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC;  
...  
Time: 0.463 ms

An explain shows me that the query without limit uses the index, where as the query with limit 1 does not make use of the index:

Slow query:

explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC limit 1;  
QUERY PLAN`
----------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.00..2350.44 rows=1 width=126)
->  Index Scan Backward using object_values_timestamp on object_values  (cost=0.00..3995743.59 rows=1700 width=126)
     Filter: (objectID = 53708)`

Fast query:

explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Sort  (cost=6540.86..6545.11 rows=1700 width=126)
   Sort Key: timestamp
   ->  Index Scan using object_values_objectID on working_hours_t  (cost=0.00..6449.65 rows=1700 width=126)
         Index Cond: (objectID = 53708)

The table contains 44,884,559 rows and 66,762 distinct objectIDs.
I have separate indexes on both fields: timestamp and objectID.
I have done a vacuum analyze on the table and I have reindexed the table.

Additionally the slow query becomes fast when I set the limit to 3 or higher:

explain SELECT * FROM object_values WHERE (objectID = 53708) ORDER BY timestamp DESC limit 3;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=6471.62..6471.63 rows=3 width=126)
   ->  Sort  (cost=6471.62..6475.87 rows=1700 width=126)
         Sort Key: timestamp
         ->  Index Scan using object_values_objectID on object_values  (cost=0.00..6449.65 rows=1700 width=126)
               Index Cond: (objectID = 53708)

In general I assume it has to do with the planner making wrong assumptions about the exectution costs and therefore chooses for a slower execution plan.

Is this the real reason? Is there a solution for this?

解决方案

You're running into an issue which relates, I think, to the lack of statistics on row correlations. Consider reporting it to pg-bugs for reference if this is using the latest version Postgres.

The interpretation I'd suggest for your plans is:

  • limit 1 makes Postgres look for a single row, and in doing so it assumes that your object_id is common enough that it'll show up reasonably quickly in an index scan.

    Based on the stats you gave its thinking probably is that it'll need to read ~70 rows on average to find one row that fits; it just doesn't realize that object_id and timestamp correlate to the point where it's actually going to read a large portion of the table.

  • limit 3, in contrast, makes it realize that it's uncommon enough, so it seriously considers (and ends up…) top-n sorting an expected 1700 rows with the object_id you want, on grounds that doing so is likely cheaper.

    For instance, it might know that the distribution of these rows is so that they're all packed in the same area on the disk.

  • no limit clause means it'll fetch the 1700 anyways, so it goes straight for the index on object_id.

Solution, btw: add an index on (object_id, timestamp) or (object_id, timestamp desc).

这篇关于PostgreSQL 查询速度非常慢,限制为 1的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆