Slow Postgres 9.3 Queries, again

Question

This is a follow-up to the question at Slow Postgres 9.3 queries.

The new indexes definitely help. But what we're seeing is sometimes queries are much slower in practice than when we run EXPLAIN ANALYZE. An example is the following, run on the production database:

explain analyze SELECT * FROM messages WHERE groupid=957 ORDER BY id DESC LIMIT 20 OFFSET 31980;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=127361.90..127441.55 rows=20 width=747) (actual time=152.036..152.143 rows=20 loops=1)
   ->  Index Scan Backward using idx_groupid_id on messages  (cost=0.43..158780.12 rows=39869 width=747) (actual time=0.080..150.484 rows=32000 loops=1)
         Index Cond: (groupid = 957)
 Total runtime: 152.186 ms
(4 rows)

With slow query logging turned on, we see instances of this query taking over 2 seconds. We also have log_lock_waits=true, and no slow locks are reported around the same time. What could explain the vast difference in execution times?

Answer

LIMIT x OFFSET y generally performs not much faster than LIMIT x + y. A large OFFSET is always comparatively expensive. The suggested index in the linked question helps, but since you cannot get index-only scans out of it, Postgres still has to check visibility in the heap (the main relation) for at least x + y rows to determine the correct result.

SELECT *
FROM   messages
WHERE  groupid = 957
ORDER  BY id DESC
LIMIT  20
OFFSET 31980;
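
Note that 20 + 31980 = 32000, which is exactly the rows=32000 the index scan returned in the plan above. In other words, the OFFSET form does roughly the same work as fetching all leading rows outright:

SELECT *
FROM   messages
WHERE  groupid = 957
ORDER  BY id DESC
LIMIT  32000;  -- reads (and visibility-checks) the same 32000 rows as LIMIT 20 OFFSET 31980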

CLUSTER on your index (groupid, id) would help to increase locality of data in the heap and reduce the number of data pages to be read per query. Definitely a win. But if all groupid values are equally likely to be queried, that's not going to remove the bottleneck of too little RAM for cache. If you have concurrent access, consider pg_repack instead of CLUSTER (see the sketch after the link below):

  • Optimize Postgres timestamp query range
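
A minimal sketch of the CLUSTER variant, assuming the index is named idx_groupid_id as shown in the plan above:

CLUSTER messages USING idx_groupid_id;  -- rewrites the table in index order; takes an exclusive lock
ANALYZE messages;                       -- refresh statistics after the rewrite

pg_repack achieves a similar physical reordering without holding an exclusive lock for the duration of the rewrite, which is why it is the better fit under concurrent access.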

Do you actually need all columns returned (SELECT *)? A covering index enabling index-only scans might help if you only need a few small columns returned (see the sketch below). (autovacuum must be strong enough to cope with writes to the table, though; a read-only table would be ideal.)
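
For illustration only: suppose the query needed just id and a small subject column (a hypothetical column name). Postgres 9.3 has no INCLUDE clause, so a covering index simply appends the extra column to the key:

CREATE INDEX messages_groupid_id_subject_idx ON messages (groupid, id, subject);

SELECT id, subject          -- only indexed columns, so an index-only scan becomes possible
FROM   messages
WHERE  groupid = 957
ORDER  BY id DESC
LIMIT  20
OFFSET 31980;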

Also, according to your linked question, your table is 32 GB on disk. (Typically a bit more in RAM). The index on (groupid,id) adds another 308 MB at least (without any bloat):

SELECT pg_size_pretty(7337880.0 * 44);  -- row count * tuple size
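
To cross-check that estimate against reality, the actual on-disk size of the index can be queried directly (again assuming the index name from the plan above):

SELECT pg_size_pretty(pg_relation_size('idx_groupid_id'));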




  • Making sense of Postgres row sizes

You have 8 GB of RAM, of which you expect around 4.5 GB to be used for cache (effective_cache_size = 4608MB). That's enough to cache the index for repeated use, but not nearly enough to also cache the whole table.
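
To compare the configured cache estimate with the sizes actually involved (table and index names as above):

SHOW effective_cache_size;

SELECT pg_size_pretty(pg_table_size('messages'))          AS table_size,
       pg_size_pretty(pg_relation_size('idx_groupid_id')) AS index_size;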

If your query happens to find its data pages in cache, it's fast. If not, not so much. That's a big difference, even with SSD storage (and much more so with HDD).

Not directly related to this query, but 8 MB of work_mem (work_mem = 7864kB) seems way too small for your setup. Depending on various other factors, I would set it to at least 64MB (unless you have many concurrent queries with sort/hash operations). As @Craig commented, EXPLAIN (BUFFERS, ANALYZE) might tell us more.
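
Both suggestions can be tried in a single session; SET only changes the setting for the current session, so this is a safe experiment:

SET work_mem = '64MB';      -- session-local, not a permanent change

EXPLAIN (ANALYZE, BUFFERS)  -- BUFFERS shows how many pages came from cache vs. disk
SELECT *
FROM   messages
WHERE  groupid = 957
ORDER  BY id DESC
LIMIT  20
OFFSET 31980;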

The best query plan also depends on value frequencies. If only a few rows pass the filter, the result might be empty for certain groupid values and the query is comparatively fast. If a large portion of the table has to be fetched, a plain sequential scan wins. You need valid table statistics (autovacuum again), and possibly a larger statistics target for groupid (see the sketch after the link below):

  • Keep PostgreSQL from sometimes choosing a bad query plan
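
A minimal sketch of raising the per-column statistics target; the value 1000 is an illustrative choice (the 9.3 default is 100):

ALTER TABLE messages ALTER COLUMN groupid SET STATISTICS 1000;
ANALYZE messages;  -- collect statistics at the new target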
