Slow Postgres 9.3 queries


Question


I'm trying to figure out if I can speed up two queries on a database storing email messages. Here's the table:

\d messages;
                             Table "public.messages"
     Column     |  Type   |                       Modifiers
----------------+---------+-------------------------------------------------------
 id             | bigint  | not null default nextval('messages_id_seq'::regclass)
 created        | bigint  |
 updated        | bigint  |
 version        | bigint  |
 threadid       | bigint  |
 userid         | bigint  |
 groupid        | bigint  |
 messageid      | text    |
 date           | bigint  |
 num            | bigint  |
 hasattachments | boolean |
 placeholder    | boolean |
 compressedmsg  | bytea   |
 revcount       | bigint  |
 subject        | text    |
 isreply        | boolean |
 likes          | bytea   |
 isspecial      | boolean |
 pollid         | bigint  |
 username       | text    |
 fullname       | text    |
Indexes:
    "messages_pkey" PRIMARY KEY, btree (id)
    "idx_unique_message_messageid" UNIQUE, btree (groupid, messageid)
    "idx_unique_message_num" UNIQUE, btree (groupid, num)
    "idx_group_id" btree (groupid)
    "idx_message_id" btree (messageid)
    "idx_thread_id" btree (threadid)
    "idx_user_id" btree (userid)

Output of SELECT relname, relpages, reltuples::numeric, pg_size_pretty(pg_table_size(oid)) FROM pg_class WHERE oid = 'messages'::regclass;

 relname  | relpages | reltuples | pg_size_pretty
----------+----------+-----------+----------------
 messages |  1584913 |   7337880 | 32 GB


Some possibly relevant postgres config values:

shared_buffers = 1536MB
effective_cache_size = 4608MB
work_mem = 7864kB
maintenance_work_mem = 384MB

Here is the EXPLAIN ANALYZE output:

explain analyze SELECT * FROM messages WHERE groupid=1886 ORDER BY id ASC LIMIT 20 offset 4440;
                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=479243.63..481402.39 rows=20 width=747) (actual time=14167.374..14167.408 rows=20 loops=1)
   ->  Index Scan using messages_pkey on messages  (cost=0.43..19589605.98 rows=181490 width=747) (actual time=14105.172..14167.188 rows=4460 loops=1)
         Filter: (groupid = 1886)
         Rows Removed by Filter: 2364949
 Total runtime: 14167.455 ms
(5 rows)

The second query:

explain analyze SELECT * FROM messages WHERE groupid=1886 ORDER BY created ASC LIMIT 20 offset 4440;
                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=538650.72..538650.77 rows=20 width=747) (actual time=671.983..671.992 rows=20 loops=1)
   ->  Sort  (cost=538639.62..539093.34 rows=181490 width=747) (actual time=670.680..671.829 rows=4460 loops=1)
         Sort Key: created
         Sort Method: top-N heapsort  Memory: 7078kB
         ->  Bitmap Heap Scan on messages  (cost=7299.11..526731.31 rows=181490 width=747) (actual time=84.975..512.969 rows=200561 loops=1)
               Recheck Cond: (groupid = 1886)
               ->  Bitmap Index Scan on idx_unique_message_num  (cost=0.00..7253.73 rows=181490 width=0) (actual time=57.239..57.239 rows=203423 loops=1)
                     Index Cond: (groupid = 1886)
 Total runtime: 672.787 ms
(9 rows)


This is on an SSD, 8GB Ram instance, load average is usually around 0.15.


I'm definitely no expert. Is this a case of the data just being spread throughout the disk? Is my only solution to use CLUSTER?


One thing I don't understand is why it's using idx_unique_message_num as the index for the second query. And why is ordering by id so much slower?

Answer


If there are many records with groupid=1886 (from a comment: there are 200,563), reaching rows at an OFFSET into a sorted subset of them requires sorting them all (or an equivalent top-N heap pass), which is slow.


This could be solved by adding an index. In this case, one on (groupid,id) and another on (groupid,created).
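Concretely, the two composite indexes could look like the following (index names are illustrative, not from the original post). With (groupid, id) in place, the planner can walk only the rows for groupid=1886 already in id order, instead of scanning the whole primary key and filtering out 2.3 million rows; (groupid, created) does the same for the created ordering and eliminates the top-N heapsort.

```sql
-- Composite indexes matching the WHERE column followed by the ORDER BY column.
-- On a ~32 GB table, CONCURRENTLY avoids blocking writes while the index builds
-- (it takes longer and cannot run inside a transaction block).
CREATE INDEX CONCURRENTLY idx_message_groupid_id ON messages (groupid, id);
CREATE INDEX CONCURRENTLY idx_message_groupid_created ON messages (groupid, created);
```

Column order matters here: the equality column (groupid) must come first so that the index entries for a single group form one contiguous, pre-sorted run.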


From comment: This indeed helped, taking down the runtime to 5ms-10ms.
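To confirm the planner is actually picking up a new index, the original queries can simply be re-run under EXPLAIN ANALYZE; with (groupid, id) available, the first plan should show an index scan on that index and no "Rows Removed by Filter" line.

```sql
-- Verify the plan after creating the composite indexes.
EXPLAIN ANALYZE
SELECT * FROM messages
WHERE groupid = 1886
ORDER BY id ASC
LIMIT 20 OFFSET 4440;
```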
