Slow PostgreSQL query in production - help me understand this explain analyze output

Problem description

I have a query that is taking 9 minutes to run on PostgreSQL 9.0.0 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit

This query is automatically generated by Hibernate for my application. It's trying to find all of the "teacher members" in a school. A membership is a user with a role in a group. There are several types of groups, but what matters here are schools and services. If someone is a teacher member in a service and a member in this school (15499), then they are what we are looking for.

This query used to run fine in production and still runs fine in development, but in production it is now taking several minutes to run. Can you help me understand why?

Here is the query:

select distinct user1_.ID as ID14_, user1_.FIRST_NAME as FIRST2_14_, user1_.LAST_NAME as LAST3_14_, user1_.STREET_1 as STREET4_14_, user1_.STREET_2 as STREET5_14_, user1_.CITY as CITY14_, user1_.us_state_id as us7_14_, user1_.REGION as REGION14_, user1_.country_id as country9_14_, user1_.postal_code as postal10_14_, user1_.USER_NAME as USER11_14_, user1_.PASSWORD as PASSWORD14_, user1_.PROFESSION as PROFESSION14_, user1_.PHONE as PHONE14_, user1_.URL as URL14_, user1_.bio as bio14_, user1_.LAST_LOGIN as LAST17_14_, user1_.STATUS as STATUS14_, user1_.birthdate as birthdate14_, user1_.ageInYears as ageInYears14_, user1_.deleted as deleted14_, user1_.CREATEDATE as CREATEDATE14_, user1_.audit as audit14_, user1_.migrated2008 as migrated24_14_, user1_.creator as creator14_ 
from DIR_MEMBERSHIPS membership0_ 
inner join DIR_USERS user1_ on membership0_.USER_ID=user1_.ID, DIR_ROLES role2_, DIR_GROUPS group4_ 
where membership0_.role=role2_.ID 
and membership0_.GROUP_ID=group4_.id 
and membership0_.GROUP_ID=15499 
and case when membership0_.expires is null 
    then 1 
    else case when (membership0_.expires > CURRENT_TIMESTAMP and (membership0_.startDate is null or membership0_.startDate < CURRENT_TIMESTAMP)) 
        then 1 
        else 0 end 
    end =1 
and membership0_.deleted=false 
and role2_.deleted=false 
and role2_.NAME='ROLE_MEMBER' 
and group4_.deleted=false 
and user1_.STATUS='active' 
and user1_.deleted=false 
and (membership0_.USER_ID in (
    select membership7_.USER_ID 
    from DIR_MEMBERSHIPS membership7_, DIR_USERS user8_, DIR_ROLES role9_ 
    where membership7_.USER_ID=user8_.ID 
    and membership7_.role=role9_.ID 
    and case when membership7_.expires is null 
        then 1 
        else case when (membership7_.expires > CURRENT_TIMESTAMP 
                        and (membership7_.startDate is null or membership7_.startDate < CURRENT_TIMESTAMP)) 
            then 1 
            else 0 end 
        end =1 
    and membership7_.deleted=false 
    and role9_.NAME='ROLE_TEACHER_MEMBER'));

EXPLAIN ANALYZE output:

 HashAggregate  (cost=61755.63..61755.64 rows=1 width=3334) (actual time=652504.302..652504.307 rows=4 loops=1)
   ->  Nested Loop  (cost=4355.35..61755.56 rows=1 width=3334) (actual time=304.450..652504.217 rows=6 loops=1)
     ->  Nested Loop  (cost=4355.35..61747.28 rows=1 width=3342) (actual time=304.419..652504.060 rows=6 loops=1)
           ->  Nested Loop Semi Join  (cost=4355.35..61738.97 rows=1 width=32) (actual time=304.385..652503.961 rows=6 loops=1)
                 Join Filter: (user_id = user_id)
                 ->  Nested Loop  (cost=0.00..32.75 rows=1 width=16) (actual time=0.190..26.703 rows=758 loops=1)
                       ->  Seq Scan on dir_roles role2_  (cost=0.00..1.25 rows=1 width=8) (actual time=0.032..0.038 rows=1 loops=1)
                             Filter: ((NOT deleted) AND ((name)::text = 'ROLE_MEMBER'::text))
                       ->  Index Scan using dir_memberships_role_group_id_index on dir_memberships membership0_  (cost=0.00..31.49 rows=1 width=24) (actual time=0.151..25.626 rows=758 loops=1)
                             Index Cond: ((role = role2_.id) AND (group_id = 15499))
                             Filter: ((NOT deleted) AND (CASE WHEN (expires IS NULL) THEN 1 ELSE CASE WHEN ((expires > now()) AND ((startdate IS NULL) OR (startdate < now()))) THEN 1 ELSE 0 END END = 1))
                 ->  Nested Loop  (cost=4355.35..61692.86 rows=1069 width=16) (actual time=91.088..843.967 rows=79986 loops=758)
                       ->  Nested Loop  (cost=4355.35..54185.33 rows=1069 width=8) (actual time=91.065..555.830 rows=79986 loops=758)
                             ->  Seq Scan on dir_roles role9_  (cost=0.00..1.25 rows=1 width=8) (actual time=0.006..0.013 rows=1 loops=758)
                                   Filter: ((name)::text = 'ROLE_TEACHER_MEMBER'::text)
                             ->  Bitmap Heap Scan on dir_memberships membership7_  (cost=4355.35..53983.63 rows=16036 width=16) (actual time=91.047..534.236 rows=79986 loops=758)
                                   Recheck Cond: (role = role9_.id)
                                   Filter: ((NOT deleted) AND (CASE WHEN (expires IS NULL) THEN 1 ELSE CASE WHEN ((expires > now()) AND ((startdate IS NULL) OR (startdate < now()))) THEN 1 ELSE 0 END END = 1))
                                   ->  Bitmap Index Scan on dir_memberships_role_index  (cost=0.00..4355.09 rows=214190 width=0) (actual time=87.050..87.050 rows=375858 loops=758)
                                         Index Cond: (role = role9_.id)
                       ->  Index Scan using dir_users_pkey on dir_users user8_  (cost=0.00..7.01 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=60629638)
                             Index Cond: (id = user_id)
           ->  Index Scan using dir_users_pkey on dir_users user1_  (cost=0.00..8.29 rows=1 width=3334) (actual time=0.011..0.011 rows=1 loops=6)
                 Index Cond: (id = user_id)
                 Filter: ((NOT deleted) AND ((status)::text = 'active'::text))
     ->  Index Scan using dir_groups_pkey on dir_groups group4_  (cost=0.00..8.28 rows=1 width=8) (actual time=0.023..0.023 rows=1 loops=6)
           Index Cond: (group4_.id = 15499)
           Filter: (NOT group4_.deleted)
Total runtime: 652504.827 ms
(29 rows)
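A note on reading the plan above. When a node runs more than once, "actual time" and "rows" are per-execution averages and "loops" is the number of executions, so you get totals by multiplying. The small Python sketch below does that arithmetic. Every figure is copied from the output above, and the helper names are illustrative only.

```python
# Per-loop figures copied from the EXPLAIN ANALYZE output above.
OUTER_ROWS = 758               # rows from the membership0_ index scan (loops=1)
INNER_ROWS_PER_LOOP = 79986    # inner Nested Loop: average rows per execution
INNER_MS_PER_LOOP = 843.967    # inner Nested Loop: average total ms per execution
REPORTED_PROBES = 60629638     # loops= reported on the dir_users_pkey probe for user8_
TOTAL_RUNTIME_MS = 652504.827


def inner_rows_total(outer_rows: int, rows_per_loop: int) -> int:
    """Total rows the inner side produces over all of its executions."""
    return outer_rows * rows_per_loop


def inner_ms_total(outer_rows: int, ms_per_loop: float) -> float:
    """Total time spent in the inner side over all of its executions."""
    return outer_rows * ms_per_loop


probes = inner_rows_total(OUTER_ROWS, INNER_ROWS_PER_LOOP)
share = inner_ms_total(OUTER_ROWS, INNER_MS_PER_LOOP) / TOTAL_RUNTIME_MS

print(f"dir_users_pkey probes: computed {probes:,}, reported {REPORTED_PROBES:,}")
print(f"inner side: {share:.0%} of total runtime")
```

The computed probe count matches the reported loops=60629638 apart from rounding of the per-loop average. The inner side accounts for about 98% of the runtime. The planner estimated rows=1 for the membership0_ index scan but got 758. That misestimate plausibly explains why it picked a nested loop that re-runs the whole ROLE_TEACHER_MEMBER subquery once per outer row.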

I have been reading forum posts and the user manual, but I can't figure out what would make this run faster, except maybe indexes for the select that uses the now() function, if that is even possible.

Recommended answer

I rewrote your query and expect this to be faster:

SELECT u.id AS id14_, u.first_name AS first2_14_, u.last_name AS last3_14_, u.street_1 AS street4_14_, u.street_2 AS street5_14_, u.city AS city14_, u.us_state_id AS us7_14_, u.region AS region14_, u.country_id AS country9_14_, u.postal_code AS postal10_14_, u.user_name AS user11_14_, u.password AS password14_, u.profession AS profession14_, u.phone AS phone14_, u.url AS url14_, u.bio AS bio14_, u.last_login AS last17_14_, u.status AS status14_, u.birthdate AS birthdate14_, u.ageinyears AS ageinyears14_, u.deleted AS deleted14_, u.createdate AS createdate14_, u.audit AS audit14_, u.migrated2008 AS migrated24_14_, u.creator AS creator14_
FROM   dir_users u 
WHERE  u.status = 'active'
AND    u.deleted = FALSE
AND    EXISTS (
   SELECT 1
   FROM   dir_memberships m
   JOIN   dir_roles       r ON r.id = m.role
   JOIN   dir_groups      g ON g.id = m.group_id
   WHERE  m.group_id = 15499
   AND    m.user_id = u.id
   AND   (m.expires IS NULL
       OR m.expires > now() AND (m.startdate IS NULL OR m.startdate < now()))
   AND    m.deleted = FALSE
   AND    r.deleted = FALSE
   AND    r.name = 'ROLE_MEMBER'
   AND    g.deleted = FALSE
   )
AND    EXISTS (
    SELECT 1
    FROM   dir_memberships m
    JOIN   dir_roles       r ON r.id = m.role
    WHERE (m.expires IS NULL
        OR m.expires > now() AND (m.startDate IS NULL OR m.startDate < now()))
    AND    m.deleted = FALSE
    AND    m.user_id = u.id
    AND    r.name = 'ROLE_TEACHER_MEMBER'
    )
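The Python sketch below shows why DISTINCT becomes unnecessary. It uses the standard-library sqlite3 module as a stand-in for PostgreSQL. The tables keep only the columns the filters touch, the expires/startdate test is left out, and all rows are hypothetical toy data. User 1 has two ROLE_MEMBER memberships in group 15499, so the join form returns that user twice unless DISTINCT is added. The EXISTS form returns each qualifying user once.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dir_users (id INTEGER PRIMARY KEY, status TEXT, deleted INTEGER);
CREATE TABLE dir_roles (id INTEGER PRIMARY KEY, name TEXT, deleted INTEGER);
CREATE TABLE dir_groups (id INTEGER PRIMARY KEY, deleted INTEGER);
CREATE TABLE dir_memberships (user_id INTEGER, role INTEGER, group_id INTEGER, deleted INTEGER);
"""


def build_db() -> sqlite3.Connection:
    """In-memory database with hypothetical toy rows (booleans stored as 0/1)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dir_users VALUES (?, ?, ?)",
                     [(1, "active", 0), (2, "active", 0), (3, "active", 0),
                      (4, "disabled", 0), (5, "active", 0)])
    conn.executemany("INSERT INTO dir_roles VALUES (?, ?, ?)",
                     [(1, "ROLE_MEMBER", 0), (2, "ROLE_TEACHER_MEMBER", 0)])
    conn.executemany("INSERT INTO dir_groups VALUES (?, ?)", [(15499, 0), (200, 0)])
    conn.executemany("INSERT INTO dir_memberships VALUES (?, ?, ?, ?)", [
        (1, 1, 15499, 0), (1, 1, 15499, 0), (1, 2, 200, 0),  # member twice, also a teacher
        (2, 1, 15499, 0),                                    # member, not a teacher
        (3, 2, 200, 0),                                      # teacher, not in the school
        (4, 1, 15499, 0), (4, 2, 200, 0),                    # inactive user
        (5, 1, 15499, 0), (5, 2, 200, 1),                    # teacher membership deleted
    ])
    return conn


# Shape of the generated query: JOIN plus IN, optionally with DISTINCT.
JOIN_IN = """
SELECT {distinct} u.id
FROM dir_memberships m0
JOIN dir_users u ON m0.user_id = u.id
JOIN dir_roles r ON m0.role = r.id
JOIN dir_groups g ON m0.group_id = g.id
WHERE m0.group_id = 15499 AND m0.deleted = 0
  AND r.deleted = 0 AND r.name = 'ROLE_MEMBER' AND g.deleted = 0
  AND u.status = 'active' AND u.deleted = 0
  AND m0.user_id IN (
      SELECT m7.user_id
      FROM dir_memberships m7
      JOIN dir_users u8 ON m7.user_id = u8.id
      JOIN dir_roles r9 ON m7.role = r9.id
      WHERE m7.deleted = 0 AND r9.name = 'ROLE_TEACHER_MEMBER')
ORDER BY u.id
"""

# Shape of the rewrite: one row per user, filtered by two EXISTS semi-joins.
EXISTS_REWRITE = """
SELECT u.id
FROM dir_users u
WHERE u.status = 'active' AND u.deleted = 0
  AND EXISTS (
      SELECT 1 FROM dir_memberships m
      JOIN dir_roles r ON r.id = m.role
      JOIN dir_groups g ON g.id = m.group_id
      WHERE m.group_id = 15499 AND m.user_id = u.id AND m.deleted = 0
        AND r.deleted = 0 AND r.name = 'ROLE_MEMBER' AND g.deleted = 0)
  AND EXISTS (
      SELECT 1 FROM dir_memberships m
      JOIN dir_roles r ON r.id = m.role
      WHERE m.deleted = 0 AND m.user_id = u.id AND r.name = 'ROLE_TEACHER_MEMBER')
ORDER BY u.id
"""


def ids(conn: sqlite3.Connection, sql: str) -> list:
    return [row[0] for row in conn.execute(sql)]


conn = build_db()
print(ids(conn, JOIN_IN.format(distinct="")))          # [1, 1]
print(ids(conn, JOIN_IN.format(distinct="DISTINCT")))  # [1]
print(ids(conn, EXISTS_REWRITE))                        # [1]
```

In PostgreSQL the same shape also lets the planner treat each EXISTS as a semi-join per user. The real plan still has to be checked with EXPLAIN ANALYZE.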

Rewrite with EXISTS

    • Replaced the weird case ... end = 1 expressions with simple boolean expressions.
    • Rewrote all JOINs with explicit join syntax to make the query easier to read.
    • Transformed the big JOIN construct and the IN expression into two EXISTS semi-joins, which removes the need for DISTINCT. This should be quite a bit faster.
    • Made lots of minor edits to simplify the query; they don't change the substance.
      In particular, I used simpler aliases: what you had was noisy and confusing.
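The first point can be checked mechanically. The Python sketch below models SQL three-valued logic, with None standing for NULL. It evaluates the original CASE ... END = 1 expression and the simplified OR/AND expression for every combination of a few sample expires and startdate values. Plain integers stand in for timestamps, which is a simplification. The check confirms that a WHERE clause keeps exactly the same rows with either form.

```python
from itertools import product
from typing import Optional

Tri = Optional[bool]  # True, False, or None for SQL NULL / UNKNOWN


def sql_and(a: Tri, b: Tri) -> Tri:
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def sql_or(a: Tri, b: Tri) -> Tri:
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def sql_lt(a: Optional[int], b: Optional[int]) -> Tri:
    """a < b, yielding NULL when either side is NULL."""
    return None if a is None or b is None else a < b


def active(expires, startdate, now) -> Tri:
    # expires > now AND (startdate IS NULL OR startdate < now)
    return sql_and(sql_lt(now, expires), sql_or(startdate is None, sql_lt(startdate, now)))


def case_predicate(expires, startdate, now) -> Tri:
    # CASE WHEN expires IS NULL THEN 1 ELSE CASE WHEN <active> THEN 1 ELSE 0 END END = 1
    if expires is None:
        value = 1
    else:
        value = 1 if active(expires, startdate, now) is True else 0  # WHEN treats NULL as not true
    return value == 1


def simple_predicate(expires, startdate, now) -> Tri:
    # expires IS NULL OR <active>
    return sql_or(expires is None, active(expires, startdate, now))


def predicates_agree(now: int = 10, samples=(None, 5, 10, 15)) -> bool:
    """WHERE keeps a row only when the predicate is TRUE; compare that for every combination."""
    return all(
        (case_predicate(e, s, now) is True) == (simple_predicate(e, s, now) is True)
        for e, s in product(samples, repeat=2)
    )


print(predicates_agree())  # True
```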
      If this isn't fast enough yet, and your write performance can deal with more indexes, add this partial multi-column index:

      CREATE INDEX dir_memberships_g_id_u_id_idx ON dir_memberships (group_id, user_id)
      WHERE  deleted = FALSE;
      

      The WHERE conditions have to match your query for the index to be useful!
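Here is a minimal way to see this rule in action. The sketch uses SQLite, through Python's sqlite3 module, as a stand-in because SQLite also supports partial indexes. The index is only considered when the query's WHERE clause contains the index's deleted = 0 condition. PostgreSQL applies its own implication test, so the exact behavior there can differ and should be checked with EXPLAIN.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Join the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return " | ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dir_memberships (
        user_id INTEGER, role INTEGER, group_id INTEGER, deleted INTEGER
    );
    -- SQLite has no boolean storage class, so FALSE is stored as 0 here.
    CREATE INDEX dir_memberships_g_id_u_id_idx
        ON dir_memberships (group_id, user_id) WHERE deleted = 0;
""")

# The query repeats the index condition, so the partial index qualifies.
MATCHING = "SELECT user_id FROM dir_memberships WHERE group_id = 15499 AND deleted = 0"
# Without "deleted = 0" the index could miss rows, so the planner cannot use it.
NOT_MATCHING = "SELECT user_id FROM dir_memberships WHERE group_id = 15499"

print(plan(conn, MATCHING))
print(plan(conn, NOT_MATCHING))
```

The first plan should mention dir_memberships_g_id_u_id_idx and the second should be a plain table scan. The exact wording of the plan text varies between SQLite versions.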

      I assume that you already have primary keys and indexes on relevant foreign keys.

      Further:

      CREATE INDEX dir_memberships_u_id_role_idx ON dir_memberships (user_id, role)
      WHERE  deleted = FALSE;
      

      Why user_id a second time? See:

      • Working of indexes in PostgreSQL
      • Is a composite index also good for queries on the first field?
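In short, a B-tree index is searched by its leading column(s). The sketch below again uses SQLite as a stand-in for PostgreSQL, with a hypothetical user id 42. With both partial indexes present, a lookup by user_id alone (the shape of the EXISTS probes) is served by the index that starts with user_id. The (group_id, user_id) index cannot seek directly on its second column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dir_memberships (
        user_id INTEGER, role INTEGER, group_id INTEGER, deleted INTEGER
    );
    CREATE INDEX dir_memberships_g_id_u_id_idx
        ON dir_memberships (group_id, user_id) WHERE deleted = 0;
    CREATE INDEX dir_memberships_u_id_role_idx
        ON dir_memberships (user_id, role) WHERE deleted = 0;
""")

# Shape of the EXISTS probes: all live memberships of one (hypothetical) user.
PROBE = "SELECT role FROM dir_memberships WHERE user_id = 42 AND deleted = 0"

# EXPLAIN QUERY PLAN rows end with a human-readable "detail" column.
detail = " | ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + PROBE))
print(detail)
```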

      Also, since user_id is already used in another index, you are not blocking HOT updates (which are only possible when the updated columns are not involved in any index).

      Why role?
      I assume both columns are of type integer (4 bytes). I saw in your detailed question that you run a 64-bit OS, where MAXALIGN is 8 bytes, so another integer will not make the index grow at all. I threw in role because it might be useful for the second EXISTS semi-join.
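Here is the arithmetic behind this as a small Python sketch. It assumes an 8-byte index tuple header (6-byte heap TID plus 2-byte t_info), MAXALIGN = 8, 4-byte integer keys, and no NULLs (so no null bitmap). The 4-byte line pointer per tuple is the same in every case and is left out.

```python
MAXALIGN = 8            # assumed: typical 64-bit build
INDEX_TUPLE_HEADER = 8  # assumed: 6-byte heap TID + 2-byte t_info
INT4_BYTES = 4


def maxalign(n: int, align: int = MAXALIGN) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def btree_tuple_bytes(n_int4_keys: int) -> int:
    """Size of one index tuple with n integer key columns and no NULLs."""
    return maxalign(INDEX_TUPLE_HEADER + n_int4_keys * INT4_BYTES)


for keys in (1, 2, 3):
    print(keys, "integer column(s):", btree_tuple_bytes(keys), "bytes")
```

One and two integer columns both round up to 16 bytes, so adding role after user_id costs nothing per tuple. A third integer column would push each tuple to 24 bytes.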

      If you have many "dead" users, this might also help:

      CREATE INDEX dir_users_id_idx ON dir_users (id)
      WHERE status = 'active' AND deleted = FALSE;
      

      As always, check with EXPLAIN to see whether the indexes actually get used. You wouldn't want useless indexes consuming resources.

      Are we fast yet?

      Of course, all the usual advice for performance optimization applies, too.
