Performance is not improved even after increasing the work_mem size


Problem description

I am running a query that takes an average of 170 seconds to execute. I went through the PostgreSQL documentation, which mentions that increasing work_mem can improve performance. I increased work_mem to 1000 MB, but the performance did not improve.
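
As a hedged sketch, one way to raise work_mem for a single session (rather than server-wide in postgresql.conf) is:

    -- raise the per-sort/per-hash memory budget for this session only
    SET work_mem = '1000MB';
    SHOW work_mem;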

Note: I have indexed all the fields that are part of the query.

Below I am pasting the record count in the DB, the query, the result, and the query execution plans.

  • Number of records in the DB:
event_logs=> select count(*) from events;
  count   
----------
 18706734
(1 row)

  • Query:
    select raw->'request_payload'->'source'->0 as file, 
           count(raw->'request_payload'->>'status') as count, 
           raw->'request_payload'->>'status' as status 
    from events 
    where client = 'NTT' 
      and to_char(datetime, 'YYYY-MM-DD') = '2019-10-31' 
      and event_name = 'wbs_indexing' 
    group by raw->'request_payload'->'source'->0, 
             raw->'request_payload'->>'status';
    

  • Result:
       file                   | count  | status  
      ------------------------+--------+---------
       "xyz.csv"              |  91878 | failure
       "abc.csv"              |  91816 | failure
       "efg.csv"              | 398196 | failure
      (3 rows)
      
      

  • Query execution plan with the default work_mem (4 MB):
        event_logs=> SHOW work_mem;
         work_mem 
        ----------
         4MB
        (1 row)
        
        event_logs=> explain analyze select raw->'request_payload'->'source'->0 as file, count(raw->'request_payload'->>'status') as count,  raw->'request_payload'->>'status' as status from events where to_char(datetime, 'YYYY-MM-DD') = '2019-10-31' and client = 'NTT'  and event_name = 'wbs_indexing' group by raw->'request_payload'->'source'->0, raw->'request_payload'->>'status';
                                                                             QUERY PLAN
        ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
         Finalize GroupAggregate  (cost=3256017.54..3267087.56 rows=78474 width=72) (actual time=172547.598..172965.581 rows=3 loops=1)
           Group Key: ((((raw -> 'request_payload'::text) -> 'source'::text) -> 0)), (((raw -> 'request_payload'::text) ->> 'status'::text))
           ->  Gather Merge  (cost=3256017.54..3264829.34 rows=65674 width=72) (actual time=172295.204..172965.630 rows=9 loops=1)
                 Workers Planned: 2
                 Workers Launched: 2
                 ->  Partial GroupAggregate  (cost=3255017.52..3256248.91 rows=32837 width=72) (actual time=172258.342..172737.534 rows=3 loops=3)
                       Group Key: ((((raw -> 'request_payload'::text) -> 'source'::text) -> 0)), (((raw -> 'request_payload'::text) ->> 'status'::text))
                       ->  Sort  (cost=3255017.52..3255099.61 rows=32837 width=533) (actual time=171794.584..172639.670 rows=193963 loops=3)
                             Sort Key: ((((raw -> 'request_payload'::text) -> 'source'::text) -> 0)), (((raw -> 'request_payload'::text) ->> 'status'::text))
                             Sort Method: external merge  Disk: 131856kB
                             ->  Parallel Seq Scan on events  (cost=0.00..3244696.75 rows=32837 width=533) (actual time=98846.155..169311.063 rows=193963 loops=3)
                                   Filter: ((client = 'NTT'::text) AND (event_name = 'wbs_indexing'::text) AND (to_char(datetime, 'YYYY-MM-DD'::text) = '2019-10-31'::text))
                                   Rows Removed by Filter: 6041677
         Planning time: 0.953 ms
         Execution time: 172983.273 ms
        (15 rows)
        
        

  • Query execution plan with increased work_mem (1000 MB):
          event_logs=> SHOW work_mem;
           work_mem 
          ----------
           1000MB
          (1 row)
          
          event_logs=> explain analyze select raw->'request_payload'->'source'->0 as file, count(raw->'request_payload'->>'status') as count,  raw->'request_payload'->>'status' as status from events where to_char(datetime, 'YYYY-MM-DD') = '2019-10-31' and client = 'NTT'  and event_name = 'wbs_indexing' group by raw->'request_payload'->'source'->0, raw->'request_payload'->>'status';
                                                                                      QUERY PLAN                                                                              
          ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
           Finalize GroupAggregate  (cost=3248160.04..3259230.06 rows=78474 width=72) (actual time=167979.419..168189.228 rows=3 loops=1)
             Group Key: ((((raw -> 'request_payload'::text) -> 'source'::text) -> 0)), (((raw -> 'request_payload'::text) ->> 'status'::text))
             ->  Gather Merge  (cost=3248160.04..3256971.84 rows=65674 width=72) (actual time=167949.951..168189.282 rows=9 loops=1)
                   Workers Planned: 2
                   Workers Launched: 2
                   ->  Partial GroupAggregate  (cost=3247160.02..3248391.41 rows=32837 width=72) (actual time=167945.607..168083.707 rows=3 loops=3)
                         Group Key: ((((raw -> 'request_payload'::text) -> 'source'::text) -> 0)), (((raw -> 'request_payload'::text) ->> 'status'::text))
                         ->  Sort  (cost=3247160.02..3247242.11 rows=32837 width=533) (actual time=167917.891..167975.549 rows=193963 loops=3)
                               Sort Key: ((((raw -> 'request_payload'::text) -> 'source'::text) -> 0)), (((raw -> 'request_payload'::text) ->> 'status'::text))
                               Sort Method: quicksort  Memory: 191822kB
                               ->  Parallel Seq Scan on events  (cost=0.00..3244696.75 rows=32837 width=533) (actual time=98849.936..167570.669 rows=193963 loops=3)
                                     Filter: ((client = 'NTT'::text) AND (event_name = 'wbs_indexing'::text) AND (to_char(datetime, 'YYYY-MM-DD'::text) = '2019-10-31'::text))
                                     Rows Removed by Filter: 6041677
           Planning time: 0.238 ms
           Execution time: 168199.046 ms
          (15 rows)
          
          

  • Can someone help me improve the performance of this query?

Recommended answer

Increasing work_mem did seem to make the sort about 8 times faster: (172639.670 - 169311.063) / (167975.549 - 167570.669). But since the sort only took up a small fraction of the overall execution time, making it even 1000 times faster cannot improve things much overall. It is the seq scan that is taking up the time.

Much of the time in the seq scan is probably spent on IO. You can see this by running EXPLAIN (ANALYZE, BUFFERS) after turning track_io_timing on.
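
A minimal sketch of that check (track_io_timing is superuser-settable, so it may need to be enabled in postgresql.conf instead of per session):

    -- enable IO timing for this session, then re-run the query with buffer statistics
    SET track_io_timing = on;
    EXPLAIN (ANALYZE, BUFFERS)
    select raw->'request_payload'->'source'->0 as file,
           count(raw->'request_payload'->>'status') as count,
           raw->'request_payload'->>'status' as status
    from events
    where client = 'NTT'
      and to_char(datetime, 'YYYY-MM-DD') = '2019-10-31'
      and event_name = 'wbs_indexing'
    group by raw->'request_payload'->'source'->0,
             raw->'request_payload'->>'status';

The plan output will then include shared-buffer hit/read counts and, with track_io_timing on, I/O read times, which shows how much of the seq scan time is actually spent waiting on disk.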

Also, parallelizing a seq scan is often not very helpful, as the IO system is usually able to deliver its full capacity to a single reader, due to the magic of readahead. And sometimes parallel readers can even step on each other's toes, making the overall performance worse. You can disable parallelization with set max_parallel_workers_per_gather TO 0;. This might make things faster, and if it doesn't, it will at least make the EXPLAIN plan easier to understand.
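
A quick sketch of that experiment; the SET only affects the current session, so nothing needs to be changed server-wide:

    -- turn off parallel workers for this session, then re-run the same EXPLAIN (ANALYZE, BUFFERS) query
    SET max_parallel_workers_per_gather TO 0;

With parallelism disabled, the plan should show a plain Seq Scan without the Gather Merge / Partial GroupAggregate nodes, which makes the timings easier to read.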

You are fetching over 3% of the table: 193963 / (193963 + 6041677). Indexes might not be very helpful when you are fetching so much of it. If they are to be, you would want a combined index, not individual ones. So you would want an index on (client, event_name, date(datetime)). Then you would also need to change the query to use date(datetime) rather than to_char(datetime, 'YYYY-MM-DD'). You need to make this change because to_char is not immutable, and so cannot be indexed.
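
A hedged sketch of that index and the matching query change (the index name is illustrative, and this assumes datetime is a timestamp without time zone so that date() is immutable):

    -- combined expression index covering all three filter conditions (name is hypothetical)
    CREATE INDEX events_client_event_date_idx
        ON events (client, event_name, date(datetime));

    -- rewrite the filter so it matches the indexed expression
    select raw->'request_payload'->'source'->0 as file,
           count(raw->'request_payload'->>'status') as count,
           raw->'request_payload'->>'status' as status
    from events
    where client = 'NTT'
      and date(datetime) = '2019-10-31'
      and event_name = 'wbs_indexing'
    group by raw->'request_payload'->'source'->0,
             raw->'request_payload'->>'status';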
