High number of live/dead tuples in PostgreSQL / VACUUM not working


There is a table with 200 rows, but the number of live tuples reported for it is much higher (around 60K).

select count(*) from subscriber_offset_manager;
 count 
-------
   200
(1 row)


 SELECT schemaname,relname,n_live_tup,n_dead_tup FROM pg_stat_user_tables  where relname='subscriber_offset_manager' ORDER BY n_dead_tup
;
 schemaname |          relname          | n_live_tup | n_dead_tup 
------------+---------------------------+------------+------------
 public     | subscriber_offset_manager |      61453 |          5
(1 row)

But looking at pg_stat_activity and pg_locks, we are not able to find any open connection:

SELECT query, state,locktype,mode
FROM pg_locks
JOIN pg_stat_activity
  USING (pid)
WHERE relation::regclass = 'subscriber_offset_manager'::regclass
  ;
 query | state | locktype | mode 
-------+-------+----------+------
(0 rows)
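Note that the pg_locks query above only shows sessions that currently hold a lock on this particular table. A transaction can keep VACUUM from removing dead rows without ever touching this table, because any old open transaction holds back the cluster-wide visibility horizon. A minimal diagnostic sketch (the pg_stat_activity columns used here all exist in 9.3):

```sql
-- List all backends with an open transaction, oldest first.
-- Any long-open transaction shown here (even one that never touched
-- subscriber_offset_manager) prevents VACUUM from removing dead rows.
SELECT pid,
       state,
       now() - xact_start AS xact_age,
       query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;
```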

I also tried a full vacuum on this table, with the following results:

  • No rows are ever removed.
  • Sometimes all the live tuples become dead tuples.

Here is the output:

vacuum FULL VERBOSE ANALYZE subscriber_offset_manager;
INFO:  vacuuming "public.subscriber_offset_manager"
INFO:  "subscriber_offset_manager": found 0 removable, 67920 nonremovable row versions in 714 pages
DETAIL:  67720 dead row versions cannot be removed yet.
CPU 0.01s/0.06u sec elapsed 0.13 sec.
INFO:  analyzing "public.subscriber_offset_manager"
INFO:  "subscriber_offset_manager": scanned 710 of 710 pages, containing 200 live rows and 67720 dead rows; 200 rows in sample, 200 estimated total rows
VACUUM

 SELECT schemaname,relname,n_live_tup,n_dead_tup FROM pg_stat_user_tables  where relname='subscriber_offset_manager' ORDER BY n_dead_tup
;
 schemaname |          relname          | n_live_tup | n_dead_tup 
------------+---------------------------+------------+------------
 public     | subscriber_offset_manager |        200 |      67749

and after 10 sec

SELECT schemaname,relname,n_live_tup,n_dead_tup FROM pg_stat_user_tables  where relname='subscriber_offset_manager' ORDER BY n_dead_tup
;
 schemaname |          relname          | n_live_tup | n_dead_tup 
------------+---------------------------+------------+------------
 public     | subscriber_offset_manager |      68325 |        132

How our app queries this table:

  • Our application generally selects some rows and, based on some business calculation, updates them.

    Select query -- select based on some id:

    select * from subscriber_offset_manager where shard_id=1;

    Update query -- update some other column for the selected shard id.

  • Around 20 threads do this in parallel, and each thread works on only one row.

  • The app is written in Java and we are using Hibernate for DB operations.
  • The PostgreSQL version is 9.3.24.

One more interesting observation: when I stop my Java app and then do a full vacuum, it works fine (the number of rows and live tuples become equal). So something goes wrong when we select and update continuously from the Java app.
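Given that stopping the Java app makes VACUUM work again, a common suspect with ORMs and connection pools is sessions left in the "idle in transaction" state: a transaction was opened but never committed or rolled back. A sketch of how to check for these on the server:

```sql
-- Sessions that opened a transaction but are not running a query.
-- Connection pools / ORMs often leave these behind; as long as one
-- stays open, VACUUM cannot reclaim dead tuples newer than it.
SELECT pid,
       now() - xact_start AS open_for,
       query AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;
```

If such sessions show up while the app is running, the fix is on the application side: make sure every Hibernate session commits or rolls back promptly.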

Problem/Issue

These live tuples sometimes turn into dead tuples, and after some time become live again.

Because of this behaviour, selects from the table take time and increase the load on the server, since so many live/dead tuples are there.

Solution

I know three things that keep VACUUM from doing its job:

  • Long running transactions.

  • Prepared transactions that did not get committed.

  • Stale replication slots.

See my blog post for details.
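Each of the three causes can be checked from the system catalogs. A sketch of the remaining two checks (long-running transactions can be found in pg_stat_activity as shown earlier; note that replication slots only exist from 9.4 on, so on 9.3 that check does not apply):

```sql
-- Prepared transactions that were never committed or rolled back:
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
ORDER BY prepared;

-- Stale replication slots (9.4+ only; not applicable on 9.3):
SELECT slot_name, active, xmin, catalog_xmin
FROM pg_replication_slots;
```

Any row returned by the first query, or an inactive slot with an old xmin in the second, will hold back VACUUM exactly as a long-running transaction does.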
