High number of live/dead tuples in PostgreSQL / VACUUM not working
Problem Description
There is a table which has 200 rows, but the number of live tuples shown for it is much higher (around 60K):
select count(*) from subscriber_offset_manager;
count
-------
200
(1 row)
SELECT schemaname, relname, n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'subscriber_offset_manager' ORDER BY n_dead_tup;
schemaname | relname | n_live_tup | n_dead_tup
------------+---------------------------+------------+------------
public | subscriber_offset_manager | 61453 | 5
(1 row)
But as seen from pg_stat_activity and pg_locks, we are not able to track any open connection:
SELECT query, state,locktype,mode
FROM pg_locks
JOIN pg_stat_activity
USING (pid)
WHERE relation::regclass = 'subscriber_offset_manager'::regclass
;
query | state | locktype | mode
-------+-------+----------+------
(0 rows)
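Note that this join only finds sessions that currently hold a lock on this particular table; VACUUM can also be held back by any old open transaction in the same database, even one that never touched the table. A broader check (a sketch that works on 9.3) is:

-- any session with an open transaction, oldest first; an old 'idle in
-- transaction' session here is the prime suspect for blocking VACUUM
SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;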
I also tried a full vacuum on this table. Below are the results:
- Every time, no rows are removed.
- Sometimes, all the live tuples become dead tuples.
Here is the output:
vacuum FULL VERBOSE ANALYZE subscriber_offset_manager;
INFO: vacuuming "public.subscriber_offset_manager"
INFO: "subscriber_offset_manager": found 0 removable, 67920 nonremovable row versions in 714 pages
DETAIL: 67720 dead row versions cannot be removed yet.
CPU 0.01s/0.06u sec elapsed 0.13 sec.
INFO: analyzing "public.subscriber_offset_manager"
INFO: "subscriber_offset_manager": scanned 710 of 710 pages, containing 200 live rows and 67720 dead rows; 200 rows in sample, 200 estimated total rows
VACUUM
SELECT schemaname, relname, n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'subscriber_offset_manager' ORDER BY n_dead_tup;
schemaname | relname | n_live_tup | n_dead_tup
------------+---------------------------+------------+------------
public | subscriber_offset_manager | 200 | 67749
and after 10 seconds:
SELECT schemaname, relname, n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'subscriber_offset_manager' ORDER BY n_dead_tup;
schemaname | relname | n_live_tup | n_dead_tup
------------+---------------------------+------------+------------
public | subscriber_offset_manager | 68325 | 132
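As a side note, n_live_tup and n_dead_tup come from the statistics collector and are updated asynchronously, which may explain why the two snapshots above disagree so wildly. A heavier but direct way to inspect the table, assuming the pgstattuple contrib extension is available, is:

-- pgstattuple scans the whole table and reports exact live/dead tuple counts
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT tuple_count, dead_tuple_count FROM pgstattuple('subscriber_offset_manager');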
How our app queries this table

Our application generally selects some rows and, based on some business calculation, updates each row.
Select query -- select based on some id:
select * from subscriber_offset_manager where shard_id = 1;
Update query -- update some other column for this selected shard id (the statement itself is not shown; see the sketch below).
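A hypothetical version of that update (the column name current_offset is invented purely for illustration; the real column is not named in the question):

-- hypothetical: current_offset stands in for whatever business column is updated
update subscriber_offset_manager set current_offset = 42 where shard_id = 1;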
Around 20 threads do this in parallel, and one thread works on only one row.
- The app is written in Java and we are using Hibernate to do the DB operations.
- The PostgreSQL version is 9.3.24.
One more interesting observation: when I stop my Java app and then do a full vacuum, it works fine (the number of rows and live tuples become equal). So something goes wrong when we select and update continuously from the Java app.
Problem/Issue
These live tuples sometimes become dead tuples and after some time come back to life again.
Due to the above behaviour, selects from the table take time and increase the load on the server, because there are so many live/dead tuples.
Answer

I know three things that keep VACUUM from doing its job:
- Long running transactions.
- Prepared transactions that did not get committed.
- Stale replication slots.
You can check for each of these with the queries sketched below.
See my blog post for details.
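A minimal way to check for each of these three causes (a sketch; note that pg_replication_slots only exists on PostgreSQL 9.4 and later, so it does not apply to a 9.3 instance):

-- 1. Long-running transactions: anything with an old transaction start time
SELECT pid, state, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE xact_start < now() - interval '5 minutes';

-- 2. Prepared transactions that were never committed or rolled back
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

-- 3. Stale replication slots (PostgreSQL 9.4+ only)
SELECT slot_name, slot_type, active, xmin FROM pg_replication_slots;

If any of these returns rows whose age keeps growing, that is what prevents VACUUM from removing the dead row versions.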