best disk saving strategy for "replacement inserts"


Problem Description


Every day I delete hundreds of thousands of records from a large table, then I do some calculations (with new data) and replace every one of the records that I previously deleted. I thought doing a regular vacuum tbl would do the trick. I know it doesn't return disk space to the operating system, but (based on the pg docs) I thought that because I was inserting about as many records as I was deleting, I wouldn't lose any/much disk space. However, after moving the table to a different namespace (for an unrelated reason) the table went from 117GB to 44GB! So...

Is there a better strategy than this, so that my table doesn't bloat:

delete from tbl where ...etc... -- hundreds of thousands of rows removed
insert into tbl (...etc...) values (...etc...) -- hundreds of thousands of rows added back (fresh calcs)

.. repeat the above about 10 times a day ...

vacuum tbl
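One thing worth checking alongside the manual vacuum is whether autovacuum keeps up between batches. The per-table storage parameters below are standard PostgreSQL settings, but the specific values are illustrative, not taken from the original question:

```sql
-- Sketch: make autovacuum consider tbl once ~1% of its rows are dead
-- (instead of the default 20%), and drop the cost delay so each pass
-- finishes quickly between delete/insert batches.
ALTER TABLE tbl SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay   = 0
);
```

With a large table, the default scale factor means millions of rows can die before autovacuum even starts, so tightening it per-table is usually preferable to changing the global defaults.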

https://www.postgresql.org/docs/9.6/static/sql-vacuum.html

PostgreSQL 9.6


What I actually did to reduce the table size is in my answer here: integer out of range and remaining disk space too small to convert id to bigint and other solutions


Edit 1: The drawbacks to vacuum full are too restrictive for me. I am processing data 24/7, so I can't have locks like that, and my available disk space is pretty limited at any point in time. I'm trying to go about this in a better way.

Answer


What you are looking for is "dead space equilibrium", as I like to call it. Say you've got 1M rows and you want to delete and replace 100k of them. Suppose you delete 100k and insert 100k right away. The db won't have had time to vacuum up those old dead rows, so your 1M-row table now has 100k dead rows in it. Over the next 24 hours vacuum will kick in and mark that space reusable, and the next time you delete/insert, you'll create 100k more dead rows while reusing (most of) the space from the previous 100k. Your 1M-row table now has ~100k dead rows again, which will get reused next time, and so on.
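You can watch whether the table has settled into that equilibrium by checking the dead-tuple count between batches. This is a sketch using the standard `pg_stat_user_tables` statistics view, with `tbl` standing in for your table name:

```sql
-- Live vs. dead tuple counts and the last (auto)vacuum times.
-- At equilibrium, n_dead_tup should hover around the batch size
-- rather than growing run after run.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'tbl';
```

Note that these counters are estimates maintained by the statistics collector, not exact counts, but they are accurate enough to spot a table whose dead space is growing instead of being recycled.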


You want to reach a point where your deletes/inserts (or updates) and vacuum are creating / reclaiming dead tuples at an even rate.
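If you want to measure the actual free space inside the table rather than just tuple counts, the `pgstattuple` contrib extension (shipped with standard PostgreSQL, shown here as a sketch) reports it directly:

```sql
-- Requires the pgstattuple contrib extension.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Reports table size, live/dead tuple counts, and free_percent
-- (reusable space). At equilibrium, free_percent stays roughly
-- constant between runs instead of creeping upward.
SELECT * FROM pgstattuple('tbl');
```

Be aware that pgstattuple scans the entire table to produce exact figures, so on a table in the 100GB range it can take a while; run it sparingly rather than after every batch.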
