Best way to delete millions of rows by ID


Question

I need to delete about 2 million rows from my PG database. I have a list of IDs that I need to delete. However, every way I have tried takes days.

I tried putting them in a table and deleting in batches of 100. Four days later, this is still running, with only 297268 rows deleted. (I had to select 100 IDs from the ids table, delete the rows WHERE id IN that list, then delete those 100 IDs from the ids table.)

I tried:

DELETE FROM tbl WHERE id IN (select * from ids)

That's taking forever, too. It's hard to gauge how long, since I can't see its progress until it's done, but the query was still running after 2 days.

I'm just looking for the most effective way to delete from a table when I know the specific IDs to delete, and there are millions of them.

Recommended answer

It all depends ...

  • This assumes no concurrent write access to the tables involved. Otherwise you may have to lock the tables exclusively, or this route may not be for you at all.

Drop all indexes (possibly except the ones needed for the delete itself).
Recreate them afterwards. That's typically much faster than incremental updates to indexes.
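A rough sketch of the drop-and-recreate step might look like this. The index name tbl_some_col_idx and the column some_col are hypothetical placeholders, not names from the question; the index on tbl.id (typically the primary key) is left alone because the delete itself can use it.

```sql
-- List the indexes currently defined on tbl, with their full definitions,
-- so each one can be recreated later.
SELECT indexname, indexdef
FROM   pg_indexes
WHERE  tablename = 'tbl';

-- Hypothetical secondary index that the delete does not need.
DROP INDEX IF EXISTS tbl_some_col_idx;

-- ... run the big delete here ...

-- Rebuild the index in one pass over the remaining rows.
CREATE INDEX tbl_some_col_idx ON tbl (some_col);
```

Saving the indexdef output before dropping anything makes it easy to recreate each index exactly as it was.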

Check whether you have triggers that can safely be dropped or temporarily disabled.
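If the triggers on the table are safe to skip for this one operation, disabling them could be sketched as follows. This assumes the del_list table of IDs used elsewhere in this answer, and that nothing else depends on the triggers firing for the deleted rows.

```sql
BEGIN;

-- Disable user-defined triggers only; internally generated constraint
-- triggers (such as those enforcing foreign keys) stay active.
ALTER TABLE tbl DISABLE TRIGGER USER;

DELETE FROM tbl t
USING  del_list d
WHERE  t.id = d.id;

-- Re-enable the triggers before other sessions write to the table again.
ALTER TABLE tbl ENABLE TRIGGER USER;

COMMIT;
```

Doing both ALTER TABLE statements in the same transaction means a failure in the middle rolls back to the state with triggers enabled.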

Do foreign keys reference your table? Can they be dropped, at least temporarily?
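One way to answer that question is to look in the system catalog. In the sketch below, the query uses real catalog columns, while child_tbl, tbl_id and child_tbl_tbl_id_fkey are hypothetical names standing in for whatever the query returns.

```sql
-- Find foreign-key constraints in other tables that reference tbl.
SELECT conrelid::regclass        AS referencing_table,
       conname                   AS constraint_name,
       pg_get_constraintdef(oid) AS definition
FROM   pg_constraint
WHERE  contype   = 'f'
AND    confrelid = 'tbl'::regclass;

-- Hypothetical example: child_tbl.tbl_id references tbl.id.
ALTER TABLE child_tbl DROP CONSTRAINT child_tbl_tbl_id_fkey;

-- ... run the big delete here ...

-- Re-adding the constraint validates every existing row in child_tbl,
-- so it fails if any of them still point at a deleted id.
ALTER TABLE child_tbl
  ADD CONSTRAINT child_tbl_tbl_id_fkey
  FOREIGN KEY (tbl_id) REFERENCES tbl (id);
```

While the constraint is gone, nothing stops other sessions from inserting orphaned rows, which is one more reason the approach assumes no concurrent writes.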

Depending on your autovacuum settings, it may help to run VACUUM ANALYZE before the operation.

Some of the points listed in the related manual chapter, "Populating a Database", may also be of use, depending on your setup.

If you delete large portions of the table and the rest fits into RAM, the fastest and easiest way may be this:

BEGIN; -- typically faster and safer wrapped in a single transaction

SET LOCAL temp_buffers = '1000MB'; -- enough to hold the temp table

CREATE TEMP TABLE tmp AS
SELECT t.*
FROM   tbl t
LEFT   JOIN del_list d USING (id)
WHERE  d.id IS NULL;      -- copy surviving rows into temporary table

TRUNCATE tbl;             -- empty table - truncate is very fast for big tables

INSERT INTO tbl
SELECT * FROM tmp         -- insert back surviving rows
-- ORDER BY ?             -- optionally order favorably while being at it
;

COMMIT;

This way you don't have to recreate views, foreign keys or other dependent objects. And you get a pristine (sorted) table without bloat.

Read about the temp_buffers setting in the manual. This method is fast as long as the table fits into memory, or at least most of it. The transaction wrapper protects against data loss if your server crashes in the middle of the operation.
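To pick a value for temp_buffers, it can help to check how big the table is first. A minimal sketch, using the table name tbl from the example above:

```sql
-- On-disk size of the table, excluding indexes. The surviving rows
-- cannot need more space than this in the temp table.
SELECT pg_size_pretty(pg_table_size('tbl'));

-- Current value of the setting in this session.
SHOW temp_buffers;
```

Note that temp_buffers can only be changed before the first use of temporary tables within a session, so run the transaction in a fresh session if temp tables have already been used.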

Run VACUUM ANALYZE afterwards. Or run VACUUM FULL ANALYZE if you want to bring the table to minimum size (takes an exclusive lock). For big tables, consider alternatives such as CLUSTER or pg_repack.
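The maintenance options mentioned here could be run roughly as follows. The index name tbl_pkey is an assumption (it is the default name PostgreSQL gives a primary-key index on tbl); substitute whichever index you want the table ordered by.

```sql
-- Reclaim dead rows for reuse and refresh planner statistics.
VACUUM ANALYZE tbl;

-- Alternative: rewrite the table to minimum size.
-- Takes an exclusive lock on tbl for the duration.
-- VACUUM FULL ANALYZE tbl;

-- Alternative: rewrite the table physically ordered by an index.
-- tbl_pkey is an assumed index name; CLUSTER also locks the table exclusively.
-- CLUSTER tbl USING tbl_pkey;
-- ANALYZE tbl;
```

VACUUM cannot run inside a transaction block, so issue it as a separate statement after the COMMIT.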

For small tables, a simple DELETE instead of TRUNCATE is often faster:

DELETE FROM tbl t
USING  del_list d
WHERE  t.id = d.id;

Read the Notes section for TRUNCATE in the manual. In particular (as Pedro also pointed out in his comment):

TRUNCATE cannot be used on a table that has foreign-key references from other tables, unless all such tables are also truncated in the same command. [...]

And:

TRUNCATE will not fire any ON DELETE triggers that might exist for the tables.
