Best way to delete millions of rows by ID


Question

I need to delete about 2 million rows from my PostgreSQL database. I have a list of the IDs that I need to delete. However, every approach I have tried takes days.

I tried putting the IDs in a table and deleting in batches of 100. Four days later, this is still running, with only 297268 rows deleted. (I had to select 100 IDs from an ID table, delete the rows whose ID is IN that list, then delete the 100 selected IDs from the ID table.)

I also tried:

DELETE FROM tbl WHERE id IN (select * from ids)

That's taking forever, too. It's hard to gauge how long, since I can't see its progress until it's done, but the query was still running after 2 days.

I'm just looking for the most effective way to delete from a table when I know the specific IDs to delete, and there are millions of IDs.

Recommended answer

It all depends ...

  • Drop all indexes (except the one on the ID, which you need for the delete).
    Recreate them afterwards (much faster than incremental updates to indexes).

  • Check whether you have triggers that can safely be dropped or disabled temporarily.

  • Do foreign keys reference your table? Can they be dropped, at least temporarily?

  • Depending on your autovacuum settings, it may help to run VACUUM ANALYZE before the operation.

  • All of this assumes no concurrent write access to the involved tables. Otherwise you may have to lock the tables exclusively, or this route may not be for you at all.

  • Some of the points listed in the related chapter of the manual, "Populating a Database", may also be of use, depending on your setup.
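As a rough sketch, the preparation steps in the list above might look like the following. The index name tbl_some_col_idx and the column some_col are hypothetical; substitute the objects that actually exist on your table, and only disable triggers if skipping them is safe for your data.

```sql
-- Hypothetical secondary index; keep the index on id for the delete itself.
DROP INDEX IF EXISTS tbl_some_col_idx;

-- Disable user-defined triggers only; internally generated
-- constraint triggers (e.g. for foreign keys) stay active.
ALTER TABLE tbl DISABLE TRIGGER USER;

-- Clean up dead rows and refresh planner statistics before the big operation.
VACUUM ANALYZE tbl;

-- ... run the bulk delete here ...

-- Restore what was switched off.
ALTER TABLE tbl ENABLE TRIGGER USER;
CREATE INDEX tbl_some_col_idx ON tbl (some_col);
```

Note that VACUUM cannot run inside a transaction block, so this sequence is meant to be executed as separate statements.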

If you delete large portions of the table and the rest fits into RAM, the fastest and easiest way would be this:

SET temp_buffers = '1000MB'; -- or whatever you can spare temporarily

CREATE TEMP TABLE tmp AS
SELECT t.*
FROM   tbl t
LEFT   JOIN del_list d USING (id)
WHERE  d.id IS NULL;      -- copy surviving rows into temporary table

TRUNCATE tbl;             -- empty table - truncate is very fast for big tables

INSERT INTO tbl
SELECT * FROM tmp;        -- insert back surviving rows.

This way you don't have to recreate views, foreign keys, or other dependent objects. Read about the temp_buffers setting in the manual. This method is fast as long as the table fits into memory, or at least most of it. Be aware that you can lose data if the server crashes in the middle of this operation. You can wrap all of it in a transaction to make it safer.
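A minimal sketch of the transaction-wrapped variant, assuming the same tbl and del_list as above. The explicit LOCK and the ON COMMIT DROP clause are additions for illustration, not part of the original answer. SET temp_buffers stays outside the transaction, as in the original statement sequence.

```sql
BEGIN;

-- Block concurrent writes while the surviving rows are copied.
-- Plain reads are still allowed until TRUNCATE takes its stronger lock.
LOCK TABLE tbl IN EXCLUSIVE MODE;

-- Copy surviving rows; the temp table is dropped automatically at COMMIT.
CREATE TEMP TABLE tmp ON COMMIT DROP AS
SELECT t.*
FROM   tbl t
LEFT   JOIN del_list d USING (id)
WHERE  d.id IS NULL;

TRUNCATE tbl;

INSERT INTO tbl
SELECT * FROM tmp;

COMMIT;
```

Because TRUNCATE is transaction-safe in PostgreSQL, a failure or crash before COMMIT rolls everything back and tbl keeps its original contents.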

Run ANALYZE afterwards. Or run VACUUM ANALYZE if you did not go the truncate route, or VACUUM FULL ANALYZE if you want to bring the table to minimum size. For big tables, consider the alternatives CLUSTER / pg_repack.
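For reference, the maintenance options just mentioned could be issued like this. Pick one, depending on the route taken; the index name tbl_pkey is an assumption.

```sql
-- After the TRUNCATE / INSERT route: only statistics need refreshing.
ANALYZE tbl;

-- After a plain DELETE: also reclaim dead rows for reuse.
VACUUM ANALYZE tbl;

-- Rewrite the table to minimum size; holds an exclusive lock while running.
VACUUM FULL ANALYZE tbl;

-- Alternative rewrite, ordered by an index (hypothetical index name).
CLUSTER tbl USING tbl_pkey;
```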

For small tables, a simple DELETE instead of TRUNCATE is often faster:

DELETE FROM tbl t
USING  del_list d
WHERE  t.id = d.id;
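Both variants assume a table del_list that holds the IDs to delete. One possible way to prepare it is sketched below, assuming the IDs currently sit in the ids table from the question, in a column named id of type bigint (the column name and type are assumptions).

```sql
-- Temp table with a primary key, so duplicates are rejected
-- and the planner can use an index on the join column.
CREATE TEMP TABLE del_list (id bigint PRIMARY KEY);

INSERT INTO del_list (id)
SELECT DISTINCT id
FROM   ids;

-- Autovacuum does not process temp tables, so gather statistics manually.
ANALYZE del_list;
```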

Read the notes on TRUNCATE in the manual, in particular (as Pedro also pointed out in his comment):

TRUNCATE cannot be used on a table that has foreign-key references from other tables, unless all such tables are also truncated in the same command. [...]

And:

TRUNCATE will not fire any ON DELETE triggers that might exist for the tables.
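To check whether either caveat applies before choosing the truncate route, the system catalogs can be queried. This is a sketch, assuming the target table is named tbl and is visible on the current search_path.

```sql
-- Foreign keys in other tables that reference tbl (these block TRUNCATE).
SELECT conname,
       conrelid::regclass AS referencing_table
FROM   pg_constraint
WHERE  contype = 'f'
AND    confrelid = 'tbl'::regclass;

-- User-defined triggers on tbl (all events, not only ON DELETE).
SELECT tgname
FROM   pg_trigger
WHERE  tgrelid = 'tbl'::regclass
AND    NOT tgisinternal;
```

If both queries return no rows, neither restriction quoted above should affect the TRUNCATE approach.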
