How to delete many rows from a frequently accessed table


Question

I need to delete the majority (say, 90%) of a very large table (say, 5 million rows). The other 10% of this table is frequently read, but not written to.

From "Best way to delete millions of rows by ID", I gather that I should remove any index on the 90% I'm deleting, to speed up the process (except an index I'm using to select the rows for deletion).

From "PostgreSQL locking mode", I see that this operation will acquire a ROW EXCLUSIVE lock on the entire table. But since I'm only reading the other 10%, this ought not matter.

So, is it safe to delete everything in one command (i.e. DELETE FROM table WHERE delete_flag='t')? I'm worried that if the deletion of one row fails, triggering an enormous rollback, then it will affect my ability to read from the table. Would it be wiser to delete in batches?
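Deleting in batches, as the question proposes, could look like the minimal sketch below. It is an illustration under stated assumptions, not code from the question: it uses Python's built-in sqlite3 module as a self-contained stand-in for PostgreSQL, and assumes a hypothetical table tbl with an integer primary key id and delete_flag stored as 0/1. Each batch is committed on its own, so a failure only rolls back the current batch instead of the whole delete. The DELETE ... WHERE id IN (SELECT ... LIMIT n) form is used because PostgreSQL's DELETE has no LIMIT clause; apart from the parameter placeholder style, the same statement also runs there.

```python
import sqlite3


def make_demo_db(n_rows, n_flagged):
    """Build a hypothetical in-memory demo table; the first n_flagged rows are flagged."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl (id INTEGER PRIMARY KEY, some_id INTEGER, "
        "delete_flag INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO tbl (id, some_id, delete_flag) VALUES (?, ?, ?)",
        [(i, i % 100, 1 if i < n_flagged else 0) for i in range(n_rows)],
    )
    conn.commit()
    return conn


def delete_in_batches(conn, batch_size=10_000):
    """Delete all flagged rows, at most batch_size rows per transaction.

    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        # The subquery picks the next batch of flagged ids (DELETE itself has no LIMIT in PostgreSQL).
        cur = conn.execute(
            "DELETE FROM tbl WHERE id IN ("
            "SELECT id FROM tbl WHERE delete_flag = 1 ORDER BY id LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # end this batch's transaction; an error only rolls back this batch
        if cur.rowcount == 0:  # no flagged rows left
            return total
        total += cur.rowcount
```

The batch size is a trade-off: smaller batches keep each transaction (and any rollback) short, larger ones need fewer round trips.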

Answer

  1. Indexes are of no use for an operation that touches 90% of all rows: a sequential scan will be faster either way.

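If you need to allow concurrent reads, you cannot take an ACCESS EXCLUSIVE lock on the table. So you also cannot drop any indexes in the same transaction as the DELETE, because DROP INDEX takes an ACCESS EXCLUSIVE lock on the table and holds it until the transaction ends.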

You could drop indexes in separate transactions to keep the ACCESS EXCLUSIVE lock as short as possible, and later use CREATE INDEX CONCURRENTLY to rebuild them in the background. CREATE INDEX CONCURRENTLY only takes a SHARE UPDATE EXCLUSIVE lock, so it does not block concurrent reads or writes while the index is being built.
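The sequence described above (drop the index in a short transaction of its own, run the big DELETE, then rebuild the index) could be sketched as follows. This again uses Python's sqlite3 as a self-contained stand-in, so it only illustrates the order of the transactions: SQLite locks the whole database rather than individual tables and has no CONCURRENTLY option, so it says nothing about PostgreSQL's lock behaviour. The table tbl and the index name tbl_some_id_idx are hypothetical.

```python
import sqlite3

INDEX_NAME = "tbl_some_id_idx"  # hypothetical index name
INDEX_DDL = f"CREATE INDEX {INDEX_NAME} ON tbl (some_id)"


def make_demo_db(n_rows, n_flagged):
    """Hypothetical in-memory demo table plus index.

    isolation_level=None puts sqlite3 in autocommit mode, so the
    transactions below are controlled with explicit BEGIN/COMMIT.
    """
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE tbl (id INTEGER PRIMARY KEY, some_id INTEGER, "
        "delete_flag INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO tbl (id, some_id, delete_flag) VALUES (?, ?, ?)",
        [(i, i % 100, 1 if i < n_flagged else 0) for i in range(n_rows)],
    )
    conn.execute(INDEX_DDL)
    return conn


def delete_with_index_rebuild(conn):
    """Drop the index, run the bulk DELETE, then rebuild the index.

    Each step runs in its own transaction, so the DROP INDEX
    transaction stays short. Returns the number of rows deleted.
    """
    conn.execute("BEGIN")
    conn.execute(f"DROP INDEX IF EXISTS {INDEX_NAME}")
    conn.execute("COMMIT")

    conn.execute("BEGIN")
    deleted = conn.execute("DELETE FROM tbl WHERE delete_flag = 1").rowcount
    conn.execute("COMMIT")

    # In PostgreSQL this step would be CREATE INDEX CONCURRENTLY, which must
    # run outside a transaction block; SQLite has no CONCURRENTLY variant.
    conn.execute(INDEX_DDL)
    return deleted
```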

If you have a stable condition that identifies the 10% of rows that stay, I would strongly suggest a partial index on just those rows, to get the best of both worlds:

  • Reading queries can access the table quickly (using the partial index) at all times.
  • The big DELETE does not modify the partial index at all, since none of the indexed rows are affected by the DELETE.

CREATE INDEX foo ON tbl (some_id) WHERE delete_flag = FALSE;  -- tbl stands for your table's name

This assumes delete_flag is boolean. Your queries have to repeat the same predicate (even if it seems logically redundant) so that Postgres can prove the partial index applies to them.
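The effect of repeating the predicate can be checked in the query plan. The sketch below uses Python's sqlite3 as a self-contained stand-in for PostgreSQL (in PostgreSQL you would run EXPLAIN on the query instead); SQLite applies the same rule, using a partial index only when the query's WHERE clause implies the index's WHERE clause. Because delete_flag is stored as 0/1 here rather than as a boolean, the predicate becomes delete_flag = 0. The table name tbl is hypothetical; foo and some_id follow the index definition above.

```python
import sqlite3


def make_demo_db():
    """Hypothetical demo table with a partial index on the rows that stay."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl (id INTEGER PRIMARY KEY, some_id INTEGER, "
        "delete_flag INTEGER NOT NULL)"
    )
    # Partial index: only rows with delete_flag = 0 are indexed.
    conn.execute("CREATE INDEX foo ON tbl (some_id) WHERE delete_flag = 0")
    return conn


def plan(conn, sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = make_demo_db()
# Repeats the index predicate, so the planner can prove the partial index applies.
with_pred = plan(conn, "SELECT id FROM tbl WHERE some_id = 42 AND delete_flag = 0")
# Omits the predicate: the partial index lacks the flagged rows, so it cannot be used.
without_pred = plan(conn, "SELECT id FROM tbl WHERE some_id = 42")
print(with_pred)
print(without_pred)
```

The first plan should mention the index foo, the second a full scan of tbl; the exact wording of the plan text varies between SQLite versions.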