Postgresql slow delete from where exists
Question
I'm having trouble with slow delete queries. I have a schema, say "target", containing tables that each have an equivalent table (identical columns and primary keys) in another schema, say "delta". I now want to delete from the target schema all rows that appear in the delta schema. I have tried the DELETE FROM ... WHERE EXISTS approach, but it seems incredibly slow. Here's an example query:
DELETE FROM "target".name2phoneme
WHERE EXISTS (
    SELECT 1 FROM delta.name2phoneme d
    WHERE name2phoneme.NAME_ID = d.NAME_ID
      AND name2phoneme.PHONEME_ID = d.PHONEME_ID
);
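In Postgres, the same anti-join delete can also be written with the DELETE ... USING syntax. This is a sketch against the same tables; the planner usually produces a similar hash-join plan to the EXISTS form, so treat it as a readability alternative rather than a guaranteed speedup:

```sql
-- Join-style delete using Postgres's USING clause.
-- Removes every target row that has a matching (name_id, phoneme_id)
-- pair in the delta table.
DELETE FROM "target".name2phoneme t
USING delta.name2phoneme d
WHERE t.name_id = d.name_id
  AND t.phoneme_id = d.phoneme_id;
```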
This is the layout of both tables (with the exception that the "delta" schema has only primary keys and no foreign keys):
CREATE TABLE name2phoneme
(
    name_id uuid NOT NULL,
    phoneme_id uuid NOT NULL,
    seq_num numeric(3,0),
    CONSTRAINT pk_name2phoneme PRIMARY KEY (name_id, phoneme_id),
    CONSTRAINT fk_name2phoneme_name_id_2_name FOREIGN KEY (name_id)
        REFERENCES name (name_id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
        DEFERRABLE INITIALLY DEFERRED,
    CONSTRAINT fk_name2phoneme_phoneme_id_2_phoneme FOREIGN KEY (phoneme_id)
        REFERENCES phoneme (phoneme_id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
        DEFERRABLE INITIALLY DEFERRED
);
The "target" table originally contains a little over 18M rows, while the delta table contains about 3.7M rows (that are to be deleted from the target).
Here's the output of EXPLAIN for the above query:
"Delete on name2phoneme (cost=154858.03..1068580.46 rows=6449114 width=12)"
" -> Hash Join (cost=154858.03..1068580.46 rows=6449114 width=12)"
" Hash Cond: ((name2phoneme.name_id = d.name_id) AND (name2phoneme.phoneme_id = d.phoneme_id))"
" -> Seq Scan on name2phoneme (cost=0.00..331148.16 rows=18062616 width=38)"
" -> Hash (cost=69000.01..69000.01 rows=3763601 width=38)"
" -> Seq Scan on name2phoneme d (cost=0.00..69000.01 rows=3763601 width=38)"
I tried to EXPLAIN ANALYZE the above query, but execution took over 2 hours, so I killed it.
Any ideas on how I can optimize this operation?
Answer
Deleting 3.7 million rows is very time-consuming because of the overhead of looking up each row and then logging and deleting it. Just thinking about all the dirty pages, logging, and cache misses is mind-boggling, not to mention the updates to the indexes as well.
For that reason, something like this can be much faster:
create temporary table temp_n2p as
select n2p.*
from "target".name2phoneme n2p
where not exists (select 1
from delta.name2phoneme d
where n2p.NAME_ID = d.NAME_ID and
n2p.PHONEME_ID = d.PHONEME_ID
);
truncate table "target".name2phoneme;
insert into "target".name2phoneme
select *
from temp_n2p;
You should also drop the indexes before the truncation and then recreate them afterwards.
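Concretely, the drop-and-recreate step for this table might look like the sketch below. The constraint names are taken from the DDL in the question; whether you can drop them at all depends on what else references the table, so adjust to your schema:

```sql
-- Hedged sketch: swap the table contents with constraints/indexes dropped.
-- Running everything in one transaction keeps other sessions from ever
-- seeing the table empty, but TRUNCATE takes an ACCESS EXCLUSIVE lock.
BEGIN;

ALTER TABLE "target".name2phoneme DROP CONSTRAINT fk_name2phoneme_name_id_2_name;
ALTER TABLE "target".name2phoneme DROP CONSTRAINT fk_name2phoneme_phoneme_id_2_phoneme;
ALTER TABLE "target".name2phoneme DROP CONSTRAINT pk_name2phoneme;

TRUNCATE TABLE "target".name2phoneme;
INSERT INTO "target".name2phoneme SELECT * FROM temp_n2p;

-- Recreate the primary key (and its index) after the bulk load,
-- then restore the foreign keys from the original definition.
ALTER TABLE "target".name2phoneme
    ADD CONSTRAINT pk_name2phoneme PRIMARY KEY (name_id, phoneme_id);
ALTER TABLE "target".name2phoneme
    ADD CONSTRAINT fk_name2phoneme_name_id_2_name FOREIGN KEY (name_id)
    REFERENCES name (name_id) DEFERRABLE INITIALLY DEFERRED;
ALTER TABLE "target".name2phoneme
    ADD CONSTRAINT fk_name2phoneme_phoneme_id_2_phoneme FOREIGN KEY (phoneme_id)
    REFERENCES phoneme (phoneme_id) DEFERRABLE INITIALLY DEFERRED;

COMMIT;
```

Rebuilding the primary key once over the final ~14M rows is typically far cheaper than maintaining it row by row during 3.7M individual deletes.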