Postgresql slow delete from where exists

Question

I'm having trouble with slow delete queries. I have a schema, say "target", containing tables that each have an equivalent table (identical columns and primary keys) in another schema, say "delta". I now want to delete from the target schema all rows that appear in the delta schema. I have tried the DELETE FROM ... WHERE EXISTS approach, but it seems incredibly slow. Here's an example query:

DELETE FROM "target".name2phoneme
WHERE EXISTS(
  SELECT 1 FROM delta.name2phoneme d 
  WHERE name2phoneme.NAME_ID = d.NAME_ID 
  AND name2phoneme.PHONEME_ID = d.PHONEME_ID
);

This is the layout of both tables (with the exception that the "delta" schema has only primary keys and no foreign keys):

CREATE TABLE name2phoneme
(
  name_id uuid NOT NULL,
  phoneme_id uuid NOT NULL,
  seq_num numeric(3,0),
  CONSTRAINT pk_name2phoneme PRIMARY KEY (name_id, phoneme_id),
  CONSTRAINT fk_name2phoneme_name_id_2_name FOREIGN KEY (name_id)
    REFERENCES name (name_id) MATCH SIMPLE
    ON UPDATE NO ACTION
    ON DELETE NO ACTION
    DEFERRABLE INITIALLY DEFERRED,
  CONSTRAINT fk_name2phoneme_phoneme_id_2_phoneme FOREIGN KEY (phoneme_id)
    REFERENCES phoneme (phoneme_id) MATCH SIMPLE
    ON UPDATE NO ACTION
    ON DELETE NO ACTION
    DEFERRABLE INITIALLY DEFERRED
);

The "target" table originally contains a little over 18M rows, while the delta table contains about 3.7M rows (that are to be deleted from the target).

Here's the output of EXPLAIN for the above query:

"Delete on name2phoneme  (cost=154858.03..1068580.46 rows=6449114 width=12)"
"  ->  Hash Join  (cost=154858.03..1068580.46 rows=6449114 width=12)"
"        Hash Cond: ((name2phoneme.name_id = d.name_id) AND (name2phoneme.phoneme_id = d.phoneme_id))"
"        ->  Seq Scan on name2phoneme  (cost=0.00..331148.16 rows=18062616 width=38)"
"        ->  Hash  (cost=69000.01..69000.01 rows=3763601 width=38)"
"              ->  Seq Scan on name2phoneme d  (cost=0.00..69000.01 rows=3763601 width=38)"

I tried to EXPLAIN ANALYZE the above query, but execution took over 2 hours so I killed it.

Any ideas on how I can optimize this operation?

Answer

Deleting 3.7 million rows is very time consuming, because of the overhead of looking up each row and then logging and deleting the rows. Just thinking about all the dirty pages, logging, and cache misses is mind-boggling -- not to mention updates to the indexes as well.

For that reason, something like this can be much faster:

create temporary table temp_n2p as 
    select n2p.*
    from "target".name2phoneme n2p
    where not exists (select 1
                      from delta.name2phoneme d 
                      where n2p.NAME_ID = d.NAME_ID and
                            n2p.PHONEME_ID = d.PHONEME_ID
                     );

truncate table "target".name2phoneme;

insert into "target".name2phoneme
    select *
    from temp_n2p;
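
Since CREATE TABLE AS, TRUNCATE, and plain INSERT are all transactional in PostgreSQL, the three steps can also run inside a single transaction so the target table is never visible in a half-rebuilt state. A minimal sketch of that wrapping, assuming no concurrent writers (TRUNCATE takes an ACCESS EXCLUSIVE lock) and using ON COMMIT DROP to scope the temp table to the transaction:

begin;

-- snapshot the surviving rows; on commit drop ties the temp table's
-- lifetime to this transaction
create temporary table temp_n2p on commit drop as
    select n2p.*
    from "target".name2phoneme n2p
    where not exists (select 1
                      from delta.name2phoneme d
                      where n2p.NAME_ID = d.NAME_ID and
                            n2p.PHONEME_ID = d.PHONEME_ID
                     );

-- empty the target and reload it from the snapshot
truncate table "target".name2phoneme;

insert into "target".name2phoneme
    select * from temp_n2p;

commit;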

You should also drop the indexes before the truncation and then recreate them afterwards.
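
A minimal sketch of that step, using the constraint names from the CREATE TABLE above; it assumes the primary key's implicit index is the only index on the table. The outgoing foreign keys are dropped as well, since recreating them afterwards validates all rows in one pass instead of row by row during the reload:

alter table "target".name2phoneme
    drop constraint fk_name2phoneme_name_id_2_name,
    drop constraint fk_name2phoneme_phoneme_id_2_phoneme,
    drop constraint pk_name2phoneme;

-- ... truncate and reload as shown above ...

alter table "target".name2phoneme
    add constraint pk_name2phoneme primary key (name_id, phoneme_id),
    add constraint fk_name2phoneme_name_id_2_name foreign key (name_id)
        references name (name_id) deferrable initially deferred,
    add constraint fk_name2phoneme_phoneme_id_2_phoneme foreign key (phoneme_id)
        references phoneme (phoneme_id) deferrable initially deferred;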
