How to remove duplicate rows with foreign key dependencies?


Question

I'm sure this is commonplace, but Google is not helping. I am trying to write a simple stored procedure in PostgreSQL 9.1 that removes duplicate entries from a parent table cpt. The parent table cpt is referenced by a child table lab, defined as:

CREATE TABLE lab (
 recid serial NOT NULL,
 cpt_recid integer,
  ........
 CONSTRAINT cs_cpt FOREIGN KEY (cpt_recid)
   REFERENCES cpt (recid) MATCH SIMPLE
   ON UPDATE NO ACTION ON DELETE RESTRICT,
 ...
);

My biggest problem is how to obtain the record that failed, so that I can use it in the EXCEPTION clause to move the child rows in lab to one acceptable key, then loop back and delete the unnecessary records from the cpt table.

Here is the (very wrong) code:

CREATE OR REPLACE FUNCTION h_RemoveDuplicateCPT()
  RETURNS void AS
$BODY$
BEGIN
LOOP
   BEGIN

   DELETE FROM cpt
   WHERE recid IN (
      SELECT recid
      FROM  (
         SELECT recid,
         row_number() over (partition BY cdesc ORDER BY recid) AS rnum
         FROM cpt) t
      WHERE t.rnum > 1)
   RETURNING recid;

   IF count = 0 THEN
      RETURN;
   END IF;  

   EXCEPTION WHEN foreign_key_violation THEN
      RAISE NOTICE 'fixing unique_violation';
      RAISE NOTICE 'recid is %' , recid;
   END;
END LOOP;
END;                    
$BODY$
LANGUAGE plpgsql VOLATILE;

Recommended answer

You can do this much more efficiently with a single SQL statement using data-modifying CTEs.

WITH plan AS (
   SELECT *
   FROM  (
      SELECT recid, min(recid) OVER (PARTITION BY cdesc) AS master_recid
      FROM   cpt
      ) sub
   WHERE  recid <> master_recid  -- ... <> self
   )
 , upd_lab AS (
   UPDATE lab l
   SET    cpt_recid = p.master_recid   -- link to master recid ...
   FROM   plan p
   WHERE  l.cpt_recid = p.recid
   )
DELETE FROM cpt c
USING  plan p
WHERE  c.recid = p.recid
RETURNING c.recid;

Demos: db<>fiddle (Postgres 11), SQL Fiddle (Postgres 9.6)

This should be much faster and cleaner. Looping is comparatively expensive, and exception handling is even more expensive.
More importantly, references in lab are redirected to the respective master row in cpt automatically, which your original code did not do yet. So you can delete all dupes at once.

You can still wrap this in a plpgsql or SQL function if you like.
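
A minimal sketch of such a wrapper as a plain SQL function. The function name f_remove_duplicate_cpt is a placeholder, and the return type assumes cpt.recid is an integer (for example serial) column, which matches the integer column lab.cpt_recid that references it:

CREATE OR REPLACE FUNCTION f_remove_duplicate_cpt()   -- hypothetical name
  RETURNS SETOF integer   -- recid of every deleted duplicate (assumes integer recid)
  LANGUAGE sql AS
$func$
WITH plan AS (
   SELECT *
   FROM  (
      SELECT recid, min(recid) OVER (PARTITION BY cdesc) AS master_recid
      FROM   cpt
      ) sub
   WHERE  recid <> master_recid
   )
 , upd_lab AS (
   UPDATE lab l
   SET    cpt_recid = p.master_recid
   FROM   plan p
   WHERE  l.cpt_recid = p.recid
   )
DELETE FROM cpt c
USING  plan p
WHERE  c.recid = p.recid
RETURNING c.recid;
$func$;

-- Usage: returns one row per removed duplicate.
SELECT * FROM f_remove_duplicate_cpt();

Because the body is still one SQL statement, the wrapper keeps the single-command behavior of the original query.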

  1. In the first CTE plan, identify a master row in each partition with the same cdesc. In your case, that is the row with the minimum recid.

  2. In the second CTE upd_lab, redirect all rows referencing a dupe to the master row in cpt.

  3. Finally, delete the dupes. This does not raise exceptions because the depending rows are linked to the remaining master row virtually at the same time.
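
To check what the statement would touch before running it, the subquery from plan can be executed on its own as a read-only SELECT. A sketch, assuming only the columns recid and cdesc that appear in the question; the output aliases are made up for illustration:

-- Read-only preview: each duplicate and the master row it would be merged into.
SELECT recid AS dupe_recid      -- row that would be deleted
     , master_recid             -- row that lab.cpt_recid would be redirected to
     , cdesc
FROM  (
   SELECT recid, cdesc, min(recid) OVER (PARTITION BY cdesc) AS master_recid
   FROM   cpt
   ) sub
WHERE  recid <> master_recid
ORDER  BY master_recid, recid;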

ON DELETE RESTRICT

All CTEs and the main query of a statement operate on the same snapshot of the underlying tables, virtually concurrently. They don't see each other's effects on the underlying tables.
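
The snapshot behavior can be observed with a small throwaway example. This is only a sketch; the temporary table snapshot_demo and the column aliases are invented for the demonstration and are not part of the original schema:

-- Hypothetical demo table, dropped automatically at the end of the session.
CREATE TEMP TABLE snapshot_demo (id int PRIMARY KEY);
INSERT INTO snapshot_demo VALUES (1), (2), (3);

WITH del AS (
   DELETE FROM snapshot_demo
   WHERE  id > 1
   RETURNING id
   )
SELECT (SELECT count(*) FROM snapshot_demo) AS rows_seen   -- 3: the table as of statement start
     , (SELECT count(*) FROM del)           AS rows_deleted;  -- 2: changes are visible only via RETURNING

-- A later, separate statement sees the effect of the DELETE.
SELECT count(*) AS rows_after FROM snapshot_demo;   -- 1

The same rule is why the DELETE on cpt in the main statement cannot rely on reading the updated lab rows, and why the foreign key check at the end of the command is what matters.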

One might expect a FK constraint with ON DELETE RESTRICT to raise exceptions because, per the documentation:

"Referential actions other than the NO ACTION check cannot be deferred, even if the constraint is declared deferrable."

However, the above statement is a single command and, quoting the manual again:

"A constraint that is not deferrable will be checked immediately after every command."

Note the wording: after every command. This works with the less restrictive default ON DELETE NO ACTION too, of course.

Be wary of concurrent transactions writing to the same tables, though that is a general consideration and not specific to this task.
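
If blocking other writers for the duration is acceptable, one possible safeguard is to take explicit table locks in the same transaction. This is an assumption about the deployment, not something the original answer prescribes:

BEGIN;

-- SHARE ROW EXCLUSIVE conflicts with the ROW EXCLUSIVE lock taken by
-- INSERT / UPDATE / DELETE, so concurrent writers wait until COMMIT.
-- Plain SELECTs on both tables keep working.
LOCK TABLE cpt, lab IN SHARE ROW EXCLUSIVE MODE;

-- Run the de-duplication statement here, inside the same transaction.

COMMIT;

The trade-off is reduced write concurrency on cpt and lab while the transaction is open, so it is best kept short.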

An exception applies to UNIQUE and PRIMARY KEY constraints, but that does not concern this case.
