How to remove duplicate rows with foreign key dependencies?
Question
I'm sure this is commonplace, but Google is not helping. I am trying to write a simple stored procedure in PostgreSQL 9.1 that removes duplicate entries from a parent table cpt. The parent table cpt is referenced by a child table lab, defined as:
CREATE TABLE lab (
   recid serial NOT NULL,
   cpt_recid integer,
   ........
   CONSTRAINT cs_cpt FOREIGN KEY (cpt_recid)
      REFERENCES cpt (recid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE RESTRICT,
   ...
);
The biggest problem I'm having is how to obtain the record that failed, so that I can use it in the EXCEPTION clause to move the child rows in lab to one acceptable key, then loop back through and delete the unnecessary records from the cpt table.
Here is the (wrong) code:
CREATE OR REPLACE FUNCTION h_RemoveDuplicateCPT()
  RETURNS void AS
$BODY$
BEGIN
   LOOP
      BEGIN
         DELETE FROM cpt
         WHERE recid IN (
            SELECT recid
            FROM (
               SELECT recid,
                      row_number() over (partition BY cdesc ORDER BY recid) AS rnum
               FROM cpt) t
            WHERE t.rnum > 1)
         RETURNING recid;
         IF count = 0 THEN
            RETURN;
         END IF;
      EXCEPTION WHEN foreign_key_violation THEN
         RAISE NOTICE 'fixing unique_violation';
         RAISE NOTICE 'recid is %' , recid;
      END;
   END LOOP;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE;
Recommended answer
You can do this much more efficiently with a single SQL statement using data-modifying CTEs.
No function required (though possible, of course), no looping, no exception handling:
WITH plan AS (
   SELECT recid, cdesc, min(recid) OVER (PARTITION BY cdesc) AS master_recid
   FROM   cpt
   )
, upd_lab AS (
   UPDATE lab l
   SET    cpt_recid = p.master_recid   -- link to master recid ...
   FROM   plan p
   WHERE  l.cpt_recid = p.recid
   AND    p.recid <> p.master_recid    -- ... only if not linked to master
   )
DELETE FROM cpt c
USING  plan p
WHERE  c.recid = p.recid
AND    p.recid <> p.master_recid       -- ... only if not master
RETURNING c.recid;                     -- optionally return all deleted (dupe) IDs
This should be much faster and cleaner. Looping is comparatively expensive, and exception handling is more expensive still.
More importantly, references in lab are redirected to the respective master row in cpt automatically, which your original code did not do yet. So you can delete all dupes at once.
You can wrap this in a plpgsql or SQL function if you like.
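As a minimal sketch of such a wrapper, the statement can go into a plain SQL function more or less unchanged. The function name f_remove_duplicate_cpt, the stand-in temp tables, and the sample rows are assumptions for illustration only; the real column lists are elided in the question, so only recid, cdesc and cpt_recid are modeled here.

```sql
-- Assumed stand-in schema so the sketch runs on its own.
-- On a real database, skip these CREATE TEMP TABLE / INSERT lines.
CREATE TEMP TABLE cpt (
   recid serial PRIMARY KEY,
   cdesc text
);

CREATE TEMP TABLE lab (
   recid     serial PRIMARY KEY,
   cpt_recid integer REFERENCES cpt (recid)
                     ON UPDATE NO ACTION ON DELETE RESTRICT
);

INSERT INTO cpt (cdesc) VALUES ('a'), ('a'), ('b'), ('a');  -- recid 2 and 4 duplicate recid 1
INSERT INTO lab (cpt_recid) VALUES (1), (2), (3), (4);

-- Hypothetical function name; returns the recid of every deleted duplicate.
CREATE OR REPLACE FUNCTION f_remove_duplicate_cpt()
  RETURNS SETOF integer
  LANGUAGE sql AS
$func$
WITH plan AS (
   SELECT recid, min(recid) OVER (PARTITION BY cdesc) AS master_recid
   FROM   cpt
   )
, upd_lab AS (
   UPDATE lab l
   SET    cpt_recid = p.master_recid   -- re-point children to the master row
   FROM   plan p
   WHERE  l.cpt_recid = p.recid
   AND    p.recid <> p.master_recid
   )
DELETE FROM cpt c
USING  plan p
WHERE  c.recid = p.recid
AND    p.recid <> p.master_recid       -- keep the master row
RETURNING c.recid;
$func$;

SELECT * FROM f_remove_duplicate_cpt();  -- one row per removed duplicate
```

With the sample rows above, cpt should be left with recid 1 and 3, and the lab rows that pointed at 2 and 4 should now point at 1.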
In the first CTE, plan, identify the master row per group of dupes. In your case, that is the row with the minimum recid per cdesc.
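To preview what plan computes before touching any data, the same window expression can be run as a plain SELECT. The inline VALUES list below is made-up sample data standing in for cpt.

```sql
-- Made-up sample rows standing in for the real cpt table.
SELECT recid, cdesc
     , min(recid) OVER (PARTITION BY cdesc) AS master_recid
FROM  (VALUES (1, 'a'), (2, 'a'), (3, 'b'), (4, 'a')) AS cpt (recid, cdesc)
ORDER  BY recid;
```

Rows where recid differs from master_recid (here recid 2 and 4) are the duplicates that the later steps re-point and delete.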
In the second CTE, upd_lab, redirect all rows referencing a dupe to the master row in cpt.
Finally, delete the dupes. This does not raise exceptions because the dependent rows are linked to the remaining master row virtually at the same time.
All CTEs and the main query of a statement operate on the same snapshot of the underlying tables, virtually concurrently. They don't see each other's effects on the underlying tables.
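The snapshot rule can be seen in isolation with a small sketch. The table name snap_demo is hypothetical and exists only for this illustration.

```sql
-- Hypothetical scratch table, used only to illustrate the snapshot rule.
CREATE TEMP TABLE snap_demo (id integer);
INSERT INTO snap_demo VALUES (1);

WITH ins AS (
   INSERT INTO snap_demo VALUES (2)
   RETURNING id
   )
SELECT (SELECT count(*) FROM snap_demo) AS rows_seen_in_table  -- row 2 is not visible yet
     , (SELECT count(*) FROM ins)       AS rows_returned_by_cte;

-- A separate, later command sees both rows.
SELECT count(*) AS rows_after FROM snap_demo;
```

Within the first statement, the only way to observe the new row is through the RETURNING output of the CTE. That is why plan in the main statement is computed once from the original state of cpt and can safely be reused by both the UPDATE and the DELETE.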
One might expect the ON DELETE RESTRICT foreign key to raise an exception here because, per the documentation:
"Referential actions other than the NO ACTION check cannot be deferred, even if the constraint is declared deferrable."
However, the statement above is a single command and, per the documentation:
"A constraint that is not deferrable will be checked immediately after every command."
You only need to be aware of concurrent transactions writing to the same tables, but that is a general consideration, not specific to this task.
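To make the "after every command" rule concrete, here is a small sketch of what happens when the two steps are issued as separate commands. The tables cpt_demo and lab_demo are hypothetical stand-ins, not the real schema.

```sql
-- Hypothetical stand-in tables for this illustration only.
CREATE TEMP TABLE cpt_demo (recid integer PRIMARY KEY, cdesc text);
CREATE TEMP TABLE lab_demo (
   recid     serial PRIMARY KEY,
   cpt_recid integer REFERENCES cpt_demo (recid) ON DELETE RESTRICT
);
INSERT INTO cpt_demo VALUES (1, 'a'), (2, 'a');
INSERT INTO lab_demo (cpt_recid) VALUES (2);

-- A standalone DELETE is checked right after that command and is rejected,
-- because lab_demo still references recid 2.
DO
$do$
BEGIN
   DELETE FROM cpt_demo WHERE recid = 2;
   RAISE NOTICE 'delete succeeded (not expected)';
EXCEPTION WHEN foreign_key_violation THEN
   RAISE NOTICE 'standalone DELETE rejected: %', SQLERRM;
END
$do$;

-- Re-pointing the child row first, as its own command, makes the DELETE legal.
UPDATE lab_demo SET cpt_recid = 1 WHERE cpt_recid = 2;
DELETE FROM cpt_demo WHERE recid = 2;
```

The single-statement version with data-modifying CTEs avoids having to order the steps by hand, since the check runs only once the whole command, including the UPDATE, has finished.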
An exception applies to UNIQUE and PRIMARY KEY constraints, but that does not concern this case:
- Constraint defined DEFERRABLE INITIALLY IMMEDIATE is still DEFERRED?