从表中删除重复项,然后将引用行重新链接到新的主表 [英] Remove duplicates from a table and re-link referencing rows to the new master

查看:84
本文介绍了从表中删除重复项,然后将引用行重新链接到新的主表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个表转录,其中包含转录文本的段落及其带有列的引用:

I have a table transcription which contains passages of transcribed text and their citations with columns:

文本,transcription_id(PK),t_notes,引文

和第二个表 town_transcription 是将文本中引用的位置(来自另一个表)链接到该转录记录的关系表。该表具有以下列:

and the second table town_transcription being the relationship table that links places (from another table) referenced in the text to that transcription record. This table has the columns:

town_id(FK),transcription_id(FK),confidence_interval

这些文字中有许多引用了多个城镇,但愚蠢的我只是复制了记录并将它们分别链接到每个城镇。我已使用以下SQL查询确定了重复的文本行:

Many of these passages of text reference multiple towns, but stupidly I just duplicated records and linked them individually to each town. I have identified the duplicate rows of text using the following SQL query:

SELECT * FROM transcription aa
WHERE (select count(*) from transcription bb
WHERE (bb.text = aa.text) AND (bb.citation = aa.citation)) > 1
ORDER BY text ASC;

我现在有大约2000行(某些文本的2到6个重复)需要删除 transcription 表中多余的 transcription_id 并更改 transcription_id 从关系表 town_transcription 中指向剩余的,现在唯一的转录记录。通过阅读其他问题,我认为可能有必要使用 UPDATE FROM INNER JOIN ,但是我真的不知道如何实现这一目标,我只是一个初学者,感谢您的帮助。

I now have about 2000 rows (2 to 6 duplicates of some text passages) where I need to delete the extra transcription_id's from the transcription table and change the transcription_id from the relationship table, town_transcription, to point to the remaining, now unique, transcription record. From reading other questions, I think utilizing UPDATE FROM and INNER JOIN might be necessary, but I really don't know how to implement this, I'm just a beginner, thanks for any help.

推荐答案

这条命令应该可以完成所有操作:

This single command should do it all:

WITH blacklist AS (  -- identify duplicate IDs and their master
   SELECT *
   FROM  (
      SELECT transcription_id
           , min(transcription_id) OVER (PARTITION BY text, citation) AS master_id
      FROM   transcription
      ) sub
   WHERE  transcription_id <> master_id
   )
, upd AS (  -- redirect referencing rows
   UPDATE town_transcription tt
   SET    transcription_id = b.master_id
   FROM   blacklist b
   WHERE  b.transcription_id = tt.transcription_id
   )
DELETE FROM transcription t  -- kill dupes (now without reference)
USING  blacklist b
WHERE  b.transcription_id = t.transcription_id;

由于缺乏定义,我选择了每组ID最小的行作为尚存的主行。

For lack of definition I chose the row with the smallest ID per group as surviving master row.

除非您具有非默认设置,否则FK约束不会妨碍您。详细说明:

FK constraints don't get in the way unless you have non-default settings. Detailed explanation:

  • How to remove duplicate rows with foreign keys dependencies?
  • Delete duplicates and reroute referencing rows to new master

现在,您可能想要添加 UNIQUE 约束以防止再次发生相同的错误:

After removing the dupes you might now want to add a UNIQUE constraint to prevent the same error from reoccurring:

ALTER TABLE transcription
ADD CONSTRAINT transcription_uni UNIQUE (text, citation);

这篇关于从表中删除重复项,然后将引用行重新链接到新的主表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆