从表中删除重复项，然后将引用行重新链接到新的主表 [英] Remove duplicates from a table and re-link referencing rows to the new master

查看：84 发布时间：2020/5/30 1:14:56 sql postgresql duplicates common-table-expression

本文介绍了从表中删除重复项，然后将引用行重新链接到新的主表的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个表转录，其中包含转录文本的段落及其带有列的引用：

I have a table transcription which contains passages of transcribed text and their citations with columns:

文本，transcription_id（PK），t_notes，引文

和第二个表 town_transcription 是将文本中引用的位置（来自另一个表）链接到该转录记录的关系表。该表具有以下列：

and the second table town_transcription being the relationship table that links places (from another table) referenced in the text to that transcription record. This table has the columns:

town_id（FK），transcription_id（FK），confidence_interval

这些文字中有许多引用了多个城镇，但愚蠢的我只是复制了记录并将它们分别链接到每个城镇。我已使用以下SQL查询确定了重复的文本行：

Many of these passages of text reference multiple towns, but stupidly I just duplicated records and linked them individually to each town. I have identified the duplicate rows of text using the following SQL query:

SELECT * FROM transcription aa
WHERE (select count(*) from transcription bb
WHERE (bb.text = aa.text) AND (bb.citation = aa.citation)) > 1
ORDER BY text ASC;

我现在有大约2000行（某些文本的2到6个重复）需要删除 transcription 表中多余的 transcription_id 并更改 transcription_id 从关系表 town_transcription 中指向剩余的，现在唯一的转录记录。通过阅读其他问题，我认为可能有必要使用 UPDATE FROM 和 INNER JOIN ，但是我真的不知道如何实现这一目标，我只是一个初学者，感谢您的帮助。

I now have about 2000 rows (2 to 6 duplicates of some text passages) where I need to delete the extra transcription_id's from the transcription table and change the transcription_id from the relationship table, town_transcription, to point to the remaining, now unique, transcription record. From reading other questions, I think utilizing UPDATE FROM and INNER JOIN might be necessary, but I really don't know how to implement this, I'm just a beginner, thanks for any help.

推荐答案

这条命令应该可以完成所有操作：

This single command should do it all:

WITH blacklist AS (  -- identify duplicate IDs and their master
   SELECT *
   FROM  (
      SELECT transcription_id
           , min(transcription_id) OVER (PARTITION BY text, citation) AS master_id
      FROM   transcription
      ) sub
   WHERE  transcription_id <> master_id
   )
, upd AS (  -- redirect referencing rows
   UPDATE town_transcription tt
   SET    transcription_id = b.master_id
   FROM   blacklist b
   WHERE  b.transcription_id = tt.transcription_id
   )
DELETE FROM transcription t  -- kill dupes (now without reference)
USING  blacklist b
WHERE  b.transcription_id = t.transcription_id;

由于缺乏定义，我选择了每组ID最小的行作为尚存的主行。

For lack of definition I chose the row with the smallest ID per group as surviving master row.

除非您具有非默认设置，否则FK约束不会妨碍您。详细说明：

FK constraints don't get in the way unless you have non-default settings. Detailed explanation:

如何删除具有外键依赖性的重复行？

删除重复项并将引用行重新路由到新的主文件中

How to remove duplicate rows with foreign keys dependencies?
Delete duplicates and reroute referencing rows to new master

现在，您可能想要添加 UNIQUE 约束以防止再次发生相同的错误：

After removing the dupes you might now want to add a UNIQUE constraint to prevent the same error from reoccurring:

ALTER TABLE transcription
ADD CONSTRAINT transcription_uni UNIQUE (text, citation);

这篇关于从表中删除重复项，然后将引用行重新链接到新的主表的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

从表中删除重复项，然后将引用行重新链接到新的主表 [英] Remove duplicates from a table and re-link referencing rows to the new master

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

从表中删除重复项，然后将引用行重新链接到新的主表 [英] Remove duplicates from a table and re-link referencing rows to the new master

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭