Delete duplicate rows from table with no unique key
How do I delete duplicate rows in a Postgres 9 table? The rows are complete duplicates on every field, and there is no individual field that could be used as a unique key, so I can't just GROUP BY columns and use a NOT IN statement.
I'm looking for a single SQL statement, not a solution that requires me to create a temporary table and insert records into it. I know how to do that, but it takes more work to fit into my automated process.
Table definition:
jthinksearch=> \d releases_labels;
Unlogged table "discogs.releases_labels"
   Column   |  Type   | Modifiers
------------+---------+-----------
 label      | text    |
 release_id | integer |
 catno      | text    |
Indexes:
"releases_labels_catno_idx" btree (catno)
"releases_labels_name_idx" btree (label)
Foreign-key constraints:
"foreign_did" FOREIGN KEY (release_id) REFERENCES release(id)
Sample data:
jthinksearch=> select * from releases_labels where release_id=6155;
    label     | release_id |   catno
--------------+------------+------------
 Warp Records |       6155 | WAP 39 CDR
 Warp Records |       6155 | WAP 39 CDR
If you can afford to rewrite the whole table, this is probably the simplest approach:
WITH Deleted AS (
  DELETE FROM discogs.releases_labels
  RETURNING *
)
INSERT INTO discogs.releases_labels
SELECT DISTINCT * FROM Deleted;
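Before rewriting the table, it may help to see which rows are actually duplicated and how many copies of each exist. The query below is only a sketch for that check; it assumes the three columns from the table definition in the question, and the `copies` alias is just an illustrative name.

```sql
-- List every row that occurs more than once, with its number of copies.
-- GROUP BY treats NULLs as equal, so rows with NULL columns are grouped too.
SELECT label, release_id, catno, COUNT(*) AS copies
FROM discogs.releases_labels
GROUP BY label, release_id, catno
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

Running the same query again after the cleanup should return no rows if the deduplication worked.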
If you need to specifically target the duplicated records, you can make use of the internal ctid field, which uniquely identifies a row:
DELETE FROM discogs.releases_labels
WHERE ctid NOT IN (
  SELECT MIN(ctid)
  FROM discogs.releases_labels
  GROUP BY label, release_id, catno
);
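As a cautious dry run, the same ctid condition can be used in a SELECT first, to preview the rows that the DELETE would remove. This is a sketch that reuses the subquery from the statement above unchanged; nothing is modified.

```sql
-- Preview only: show the rows that are NOT the kept copy (the lowest ctid)
-- within each group of identical (label, release_id, catno) values.
SELECT ctid, label, release_id, catno
FROM discogs.releases_labels
WHERE ctid NOT IN (
  SELECT MIN(ctid)
  FROM discogs.releases_labels
  GROUP BY label, release_id, catno
)
ORDER BY label, release_id, catno;
```

For the sample data in the question, this would be expected to list one of the two "Warp Records" rows for release_id 6155, leaving the other in place.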
Be very careful with ctid: it changes over time, so it should not be stored or reused as a long-term row identifier. You can, however, rely on it staying the same within the scope of a single statement.