Delete duplicates based on two columns and keep the row with the minimum value of another column

Question

I converted the Discogs XML file to a MySQL table, and now I am faced with many duplicate entries of the same songs with different release years. This happens because of releases like "best of" compilations, etc.

I need a SQL query that deletes duplicate rows based on the two columns 'artist' and 'track' but keeps the one with the earliest 'year'. For example, the table looks like this:

id | artist      | track              | year
---------------------------------------------
1  | Some Artist | Greatest Song Ever | 1999
2  | Some Artist | Greatest Song Ever | 1985
3  | Some Artist | Greatest Song Ever | 2000

Basically, I want to delete every row except the one with 'year' 1985.

From what I understand:

ALTER IGNORE TABLE discog ADD UNIQUE (artist, track);

This used to remove all but one row, but I do not believe IGNORE works with newer versions of MySQL. And I do not know how to keep the row with MIN(year).

Recommended Answer

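You can use this query to delete all duplicate entries, leaving the earliest one. Because the join condition is a strict d1.year < d.year, rows that tie on the earliest year are all kept: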

DELETE d
FROM discog d
JOIN discog d1 ON d1.artist = d.artist AND d1.track = d.track AND d1.year < d.year;
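To try the self-join condition without a MySQL server, here is a minimal sketch using Python's standard `sqlite3` module and an in-memory database loaded with the question's sample rows. SQLite does not support MySQL's multi-table `DELETE d FROM ... JOIN` syntax, so the sketch swaps in an equivalent correlated `EXISTS` subquery: a row is deleted when another row with the same artist and track has a strictly earlier year. The function name `delete_later_duplicates` is hypothetical.

```python
import sqlite3


def delete_later_duplicates(conn: sqlite3.Connection) -> None:
    # Delete a row when some other row with the same (artist, track)
    # has a strictly earlier year; same condition as the MySQL self-join.
    conn.execute(
        """
        DELETE FROM discog
        WHERE EXISTS (
            SELECT 1 FROM discog AS d1
            WHERE d1.artist = discog.artist
              AND d1.track = discog.track
              AND d1.year < discog.year
        )
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE discog (id INTEGER, artist TEXT, track TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO discog VALUES (?, ?, ?, ?)",
    [
        (1, "Some Artist", "Greatest Song Ever", 1999),
        (2, "Some Artist", "Greatest Song Ever", 1985),
        (3, "Some Artist", "Greatest Song Ever", 2000),
    ],
)
delete_later_duplicates(conn)
print(conn.execute("SELECT * FROM discog ORDER BY id").fetchall())
# [(2, 'Some Artist', 'Greatest Song Ever', 1985)]
```

The row holding the minimum year is never matched by the condition, so the result does not depend on the order in which the other rows are removed.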

Update

An alternative solution, which should be more efficient for very large tables, is to create a copy of the table with a UNIQUE index on the (artist, track) columns to prevent duplicate insertion:

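-- Empty copy of the table with the same columns
CREATE TABLE discog_copy (id INT, artist VARCHAR(50), track VARCHAR(50), year INT);
-- Allow only one row per (artist, track) pair
ALTER TABLE discog_copy ADD UNIQUE KEY (artist, track);
-- Earliest year is inserted first; later duplicates are skipped by IGNORE
INSERT IGNORE INTO discog_copy SELECT * FROM discog ORDER BY year;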
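The three statements above can be tried in the same way with Python's `sqlite3` module. In this sketch, SQLite's `INSERT OR IGNORE` stands in for MySQL's `INSERT IGNORE`, and a separate `CREATE UNIQUE INDEX` (index name hypothetical) stands in for `ALTER TABLE ... ADD UNIQUE KEY`. Like the answer, it assumes rows are inserted in the order produced by `ORDER BY year`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE discog (id INTEGER, artist TEXT, track TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO discog VALUES (?, ?, ?, ?)",
    [
        (1, "Some Artist", "Greatest Song Ever", 1999),
        (2, "Some Artist", "Greatest Song Ever", 1985),
        (3, "Some Artist", "Greatest Song Ever", 2000),
    ],
)

# Empty copy with a UNIQUE constraint on the (artist, track) pair.
conn.execute("CREATE TABLE discog_copy (id INTEGER, artist TEXT, track TEXT, year INTEGER)")
conn.execute("CREATE UNIQUE INDEX discog_copy_artist_track ON discog_copy (artist, track)")

# Rows arrive earliest year first; later rows for the same pair violate
# the unique index and are skipped instead of raising an error.
conn.execute("INSERT OR IGNORE INTO discog_copy SELECT * FROM discog ORDER BY year")

print(conn.execute("SELECT * FROM discog_copy").fetchall())
```

The original table is left untouched. In MySQL, once `discog_copy` looks right, one option is to swap the two tables with `RENAME TABLE discog TO discog_old, discog_copy TO discog;` and drop `discog_old` later.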

The unique key is on the combination of artist name and track name, so it still allows an artist to have different tracks and different artists to have the same track name. Because the SELECT part of the query has ORDER BY year, the (artist, track, year) combination with the lowest year is inserted first, and any later rows with the same (artist, track) are not inserted because of the duplicate key.
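The "first row wins" behaviour described above does not depend on MySQL. The following pure-Python sketch (the function name `keep_earliest` and the extra sample rows are hypothetical) visits rows in ascending year order and keeps only the first row seen for each `(artist, track)` key, which mirrors what the unique index does during `INSERT IGNORE`.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, str, int]  # (id, artist, track, year)


def keep_earliest(rows: List[Row]) -> List[Row]:
    kept: Dict[Tuple[str, str], Row] = {}
    # Visit rows from the lowest year upwards, like ORDER BY year.
    for row in sorted(rows, key=lambda r: r[3]):
        key = (row[1], row[2])  # the UNIQUE (artist, track) pair
        # Only the first row per key is stored; later ones are ignored.
        if key not in kept:
            kept[key] = row
    # Return the survivors ordered by id for readability.
    return sorted(kept.values())


rows = [
    (1, "Some Artist", "Greatest Song Ever", 1999),
    (2, "Some Artist", "Greatest Song Ever", 1985),
    (3, "Some Artist", "Greatest Song Ever", 2000),
    (4, "Other Artist", "Greatest Song Ever", 2001),
    (5, "Some Artist", "Another Song", 1990),
]
print(keep_earliest(rows))
# [(2, 'Some Artist', 'Greatest Song Ever', 1985), (4, 'Other Artist', 'Greatest Song Ever', 2001), (5, 'Some Artist', 'Another Song', 1990)]
```

Rows 4 and 5 survive because their (artist, track) pairs differ from the duplicated one, even though they share an artist or a track name with it.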

Demo on rextester