Only select first row of repeating value in a column in SQL


Problem description

I have a table with a column that may contain the same value in bursts, like this:

+----+---------+
| id |   Col1  | 
+----+---------+
| 1  | 6050000 |
+----+---------+
| 2  | 6050000 |
+----+---------+
| 3  | 6050000 |
+----+---------+
| 4  | 6060000 |
+----+---------+
| 5  | 6060000 |
+----+---------+
| 6  | 6060000 |
+----+---------+
| 7  | 6060000 |
+----+---------+
| 8  | 6060000 |
+----+---------+
| 9  | 6050000 |
+----+---------+
| 10 | 6000000 |
+----+---------+
| 11 | 6000000 |
+----+---------+

Now I want to prune the rows where the value of Col1 repeats the value of the preceding row, and select only the first row of each burst.
For the table above, the result should be:

+----+---------+
| id |   Col1  | 
+----+---------+
| 1  | 6050000 |
+----+---------+
| 4  | 6060000 |
+----+---------+
| 9  | 6050000 |
+----+---------+
| 10 | 6000000 |
+----+---------+

How can I do this in SQL?
Note that only the rows within a burst should be removed; the same value may still appear again in a later burst. In the sample result, 6050000 appears at both id=1 and id=9.


I achieved it using this:

select id,col1 from data as d1
where not exists (
    Select id from data as d2
    where d2.id=d1.id-1 and d1.col1=d2.col1 order by id limit 1)

But this only works when the ids are sequential. When there are gaps between ids (from deleted rows), the query breaks. How can I fix this?
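To make the failure concrete, here is a minimal sketch that runs the query above against ids with gaps. It assumes an in-memory SQLite database driven from Python's sqlite3 module, and the sample rows are made up for illustration; the redundant ORDER BY / LIMIT inside the subquery is dropped.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, col1 INTEGER)")

# Illustrative rows: same kind of bursts as the sample table, but with gaps in id.
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [(1, 6050000), (2, 6050000), (6, 6050000),
     (14, 6060000), (15, 6060000), (16, 6060000),
     (17, 6060000), (18, 6060000), (19, 6050000),
     (20, 6000000), (111, 6000000)],
)

# The query from the question: a row is kept unless the row with id - 1
# exists and carries the same col1.
rows = conn.execute("""
    SELECT id, col1 FROM data AS d1
    WHERE NOT EXISTS (
        SELECT 1 FROM data AS d2
        WHERE d2.id = d1.id - 1 AND d1.col1 = d2.col1)
    ORDER BY id
""").fetchall()

kept_ids = [r[0] for r in rows]
print(kept_ids)
```

Ids 6 and 111 are kept even though they continue a burst, because the rows with id 5 and id 110 do not exist.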

Recommended answer

You can use an EXISTS semi-join (negated with NOT for the rows to keep) to identify candidates. To select the wanted rows:

SELECT * FROM tbl
WHERE NOT EXISTS (
    SELECT *
    FROM tbl t
    WHERE t.col1 = tbl.col1
    AND t.id = tbl.id - 1
    )
ORDER BY id

To get rid of the unwanted rows:

DELETE FROM tbl
-- SELECT * FROM tbl
WHERE EXISTS (
    SELECT *
    FROM   tbl t
    WHERE  t.col1 = tbl.col1
    AND    t.id   = tbl.id - 1
    )

This deletes every row whose preceding row has the same value in col1, thereby arriving at your goal: only the first row of every burst survives. Note that the preceding row is located via id - 1, so these two statements still assume ids without gaps.

I left the commented-out SELECT in place because you should always check what is going to be deleted before you do the deed.
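If the ids have gaps, the DELETE can look up the nearest smaller id instead of id - 1. The following is only a sketch of that idea, not part of the original answer: it assumes SQLite via Python's sqlite3 module, and the sample rows are illustrative. Some systems restrict subqueries on the table being deleted from (MySQL, for instance), so check your own RDBMS first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, col1 INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 6050000), (2, 6050000), (6, 6050000),
     (14, 6060000), (15, 6060000), (16, 6060000),
     (17, 6060000), (18, 6060000), (19, 6050000),
     (20, 6000000), (111, 6000000)],
)

# A row is unwanted when the row with the nearest smaller id has the same col1.
# For the very first row the subquery yields NULL, so the comparison is not true.
condition = """
    tbl.col1 = (
        SELECT t.col1
        FROM   tbl AS t
        WHERE  t.id < tbl.id
        ORDER  BY t.id DESC
        LIMIT  1)
"""

# Preview first, as recommended above, then delete with the same condition.
doomed = [r[0] for r in conn.execute(
    f"SELECT id FROM tbl WHERE {condition} ORDER BY id")]
conn.execute(f"DELETE FROM tbl WHERE {condition}")

remaining = [r[0] for r in conn.execute("SELECT id FROM tbl ORDER BY id")]
print(doomed, remaining)
```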

If your RDBMS supports CTEs and window functions (PostgreSQL, Oracle, SQL Server, and also MySQL 8.0+ and SQLite 3.25+, but not MS Access or older MySQL and SQLite versions), there is an elegant way that also copes with gaps in the ids:

WITH x AS (
    SELECT *, row_number() OVER (ORDER BY id) AS rn
    FROM tbl
    )
SELECT id, col1
FROM   x
WHERE NOT EXISTS (
    SELECT *
    FROM   x x1
    WHERE  x1.col1 = x.col1
    AND    x1.rn   = x.rn - 1
    )
ORDER BY id;
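Where window functions are available, the same comparison can also be written with LAG(), which fetches the previous row's col1 directly and avoids the self-join on rn. This is an alternative sketch rather than the answer's own query; it assumes SQLite 3.25 or later driven from Python's sqlite3 module, illustrative sample rows, and a col1 that is never NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, col1 INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 6050000), (2, 6050000), (6, 6050000),
     (14, 6060000), (15, 6060000), (16, 6060000),
     (17, 6060000), (18, 6060000), (19, 6050000),
     (20, 6000000), (111, 6000000)],
)

# prev_col1 is NULL only for the very first row (col1 is assumed NOT NULL),
# so a row starts a burst when prev_col1 is NULL or differs from col1.
rows = conn.execute("""
    SELECT id, col1
    FROM (
        SELECT id, col1,
               LAG(col1) OVER (ORDER BY id) AS prev_col1
        FROM   tbl
    ) AS x
    WHERE  prev_col1 IS NULL OR prev_col1 <> col1
    ORDER  BY id
""").fetchall()

print(rows)
```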

There is also a not-so-elegant way that does the job without those niceties.
It should work for you:

SELECT id, col1
FROM   tbl
WHERE (
    SELECT t.col1 = tbl.col1
    FROM   tbl AS t
    WHERE  t.id < tbl.id
    ORDER  BY id DESC
    LIMIT  1) IS NOT TRUE
ORDER BY id


Test case with non-sequential IDs

(Tested in PostgreSQL)

CREATE TEMP TABLE tbl (id int, col1 int);
INSERT INTO tbl VALUES
 (1,6050000),(2,6050000),(6,6050000)
,(14,6060000),(15,6060000),(16,6060000)
,(17,6060000),(18,6060000),(19,6050000)
,(20,6000000),(111,6000000);
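For a quick check without a PostgreSQL server, the same test rows can be loaded into an in-memory SQLite database and both gap-tolerant queries from the answer compared. This is a sketch under assumptions: Python's sqlite3 module with SQLite 3.25 or later (needed for row_number() and IS NOT TRUE), and a helper named ids() that exists only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id int, col1 int)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 6050000), (2, 6050000), (6, 6050000),
     (14, 6060000), (15, 6060000), (16, 6060000),
     (17, 6060000), (18, 6060000), (19, 6050000),
     (20, 6000000), (111, 6000000)],
)

def ids(sql):
    """Hypothetical helper: run a query and return the ids it selects."""
    return [row[0] for row in conn.execute(sql)]

# CTE + row_number(): compare each row with the one numbered rn - 1.
cte_ids = ids("""
    WITH x AS (
        SELECT *, row_number() OVER (ORDER BY id) AS rn
        FROM tbl
    )
    SELECT id, col1
    FROM   x
    WHERE NOT EXISTS (
        SELECT *
        FROM   x x1
        WHERE  x1.col1 = x.col1
        AND    x1.rn   = x.rn - 1
    )
    ORDER BY id
""")

# Correlated subquery: compare each row with the nearest smaller id.
subquery_ids = ids("""
    SELECT id, col1
    FROM   tbl
    WHERE (
        SELECT t.col1 = tbl.col1
        FROM   tbl AS t
        WHERE  t.id < tbl.id
        ORDER  BY t.id DESC
        LIMIT  1) IS NOT TRUE
    ORDER BY id
""")

print(cte_ids, subquery_ids)
```

Both queries should keep only the first id of each burst.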
