Only select first row of repeating value in a column in SQL
Problem description
I have a table with a column that may contain the same value in bursts. Like this:
+----+---------+
| id | Col1 |
+----+---------+
| 1 | 6050000 |
+----+---------+
| 2 | 6050000 |
+----+---------+
| 3 | 6050000 |
+----+---------+
| 4 | 6060000 |
+----+---------+
| 5 | 6060000 |
+----+---------+
| 6 | 6060000 |
+----+---------+
| 7 | 6060000 |
+----+---------+
| 8 | 6060000 |
+----+---------+
| 9 | 6050000 |
+----+---------+
| 10 | 6000000 |
+----+---------+
| 11 | 6000000 |
+----+---------+
Now I want to prune rows where the value of Col1 is repeated and only select the first occurrence of each burst.
For the above table the result should be:
+----+---------+
| id | Col1 |
+----+---------+
| 1 | 6050000 |
+----+---------+
| 4 | 6060000 |
+----+---------+
| 9 | 6050000 |
+----+---------+
| 10 | 6000000 |
+----+---------+
How can I do this in SQL?
Note that only burst rows should be removed, and a value may still repeat across separate bursts! In the sample result, id=1 and id=9 both carry the value 6050000.
I achieved it using this:
select id,col1 from data as d1
where not exists (
Select id from data as d2
where d2.id=d1.id-1 and d1.col1=d2.col1 order by id limit 1)
But this only works when ids are sequential. With gaps between ids (from deleted rows), the query breaks. How can I fix this?
Recommended answer
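You can use an EXISTS semi-join to identify candidates. This first version compares against id - 1, so it still assumes sequential ids; versions that tolerate gaps follow further down: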
SELECT * FROM tbl
WHERE NOT EXISTS (
SELECT *
FROM tbl t
WHERE t.col1 = tbl.col1
AND t.id = tbl.id - 1
)
ORDER BY id
To get rid of the unwanted rows:
DELETE FROM tbl
-- SELECT * FROM tbl
WHERE EXISTS (
SELECT *
FROM tbl t
WHERE t.col1 = tbl.col1
AND t.id = tbl.id - 1
)
This effectively deletes every row whose preceding row has the same value in col1, thereby arriving at your set goal: only the first row of every burst survives.
I left the commented-out SELECT statement in because you should always check what is going to be deleted before you do the deed.
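The preview-then-delete habit can be sketched end to end. The following is only an illustration, not part of the original answer: it uses Python's sqlite3 module as a throwaway harness, assumes a table definition with a primary key and a NOT NULL col1, and swaps the id - 1 comparison for "the largest smaller id" so that gaps between ids do not matter.

```python
import sqlite3

# In-memory database with gapped ids (table definition is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, col1 INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 6050000), (2, 6050000), (6, 6050000),
     (14, 6060000), (15, 6060000), (16, 6060000),
     (17, 6060000), (18, 6060000), (19, 6050000),
     (20, 6000000), (111, 6000000)],
)

# Shared predicate: the nearest preceding row (largest smaller id)
# has the same col1 value as the current row.
PREDICATE = """
EXISTS (
    SELECT 1
    FROM tbl AS t
    WHERE t.col1 = tbl.col1
      AND t.id = (SELECT max(p.id) FROM tbl AS p WHERE p.id < tbl.id)
)
"""

# Step 1: preview what would be deleted.
doomed = [row[0] for row in conn.execute(
    f"SELECT id FROM tbl WHERE {PREDICATE} ORDER BY id")]
print("would delete:", doomed)

# Step 2: run the DELETE with the very same predicate.
with conn:
    conn.execute(f"DELETE FROM tbl WHERE {PREDICATE}")

remaining = [row[0] for row in conn.execute("SELECT id FROM tbl ORDER BY id")]
print("remaining:", remaining)
```

Keeping the predicate in one place means the preview and the DELETE cannot drift apart.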
If your RDBMS supports CTEs and window functions (like PostgreSQL, Oracle, SQL Server, and in later releases MySQL 8.0+ and SQLite 3.25+, but not MS Access), there is an elegant way that also copes with gaps between ids, because row_number() produces a gapless sequence:
WITH x AS (
SELECT *, row_number() OVER (ORDER BY id) AS rn
FROM tbl
)
SELECT id, col1
FROM x
WHERE NOT EXISTS (
SELECT *
FROM x x1
WHERE x1.col1 = x.col1
AND x1.rn = x.rn - 1
)
ORDER BY id;
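As an alternative sketch of the same idea, the window function lag() can read the previous row's value directly, which avoids the self-join on the CTE. This variant is not from the original answer. The harness below uses Python's sqlite3 module purely for illustration and assumes SQLite 3.25 or later (for window functions) and a col1 that is never NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table definition is an assumption made for this sketch.
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, col1 INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 6050000), (2, 6050000), (6, 6050000),
     (14, 6060000), (15, 6060000), (16, 6060000),
     (17, 6060000), (18, 6060000), (19, 6050000),
     (20, 6000000), (111, 6000000)],
)

# lag(col1) yields the value of the previous row in id order,
# or NULL for the very first row. A row starts a burst when
# there is no previous row or the previous value differs.
QUERY = """
SELECT id, col1
FROM (
    SELECT id, col1, lag(col1) OVER (ORDER BY id) AS prev_col1
    FROM tbl
) AS x
WHERE prev_col1 IS NULL OR prev_col1 <> col1
ORDER BY id
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Because lag() works on row order rather than on id arithmetic, gaps between ids have no effect on the result.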
There is also a not-so-elegant way that does the job without those niceties. It compares each row with the nearest row that has a smaller id, and should work for you:
SELECT id, col1
FROM tbl
WHERE (
SELECT t.col1 = tbl.col1
FROM tbl AS t
WHERE t.id < tbl.id
ORDER BY id DESC
LIMIT 1) IS NOT TRUE
ORDER BY id
Test setup with non-sequential IDs
(Tested in PostgreSQL)
CREATE TEMP TABLE tbl (id int, col1 int);
INSERT INTO tbl VALUES
(1,6050000),(2,6050000),(6,6050000)
,(14,6060000),(15,6060000),(16,6060000)
,(17,6060000),(18,6060000),(19,6050000)
,(20,6000000),(111,6000000);
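To see why gapped test data matters, here is a small sketch that loads the same rows and runs the id - 1 version of the query against them. It is an illustration only: the harness is Python's sqlite3 module with an in-memory database, and an ordinary table stands in for the temp table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id int, col1 int)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(1, 6050000), (2, 6050000), (6, 6050000),
     (14, 6060000), (15, 6060000), (16, 6060000),
     (17, 6060000), (18, 6060000), (19, 6050000),
     (20, 6000000), (111, 6000000)],
)

# The id - 1 comparison only finds a predecessor when ids are consecutive.
NAIVE = """
SELECT id FROM tbl
WHERE NOT EXISTS (
    SELECT 1
    FROM tbl AS t
    WHERE t.col1 = tbl.col1
      AND t.id = tbl.id - 1
)
ORDER BY id
"""

naive_ids = [row[0] for row in conn.execute(NAIVE)]
expected_ids = [1, 14, 19, 20]  # first row of each burst

# Rows that the naive query keeps although they sit inside a burst.
false_keeps = [i for i in naive_ids if i not in expected_ids]
print("naive result:", naive_ids)
print("wrongly kept:", false_keeps)
```

On this data the naive query wrongly keeps ids 6 and 111, because their predecessors (ids 2 and 20) are not exactly one less.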