删除查询结果集中的重复项集群 [英] Removing clusters of duplicates in a query resultset
问题描述
我有以下查询,返回以下结果:
I have the following query returning the following results:
db=# SELECT t1.id as id1, t2.id as id2
db-# FROM table_1 As t1, table_2 As t2
db-# WHERE ST_DWithin(t2.lonlat, t1.lonlat, t2.range)
db-# ORDER BY t1.id, t2.id, ST_Distance(t2.lonlat, t1.lonlat);
id1 | id2
-------+------
4499 | 1118
4500 | 1118
4501 | 1119
4502 | 1119
4503 | 1118
4504 | 1118
4505 | 1119
4506 | 1119
4507 | 1118
4508 | 1118
4510 | 1118
4511 | 1118
4514 | 1117
4515 | 1117
4518 | 1117
4519 | 1117
4522 | 1117
4523 | 1117
4603 | 1116
4604 | 1116
4607 | 1116
我希望结果集看起来像这样:
And I want the resultset to look like this:
id1 | id2
-------+------
4499 | 1118
4501 | 1119
4503 | 1118
4505 | 1119
4507 | 1118
4514 | 1117
4603 | 1116
本质上,在结果中,查询返回的是 id2
,但是在结果中多次出现 id2
是可以的,但是如果 id2
在集群中重复。
Essentially, in the results, the query is returning duplicates of id2
, but it's ok that id2
occurs many times in the results, but it's not ok if id2
is duplicated in clusters.
这里的用例是 id1
代表GPS位置表的ID,而 id2
代表路点表,我想查询一个返回最接近的通行点到任何路点的查询(因此,如果路点# 1118已通过,那么直到通过另一个航点之前,它才能再次通过。)
The use case here is that id1
represents the ID of a table of GPS positions, while id2
represents a table of waypoints, and I want to have a query that returns the closest passing point to any waypoint (so if waypoint #1118 is passed, then it cannot be passed again until another waypoint is passed).
是否有一种方法可以使用Postgres做到这一点?
Is there a way to make this happen using Postgres?
推荐答案
这是一个空白问题,但很细微。在这种情况下,您只需要上一行具有不同的 id2
的行。这表明使用 LAG()
:
This is a gaps-and-islands problem, but rather subtle. In this case, you only want the rows where the previous row has a different id2
. That suggests using LAG()
:
SELECT id1, id2
FROM (SELECT tt.*, LAG(id2) OVER (ORDER BY id1, id2, dist) as prev_id2
FROM (SELECT t1.id as id1, t2.id as id2,
ST_Distance(t2.lonlat, t1.lonlat) as dist
FROM table_1 t1 JOIN
table_2 t2
ON ST_DWithin(t2.lonlat, t1.lonlat, t2.range)
) tt
) tt
WHERE prev_id2 is distinct from id2
ORDER BY id1, id2, dist;
注意:我认为所提供的逻辑可以简化,因为 id1
似乎很独特。因此,距离计算似乎完全是多余的。我保留了该逻辑,因为它可能与您的实际查询有关。
Note: I think the logic as presented could be simplified because id1
seems unique. Hence the distance calculation seems entirely superfluous. I left that logic in because it might be relevant in your actual query.
这篇关于删除查询结果集中的重复项集群的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!