删除查询结果集中的重复项集群 [英] Removing clusters of duplicates in a query resultset

查看:103
本文介绍了删除查询结果集中的重复项集群的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下查询,返回以下结果:

I have the following query returning the following results:

db=# SELECT t1.id as id1, t2.id as id2
db-#  FROM table_1 As t1, table_2 As t2
db-#  WHERE ST_DWithin(t2.lonlat, t1.lonlat, t2.range)
db-#  ORDER BY t1.id, t2.id, ST_Distance(t2.lonlat, t1.lonlat);
  id1  | id2  
-------+------
  4499 | 1118
  4500 | 1118
  4501 | 1119
  4502 | 1119
  4503 | 1118
  4504 | 1118
  4505 | 1119
  4506 | 1119
  4507 | 1118
  4508 | 1118
  4510 | 1118
  4511 | 1118
  4514 | 1117
  4515 | 1117
  4518 | 1117
  4519 | 1117
  4522 | 1117
  4523 | 1117
  4603 | 1116
  4604 | 1116
  4607 | 1116

我希望结果集看起来像这样:

And I want the resultset to look like this:

  id1  | id2  
-------+------
  4499 | 1118
  4501 | 1119
  4503 | 1118
  4505 | 1119
  4507 | 1118
  4514 | 1117
  4603 | 1116

本质上,在结果中,查询返回的是 id2 ,但是在结果中多次出现 id2 是可以的,但是如果 id2 在集群中重复。

Essentially, in the results, the query is returning duplicates of id2, but it's ok that id2 occurs many times in the results, but it's not ok if id2 is duplicated in clusters.

这里的用例是 id1 代表GPS位置表的ID,而 id2 代表路点表,我想查询一个返回最接近的通行点到任何路点的查询(因此,如果路点# 1118已通过,那么直到通过另一个航点之前,它才能再次通过。)

The use case here is that id1 represents the ID of a table of GPS positions, while id2 represents a table of waypoints, and I want to have a query that returns the closest passing point to any waypoint (so if waypoint #1118 is passed, then it cannot be passed again until another waypoint is passed).

是否有一种方法可以使用Postgres做到这一点?

Is there a way to make this happen using Postgres?

推荐答案

这是一个空白问题,但很细微。在这种情况下,您只需要上一行具有不同的 id2 的行。这表明使用 LAG()

This is a gaps-and-islands problem, but rather subtle. In this case, you only want the rows where the previous row has a different id2. That suggests using LAG():

SELECT id1, id2
FROM (SELECT tt.*, LAG(id2) OVER (ORDER BY id1, id2, dist) as prev_id2
      FROM (SELECT t1.id as id1, t2.id as id2,
                   ST_Distance(t2.lonlat, t1.lonlat) as dist
            FROM table_1 t1 JOIN
                 table_2 t2
                 ON ST_DWithin(t2.lonlat, t1.lonlat, t2.range)
           ) tt
     ) tt
WHERE prev_id2 is distinct from id2
ORDER BY id1, id2, dist;

注意:我认为所提供的逻辑可以简化,因为 id1 似乎很独特。因此,距离计算似乎完全是多余的。我保留了该逻辑,因为它可能与您的实际查询有关。

Note: I think the logic as presented could be simplified because id1 seems unique. Hence the distance calculation seems entirely superfluous. I left that logic in because it might be relevant in your actual query.

这篇关于删除查询结果集中的重复项集群的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆