如何扩展此SQL查询以找到k个最近的邻居? [英] How can I extend this SQL query to find the k nearest neighbors?

查看:73
本文介绍了如何扩展此SQL查询以找到k个最近的邻居?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个充满二维数据的数据库-地图上的点.每个记录都有一个几何类型的字段.我需要做的是将一个点传递给存储过程,该存储过程返回 k 最近的点(k也将传递给sproc,但这很容易).我在 http://blogs中找到了一个查询. msdn.com/isaac/archive/2008/10/23/nearest-neighbors.aspx 可以获取最近的单个邻居,但是我不知道如何扩展它来找到 k 最近的邻居.

这是当前查询-T是表,g是几何字段,@x是要搜索的点,Numbers是具有1到 n的整数的表:

DECLARE @start FLOAT = 1000; 
WITH NearestPoints AS
(
     SELECT TOP(1) WITH TIES *,  T.g.STDistance(@x) AS dist
     FROM Numbers JOIN T WITH(INDEX(spatial_index)) 
     ON T.g.STDistance(@x) < @start*POWER(2,Numbers.n)
     ORDER BY n
)
SELECT TOP(1) * FROM NearestPoints
ORDER BY n, dist

内部查询选择最近的非空区域,然后外部查询从该区域选择顶部结果;外部查询可以很容易地更改为(例如)SELECT TOP(20),但是如果最近的区域仅包含一个结果,那么您将受此困扰.

我认为我可能需要递归搜索包含 k 记录的第一个区域,但不使用表变量(这将导致维护问题,因为您必须创建表结构,并且很容易变化-有很多字段),我看不到.

解决方案

如果从内部查询中删除TOP (1) WITH TIES并将外部查询设置为返回前 k 行,会发生什么? /p>

我也想知道此修正案是否有帮助.它应该比使用TOP更有效:

DECLARE @start FLOAT = 1000
        ,@k INT = 20
        ,@p FLOAT = 2;

WITH NearestPoints AS
(
     SELECT *
            ,T.g.STDistance(@x) AS dist
            ,ROW_NUMBER() OVER (ORDER BY T.g.STDistance(@x)) AS rn
     FROM Numbers 
     JOIN T WITH(INDEX(spatial_index)) 
     ON   T.g.STDistance(@x) <  @start*POWER(@p,Numbers.n)
     AND (Numbers.n - 1 = 0 
          OR T.g.STDistance(@x) >= @start*POWER(@p,Numbers.n - 1)
         )
)
SELECT * 
FROM NearestPoints
WHERE rn <= @k;

NB-未经测试-我无法在此处访问SQL 2008.

I have a database full of two-dimensional data - points on a map. Each record has a field of the geometry type. What I need to be able to do is pass a point to a stored procedure which returns the k nearest points (k would also be passed to the sproc, but that's easy). I've found a query at http://blogs.msdn.com/isaac/archive/2008/10/23/nearest-neighbors.aspx which gets the single nearest neighbour, but I can't figure how to extend it to find the k nearest neighbours.

This is the current query - T is the table, g is the geometry field, @x is the point to search around, Numbers is a table with integers 1 to n:

DECLARE @start FLOAT = 1000; 
WITH NearestPoints AS
(
     SELECT TOP(1) WITH TIES *,  T.g.STDistance(@x) AS dist
     FROM Numbers JOIN T WITH(INDEX(spatial_index)) 
     ON T.g.STDistance(@x) < @start*POWER(2,Numbers.n)
     ORDER BY n
)
SELECT TOP(1) * FROM NearestPoints
ORDER BY n, dist

The inner query selects the nearest non-empty region and the outer query then selects the top result from that region; the outer query can easily be changed to (e.g.) SELECT TOP(20), but if the nearest region only contains one result, you're stuck with that.

I figure I probably need to recursively search for the first region containing k records, but without using a table variable (which would cause maintenance problems as you have to create the table structure and it's liable to change - there're lots of fields), I can't see how.

解决方案

What happens if you remove TOP (1) WITH TIES from the inner query, and set the outer query to return the top k rows?

I'd also be interested to know whether this amendment helps at all. It ought to be more efficient than using TOP:

DECLARE @start FLOAT = 1000
        ,@k INT = 20
        ,@p FLOAT = 2;

WITH NearestPoints AS
(
     SELECT *
            ,T.g.STDistance(@x) AS dist
            ,ROW_NUMBER() OVER (ORDER BY T.g.STDistance(@x)) AS rn
     FROM Numbers 
     JOIN T WITH(INDEX(spatial_index)) 
     ON   T.g.STDistance(@x) <  @start*POWER(@p,Numbers.n)
     AND (Numbers.n - 1 = 0 
          OR T.g.STDistance(@x) >= @start*POWER(@p,Numbers.n - 1)
         )
)
SELECT * 
FROM NearestPoints
WHERE rn <= @k;

NB - untested - I don't have access to SQL 2008 here.

这篇关于如何扩展此SQL查询以找到k个最近的邻居?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆