Another Why Is This Nearest Neighbor Spatial Query So Slow?

Question

Following this recommendation for an optimized nearest neighbor update, I'm using the T-SQL below to update a GPS table of 11,000 points with the nearest point of interest to each point.

WHILE (2 > 1) 
  BEGIN 
    BEGIN TRANSACTION 
    UPDATE TOP ( 100 ) s
    SET
      [NEAR_SHELTER] = fname,
      [DIST_SHELTER] = Shape.STDistance(fshape)
    FROM (
      SELECT
        [dbo].[GRSM_GPS_COLLAR].*,
        fnc.NAME AS fname,
        fnc.Shape AS fShape
      FROM
        [dbo].[GRSM_GPS_COLLAR]
        CROSS APPLY (SELECT TOP 1 NAME, shape
                     FROM [dbo].[BACK_COUNTRY_SHELTERS] WITH(index ([S50_idx]))
                     WHERE [BACK_COUNTRY_SHELTERS].Shape.STDistance([dbo].[GRSM_GPS_COLLAR].Shape) IS NOT NULL
                     ORDER BY BACK_COUNTRY_SHELTERS.Shape.STDistance([dbo].[GRSM_GPS_COLLAR].Shape) ASC) fnc
    ) s;

    IF @@ROWCOUNT = 0 
      BEGIN 
        COMMIT TRANSACTION 
         BREAK 
      END 
    COMMIT TRANSACTION 
    -- 1 second delay
    WAITFOR DELAY '00:00:01'
  END -- WHILE
GO

Note that I'm doing it in chunks of 100 to avoid locking. If I don't chunk it up, I get locking, and it runs for hours before I have to kill it. The obvious question is "Have you optimized your spatial indexes?" and the answer is yes: both tables have a spatial index (SQL 2012), Geography Autogrid, 4092 cells per object, which I found to be the most efficient index after many days of testing every possible permutation of index parameters. I have tried this with and without the spatial index hint, and with multiple spatial indexes.

In the above, note the spatial index seek cost and the warning about no column statistics, which I understand is the case with spatial indexes. In each case I eventually have to terminate the T-SQL. It just runs forever (in one case overnight, with 2,300 rows updated).

I've tried Isaac's numbers table join solution, but that example doesn't appear to lend itself to looping through n distance searches, just a single user-supplied location (@x).

Update

@Brad D, based on your answer, I tried this, but I'm getting some syntax errors that I can't quite figure out. I'm not sure I'm converting your example to mine correctly. Any ideas what I'm doing wrong? Thanks!

;WITH Points as(
SELECT TOP 100 [NAME], [Shape] as GeoPoint
FROM [BACK_COUNTRY_SHELTERS]
WHERE 1=1 


SELECT P1.*, CP.[GPS_POS_NUMBER] as DestinationName, CP.Dist
INTO #tmp_Distance
FROM [GRSM_GPS_COLLAR] P1
CROSS APPLY (SELECT [NAME] ,    Shape.STDistance(P1.GeoPoint)/1609.344 as     Dist
FROM [BACK_COUNTRY_SHELTERS] as P2
WHERE 1=1 
AND P1.[NAME] <> P2.[NAME] --Don't compare it to itself

) as CP

CREATE CLUSTERED INDEX tmpIX ON #tmp_Distance (name, Dist)


SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Dist ASC) as Rnk FROM #tmp_Distance) as tbl1
WHERE rnk = 1
DROP TABLE #tmp_Distance

Recommended answer

You're essentially comparing 121 million pairs of points (11K origins × 11K destinations), and trying to do it all in one fell swoop isn't going to scale well. I like your idea of breaking it into batches, but ordering a result set of 1.1MM records (100 origins × 11K destinations per batch) without an index could be painful.
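On the batching itself: as written, the loop in the question has no filter that excludes rows already updated, so `UPDATE TOP ( 100 )` can keep picking rows it has already handled and `@@ROWCOUNT` may never reach 0. The following is a minimal sketch of a batch loop that terminates, not a tested fix. It assumes `DIST_SHELTER` is NULL for every row that has not been processed yet, which the post does not confirm.

```sql
-- Sketch only. Assumption: DIST_SHELTER is NULL until a row has been processed.
DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    UPDATE TOP (100) g
    SET    g.[NEAR_SHELTER] = fnc.[NAME],
           g.[DIST_SHELTER] = fnc.Dist
    FROM   [dbo].[GRSM_GPS_COLLAR] AS g
    CROSS APPLY (SELECT TOP (1) s.[NAME],
                        s.[Shape].STDistance(g.[Shape]) AS Dist
                 FROM   [dbo].[BACK_COUNTRY_SHELTERS] AS s
                 WHERE  s.[Shape].STDistance(g.[Shape]) IS NOT NULL
                 ORDER BY s.[Shape].STDistance(g.[Shape]) ASC) AS fnc
    WHERE  g.[DIST_SHELTER] IS NULL;   -- skip rows handled by earlier batches

    SET @rows = @@ROWCOUNT;            -- 0 once nothing is left to update
END;
```

Each statement commits on its own under the default autocommit mode, so the explicit transaction and the one-second delay from the question are left out here; they can be added back if needed.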

I suggest breaking this out into a few more operations. I just tried the following and it runs in under a minute per batch in my environment. (5500 location records)

This worked for me without a geospatial index, using instead a clustered index on the origin and the distance to the destination.

;WITH Points AS (
    -- One batch of 100 origins. geography::Point takes (latitude, longitude, SRID),
    -- which avoids the longitude-first ordering that WKT 'POINT(lon lat)' requires.
    SELECT TOP 100 Name, AddressLine1,
        AddressLatitude, AddressLongitude,
        geography::Point(AddressLatitude, AddressLongitude, 4326) AS GeoPoint
    FROM ServiceFacility
    WHERE 1=1
    AND AddressLatitude BETWEEN -90 AND 90
    AND AddressLongitude BETWEEN -180 AND 180)

-- Distance from each origin to every other facility, converted from meters to miles
SELECT P1.*, CP.Name AS DestinationName, CP.Dist
INTO #tmp_Distance
FROM Points P1
CROSS APPLY (SELECT Name, AlternateName,
        geography::Point(P2.AddressLatitude, P2.AddressLongitude, 4326).STDistance(P1.GeoPoint)/1609.344 AS Dist
    FROM ServiceFacility AS P2
    WHERE 1=1
    AND P1.Name <> P2.Name -- Don't compare it to itself
    AND P2.AddressLatitude BETWEEN -90 AND 90
    AND P2.AddressLongitude BETWEEN -180 AND 180
) AS CP

-- Cluster on origin and distance so the ranking below does not sort an unindexed heap
CREATE CLUSTERED INDEX tmpIX ON #tmp_Distance (Name, Dist)

-- Keep only the nearest destination for each origin
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Dist ASC) AS Rnk FROM #tmp_Distance) AS tbl1
WHERE Rnk = 1

DROP TABLE #tmp_Distance
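
Adapted to the tables in the question, the same pattern might look like the sketch below. Compared with the attempt in the update, it closes the `Points` CTE and selects from it, reads the GPS geometry from `Shape` (`GeoPoint` exists only as the CTE's alias), returns only columns the `CROSS APPLY` actually produces, and ranks per GPS point rather than per shelter name. It assumes `GPS_POS_NUMBER` uniquely identifies a row in `GRSM_GPS_COLLAR` and that `DIST_SHELTER` is NULL for unprocessed rows; neither is confirmed in the post.

```sql
-- Sketch only. Assumptions (not confirmed in the post):
--   * GPS_POS_NUMBER uniquely identifies a row in GRSM_GPS_COLLAR
--   * DIST_SHELTER is NULL for rows that have not been processed yet
;WITH Points AS (
    SELECT TOP (100) [GPS_POS_NUMBER], [Shape] AS GeoPoint
    FROM   [dbo].[GRSM_GPS_COLLAR]
    WHERE  [DIST_SHELTER] IS NULL
      AND  [Shape] IS NOT NULL
)
SELECT P1.[GPS_POS_NUMBER], CP.[NAME] AS DestinationName, CP.Dist
INTO   #tmp_Distance
FROM   Points AS P1
CROSS APPLY (SELECT P2.[NAME],
                    -- STDistance returns meters for SRID 4326; divide to get miles
                    P2.[Shape].STDistance(P1.GeoPoint) / 1609.344 AS Dist
             FROM   [dbo].[BACK_COUNTRY_SHELTERS] AS P2
             WHERE  P2.[Shape] IS NOT NULL) AS CP;

-- Cluster on the GPS point and distance so the ranking reads rows in order
CREATE CLUSTERED INDEX tmpIX ON #tmp_Distance ([GPS_POS_NUMBER], Dist);

-- Nearest shelter per GPS point
SELECT [GPS_POS_NUMBER], DestinationName, Dist
FROM  (SELECT *, ROW_NUMBER() OVER (PARTITION BY [GPS_POS_NUMBER] ORDER BY Dist ASC) AS Rnk
       FROM #tmp_Distance) AS tbl1
WHERE  Rnk = 1;

DROP TABLE #tmp_Distance;
```

The self-comparison filter from the original example is dropped because the origins and destinations come from different tables here.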

The actual update on 100 records, or even 11,000 records, shouldn't take too long. Spatial indexes are cool, but unless I'm missing something, I don't see a hard requirement for one in this particular exercise.
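
One way the update step could be written is sketched below, as a single set-based `UPDATE` joined to the ranked distances. It assumes a temp table `#tmp_Distance` that still exists at this point (that is, it has not yet been dropped) with columns `GPS_POS_NUMBER`, `DestinationName` and `Dist`, and that `GPS_POS_NUMBER` is a unique key on `GRSM_GPS_COLLAR`. These names are illustrative and not confirmed by the post.

```sql
-- Sketch only. Assumes #tmp_Distance(GPS_POS_NUMBER, DestinationName, Dist) exists
-- and that GPS_POS_NUMBER uniquely identifies a row in GRSM_GPS_COLLAR.
;WITH Ranked AS (
    SELECT [GPS_POS_NUMBER], DestinationName, Dist,
           ROW_NUMBER() OVER (PARTITION BY [GPS_POS_NUMBER] ORDER BY Dist ASC) AS Rnk
    FROM   #tmp_Distance
)
UPDATE G
SET    G.[NEAR_SHELTER] = R.DestinationName,
       G.[DIST_SHELTER] = R.Dist          -- same unit as Dist in the temp table
FROM   [dbo].[GRSM_GPS_COLLAR] AS G
INNER JOIN Ranked AS R
        ON R.[GPS_POS_NUMBER] = G.[GPS_POS_NUMBER]
WHERE  R.Rnk = 1;                         -- nearest shelter only
```

Note that the query in the question stores the raw `STDistance` result in `DIST_SHELTER`, so if `Dist` was divided by 1609.344 when the temp table was built, the stored unit changes from meters to miles.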
