K-Nearest Neighbor Query in PostGIS
Question
I am using the following nearest-neighbor query in PostGIS:
SELECT g1.gid, g2.gid FROM points AS g1, polygons AS g2
WHERE g1.gid <> g2.gid
ORDER BY g1.gid, ST_Distance(g1.the_geom,g2.the_geom)
LIMIT k;
I have created indexes on the_geom as well as on the gid column of both tables, yet this query takes much more time than other spatial queries that involve spatial joins between two tables.
Is there a better way to find the K nearest neighbors? I am using PostGIS.
Another query that takes an unusually long time, despite the indexes on the geometry column, is:
select g1.gid , g2.gid from polygons as g1 , polygons as g2
where st_area(g1.the_geom) > st_area(g2.the_geom) ;
I believe these queries do not benefit from the GiST indexes, but why?
In contrast, this query:
select a.polyid , sum(length(b.the_geom)) from polygon as a , roads as b
where st_intersects(a.the_geom , b.the_geom)
group by a.polyid;
returns its result after some time, even though it involves the "roads" table, which is much bigger than the polygons or points tables, and uses more complex spatial operators.
Recommended answer
Just a few thoughts on your problem:
Neither st_distance nor st_area can use an index. This is because neither function can be reduced to a question like "Is a within b?" or "Do a and b overlap?". More concretely: GiST indexes operate only on the bounding boxes of the objects.
For more information on this, have a look at the PostGIS manual, which gives an example with st_distance and shows how the query can be rewritten to perform better.
However, this does not solve your k-nearest-neighbour problem. Right now I do not have a good idea for improving the performance of that query. The only option I see is to assume that the k nearest neighbours always lie within a distance of x meters. Then you could use an approach similar to the one in the PostGIS manual.
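As a rough sketch of that idea (not taken from the manual verbatim), the query below restricts the candidates for a single point to those within a fixed radius and only then sorts them by distance. The radius of 1000, the value k = 5 and the point gid = 1 are illustrative assumptions; the radius is expressed in the units of the geometries' spatial reference system, and the result is only correct if at least k neighbours really lie within it.

```sql
-- Sketch: the 5 polygons nearest to the point with gid = 1,
-- considering only polygons within 1000 units of it.
-- 1000, 5 and gid = 1 are assumed values for illustration.
SELECT g2.gid,
       ST_Distance(g1.the_geom, g2.the_geom) AS dist
FROM points AS g1
JOIN polygons AS g2
  ON ST_DWithin(g1.the_geom, g2.the_geom, 1000)  -- radius filter, can use the GiST index
WHERE g1.gid = 1
ORDER BY dist
LIMIT 5;
```

ST_DWithin can use the spatial index to discard far-away polygons, so ST_Distance is evaluated only for the remaining candidates instead of for every row pair.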
Your second query can be sped up a bit. Currently, the area of each object is computed once for every row it is joined with: the strategy is to join the data first and then filter on the function result. You can reduce the number of area computations significantly by precomputing the areas:
WITH polygonareas AS (
SELECT gid, the_geom, st_area(the_geom) AS area
FROM polygons
)
SELECT g1.gid, g2.gid
FROM polygonareas as g1 , polygonareas as g2
WHERE g1.area > g2.area;
Your third query can be optimized significantly using bounding boxes: when the bounding boxes of two objects do not overlap, the objects themselves cannot overlap either. This allows the existing index to be used, which gives a huge performance gain.
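A minimal sketch of the bounding-box prefilter, using the table and column names from the question. The && operator tests whether two bounding boxes overlap and can be answered from the GiST index. ST_Length is used here in place of the question's length(), on the assumption that the ST_-prefixed name is available in your PostGIS version, and the GROUP BY clause is added so that the aggregate is computed per polygon.

```sql
-- Sketch: total road length per polygon, with an explicit bounding-box prefilter.
SELECT a.polyid,
       sum(ST_Length(b.the_geom)) AS road_length
FROM polygon AS a
JOIN roads AS b
  ON a.the_geom && b.the_geom                 -- bounding boxes overlap (index-assisted)
 AND ST_Intersects(a.the_geom, b.the_geom)    -- exact test on the surviving candidates
GROUP BY a.polyid;
```

In current PostGIS versions ST_Intersects already performs this bounding-box comparison internally, so the explicit && mostly matters on older installations. That may also explain why the third query already runs comparatively fast.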