Closest point to a path

Question

I have two sets of points, called path and centers. For each point in path, I would like an efficient method for finding the ID of the closest point in centers. I would like to do this in R. Below is a simple reproducible example.

set.seed(1)
n <- 10000
x <- 100*cumprod(1 + rnorm(n, 0.0001, 0.002))
y <- 50*cumprod(1 + rnorm(n, 0.0001, 0.002))

path <- data.frame(x = x, y = y)

centers <- expand.grid(x=seq(0, 500,by=0.5) + rnorm(1001), 
                       y=seq(0, 500, by=0.2) + rnorm(2501))

centers$id <- seq(nrow(centers))

x and y are coordinates. I would like to add a column to the path data.frame holding the id of the closest center for each point's x and y coordinates. I then want to get all of the unique ids.

My solution at the moment does work, but is very slow when the scale of the problem increases. I would like something much more efficient.

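path$closest.id <- sapply(seq_len(nrow(path)), function(z) {
  # squared Euclidean distance from path point z to every center
  d2 <- (centers$x - path$x[z])^2 + (centers$y - path$y[z])^2
  # which.min() returns a single index, even when distances tie
  centers$id[which.min(d2)]
})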

output <- unique(path$closest.id)

Any help on speeding this up would be greatly appreciated.

I think data.table might help, but ideally I am looking for an algorithm that is smarter about the search, i.e. one that avoids calculating the distance to every center and then keeping only the minimum just to get its id.

I am also happy to use Rcpp/Rcpp11 as well if that would help improve performance.

The longest acceptable time for this kind of calculation is 10 seconds, but obviously faster would be better.

Recommended answer

You can do this with nn2 from the RANN package. On my system, this computes the nearest center to each of your path points in under 2 seconds.

library(RANN)
system.time(closest <- nn2(centers[, 1:2], path, 1))

#   user  system elapsed 
#   1.41    0.14    1.55 

sapply(closest, head)

#      nn.idx   nn.dists
# [1,] 247451 0.20334929
# [2,] 250454 0.12326323
# [3,] 250454 0.28540127
# [4,] 253457 0.05178687
# [5,] 253457 0.13324137
# [6,] 253457 0.09009626
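As a sanity check on how to read this output, here is a minimal sketch on small made-up data (the names ctr and qry are hypothetical, not from the question). It assumes the RANN package is installed and relies on nn2 returning a list with two matrices: nn.idx (row indices into the data argument) and nn.dists (Euclidean distances). With the default eps = 0 the search is exact, so it should agree with a brute-force scan.

```r
library(RANN)

set.seed(42)
ctr <- cbind(x = runif(500), y = runif(500))  # hypothetical candidate centers
qry <- cbind(x = runif(20),  y = runif(20))   # hypothetical query points

# k = 1 asks for the single nearest neighbour of each query point
res <- nn2(data = ctr, query = qry, k = 1)

# Brute force: scan every center for every query point
brute_idx <- apply(qry, 1, function(p) {
  which.min((ctr[, "x"] - p["x"])^2 + (ctr[, "y"] - p["y"])^2)
})
brute_dist <- sqrt(rowSums((ctr[brute_idx, , drop = FALSE] - qry)^2))

# Both approaches should pick the same rows and report the same distances
stopifnot(all(res$nn.idx[, 1] == brute_idx))
stopifnot(isTRUE(all.equal(res$nn.dists[, 1], unname(brute_dist))))
```

The brute-force scan costs one distance computation per (query, center) pair, which is what makes the original sapply approach slow; the tree-based search in nn2 avoids most of those comparisons.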

Here's another example with 2.5 million candidate points that all fall within the extent of the path points (in your example, the centers have a much larger x and y range than do the path points). It's a little slower in this case.

set.seed(1)
centers2 <- cbind(runif(2.5e6, min(x), max(x)), runif(2.5e6, min(y), max(y)))
system.time(closest2 <- nn2(centers2, path, 1))

#   user  system elapsed 
#   2.96    0.11    3.07 

sapply(closest2, head)

#       nn.idx    nn.dists
# [1,]  730127 0.025803703
# [2,]  375514 0.025999069
# [3,] 2443707 0.047259283
# [4,]   62780 0.022747930
# [5,] 1431847 0.002482623
# [6,] 2199405 0.028815865

This can be compared to the output using sp::spDistsN1 (which is much slower for this problem):

library(sp)
# compare against centers2, the candidate set used for closest2 above
apply(head(path[, c("x", "y")]), 1, function(pt) which.min(spDistsN1(centers2, pt)))

#       1       2       3       4       5       6 
#  730127  375514 2443707   62780 1431847 2199405 

Adding the point id to the path data.frame and reducing to unique values is trivial:

# nn.idx holds row indices into centers; map them to ids explicitly
path$closest.id <- centers$id[closest$nn.idx[, 1]]
output <- unique(path$closest.id)
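A side note that applies only to data shaped like the question's example: there, centers is built with expand.grid, so it is a full Cartesian product of one set of x values and one set of y values. In that special case the squared distance splits into an x term and a y term, and the nearest center can be found by looking up the nearest x value and the nearest y value separately. The sketch below uses base R only, on small made-up data; nearest_1d, gx, gy, centers_small and pts are hypothetical names, and the id formula assumes expand.grid's ordering (first factor varies fastest). It does not apply to scattered centers such as centers2.

```r
# Index of the value in `grid` (need not be sorted) nearest to each element of `q`
nearest_1d <- function(grid, q) {
  ord <- order(grid)
  g <- grid[ord]
  # midpoints between consecutive sorted values are the decision boundaries
  mids <- (g[-1] + g[-length(g)]) / 2
  ord[findInterval(q, mids) + 1L]
}

set.seed(42)
gx <- seq(0, 10, by = 0.5) + rnorm(21)   # jittered x grid lines
gy <- seq(0, 10, by = 0.2) + rnorm(51)   # jittered y grid lines
centers_small <- expand.grid(x = gx, y = gy)
centers_small$id <- seq_len(nrow(centers_small))
pts <- data.frame(x = runif(200, 0, 10), y = runif(200, 0, 10))

# Separable lookup: nearest x line and nearest y line, combined into a row id
ix <- nearest_1d(gx, pts$x)
iy <- nearest_1d(gy, pts$y)
fast_id <- (iy - 1L) * length(gx) + ix

# Brute-force reference on the full grid
brute_id <- vapply(seq_len(nrow(pts)), function(i) {
  which.min((centers_small$x - pts$x[i])^2 + (centers_small$y - pts$y[i])^2)
}, integer(1))

stopifnot(identical(as.integer(fast_id), brute_id))
```

For general point sets the kd-tree search in nn2 remains the more broadly applicable choice.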
