Closest point to a path
Problem description
I have two sets of points, called path and centers. For each point in path, I would like an efficient method for finding the ID of the closest point in centers. I would like to do this in R. Below is a simple reproducible example.
set.seed(1)
n <- 10000
x <- 100*cumprod(1 + rnorm(n, 0.0001, 0.002))
y <- 50*cumprod(1 + rnorm(n, 0.0001, 0.002))
path <- data.frame(cbind(x=x, y=y))
centers <- expand.grid(x=seq(0, 500, by=0.5) + rnorm(1001),
                       y=seq(0, 500, by=0.2) + rnorm(2501))
centers$id <- seq(nrow(centers))
x and y are coordinates. I would like to add a column to the path data.frame that holds the id of the closest center for each x and y coordinate pair. I then want to get all of the unique ids.
My current solution works, but it becomes very slow as the scale of the problem increases. I would like something much more efficient.
path$closest.id <- sapply(seq(nrow(path)), function(z){
  tmp <- ((centers$x - path[z, 'x'])^2) + ((centers$y - path[z, 'y'])^2)
  as.numeric(centers[tmp == min(tmp), 'id'])
})
output <- unique(path$closest.id)
Any help on speeding this up would be greatly appreciated.
I think data.table might help, but ideally what I am looking for is an algorithm that is smarter about the search, i.e. one that avoids calculating the distance to every center and then selecting only the minimum to get the id.
I am also happy to use Rcpp/Rcpp11 if that would help improve performance.
The longest acceptable time for this kind of calculation would be 10 seconds, but obviously faster would be better.
Recommended answer
You can do this with nn2 from the RANN package. On my system, this computes the nearest center to each of your path points in under 2 seconds.
library(RANN)
system.time(closest <- nn2(centers[, 1:2], path, 1))
#    user  system elapsed
#    1.41    0.14    1.55
sapply(closest, head)
#      nn.idx   nn.dists
# [1,] 247451 0.20334929
# [2,] 250454 0.12326323
# [3,] 250454 0.28540127
# [4,] 253457 0.05178687
# [5,] 253457 0.13324137
# [6,] 253457 0.09009626
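If you want to convince yourself that the tree search returns the same neighbours as the exhaustive distance calculation, a small check along these lines may help. It is only a sketch: pts and qry are made-up data sets for illustration (not the path and centers from the question), and it relies on nn2's default exact search (eps = 0).

```r
library(RANN)

set.seed(42)
# Hypothetical small data sets, used only for this illustration
pts <- cbind(x = runif(200), y = runif(200))  # candidate points
qry <- cbind(x = runif(20),  y = runif(20))   # query points

# k-d tree search: one nearest neighbour per query point
nn <- nn2(pts, qry, k = 1)

# Brute force: squared distance from each query point to every candidate
brute <- apply(qry, 1, function(p) {
  which.min((pts[, 1] - p[1])^2 + (pts[, 2] - p[2])^2)
})

# Both approaches should pick the same row of pts for every query point
stopifnot(all(nn$nn.idx[, 1] == brute))
```

The brute-force version does work proportional to the number of candidates for every query point, which is what makes the original sapply approach slow; nn2 builds the tree once and then answers each query without visiting every candidate.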
Here's another example with 2.5 million candidate points that all fall within the extent of the path points (in your example, the centers have a much larger x and y range than the path points do). It's a little slower in this case.
set.seed(1)
centers2 <- cbind(runif(2.5e6, min(x), max(x)), runif(2.5e6, min(y), max(y)))
system.time(closest2 <- nn2(centers2, path, 1))
#    user  system elapsed
#    2.96    0.11    3.07
sapply(closest2, head)
#       nn.idx    nn.dists
# [1,]  730127 0.025803703
# [2,]  375514 0.025999069
# [3,] 2443707 0.047259283
# [4,]   62780 0.022747930
# [5,] 1431847 0.002482623
# [6,] 2199405 0.028815865
This can be compared to the output using sp::spDistsN1 (which is much slower for this problem):
library(sp)
# Compare against centers2 (a two-column matrix), the candidate set used for closest2
apply(head(path), 1, function(pt) which.min(spDistsN1(centers2, pt)))
#       1       2       3       4       5       6
#  730127  375514 2443707   62780 1431847 2199405
Adding the point id to the path data.frame and reducing to unique values is trivial:
# nn.idx is a one-column matrix of row numbers in centers; take it as a plain vector
path$closest.id <- closest$nn.idx[, 1]
output <- unique(path$closest.id)
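Note that the indices returned by nn2 are row numbers of the candidate set. In the question they coincide with the ids only because id was created as seq(nrow(centers)). If your ids were arbitrary labels, you would need one extra lookup. A minimal base R sketch, where centers_demo, path_demo and the id labels are all hypothetical, and idx is a brute-force stand-in for closest$nn.idx:

```r
# Hypothetical centers whose ids are labels rather than row numbers
centers_demo <- data.frame(x = c(0, 10, 20), y = c(0, 0, 0),
                           id = c("A17", "B03", "C42"),
                           stringsAsFactors = FALSE)
path_demo <- data.frame(x = c(1, 9, 19, 21), y = c(1, -1, 0, 2))

# Stand-in for closest$nn.idx: a one-column matrix of row numbers
# into centers_demo (computed by brute force to avoid extra packages)
idx <- matrix(apply(path_demo, 1, function(p) {
  which.min((centers_demo$x - p["x"])^2 + (centers_demo$y - p["y"])^2)
}), ncol = 1)

# Translate row numbers into ids, then reduce to the unique ids
path_demo$closest.id <- centers_demo$id[idx[, 1]]
output_demo <- unique(path_demo$closest.id)
```

The same lookup, centers$id[closest$nn.idx[, 1]], should carry over to the real data whenever the id column is not simply the row number.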