Matrix of distances with geosphere: avoiding repeated calculations

Question

I want to compute the distances between all pairs of points in a very large matrix, using distm from the geosphere package.

Here is a minimal example:

library(geosphere)
library(data.table)

coords <- data.table(coordX=c(1,2,5,9), coordY=c(2,2,0,1))
distances <- distm(coords, coords, fun = distGeo)

The issue is that, given the nature of the distances I am computing, distm returns a symmetric matrix, so more than half of the distance calculations could be avoided:

structure(c(0, 111252.129800202, 497091.059564718, 897081.91986428, 
111252.129800202, 0, 400487.621661164, 786770.053508848, 497091.059564718, 
400487.621661164, 0, 458780.072878927, 897081.91986428, 786770.053508848, 
458780.072878927, 0), .Dim = c(4L, 4L))

Could you help me find a more efficient way to compute all these distances, without calculating each one twice?

Recommended answer

If you want to compute all pairwise distances for a set of points x, it is better to use distm(x) rather than distm(x, x). The distm function returns the same symmetric matrix in both cases, but when you pass it a single argument it knows that the matrix is symmetric, so it does not do unnecessary computations.
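As a quick sanity check of the claim above, the two call forms can be compared on the four points from the question. This is only a sketch: the variable names d_single and d_pair are illustrative, and the comparison uses a numerical tolerance rather than exact equality.

```r
library(geosphere)

# The four sample points from the question, as a longitude/latitude matrix.
xy <- cbind(c(1, 2, 5, 9), c(2, 2, 0, 1))

d_single <- distm(xy, fun = distGeo)      # one argument: all pairs within xy
d_pair   <- distm(xy, xy, fun = distGeo)  # two arguments: every row against every row

# Both calls should agree up to floating-point tolerance.
all.equal(d_single, d_pair, check.attributes = FALSE)

# The single-argument result should be a symmetric matrix.
isSymmetric(d_single)
```

If both checks return TRUE, distm(x) can be dropped in wherever distm(x, x) was used.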

You can time the difference:

library("geosphere")

n <- 500
xy <- matrix(runif(n*2, -90, 90), n, 2)

system.time( replicate(100, distm(xy, xy) ) )
#  user  system elapsed 
# 61.44    0.23   62.79 
system.time( replicate(100, distm(xy) ) )
#  user  system elapsed 
# 36.27    0.39   38.05 

You can also look at the R code of geosphere::distm to check that it treats the two cases differently.
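To illustrate the general idea of exploiting symmetry, here is a minimal base R sketch that fills only the lower triangle and then mirrors it. It is not geosphere's actual source: the helpers haversine, pairwise_lower and pairwise_full are hypothetical names introduced for this example, and a simple spherical haversine formula stands in for distGeo.

```r
# Hypothetical helper (not geosphere's code): haversine distance in metres
# between one point p = c(lon, lat) and each row of the two-column matrix q,
# assuming a sphere of radius r.
haversine <- function(p, q, r = 6378137) {
  to_rad <- pi / 180
  dlon <- (q[, 1] - p[1]) * to_rad
  dlat <- (q[, 2] - p[2]) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(p[2] * to_rad) * cos(q[, 2] * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Fill only the entries below the diagonal, then mirror them:
# n * (n - 1) / 2 distance evaluations.
pairwise_lower <- function(x, fun = haversine) {
  n <- nrow(x)
  dm <- matrix(0, n, n)
  if (n < 2) return(dm)
  for (i in 2:n) {
    j <- seq_len(i - 1)
    dm[i, j] <- fun(x[i, ], x[j, , drop = FALSE])
  }
  dm + t(dm)
}

# Naive version for comparison: n * n distance evaluations.
pairwise_full <- function(x, fun = haversine) {
  n <- nrow(x)
  dm <- matrix(0, n, n)
  for (i in seq_len(n)) dm[i, ] <- fun(x[i, ], x)
  dm
}

xy <- cbind(c(1, 2, 5, 9), c(2, 2, 0, 1))
d_lower <- pairwise_lower(xy)
d_full  <- pairwise_full(xy)
all.equal(d_lower, d_full)
```

The lower-triangle version makes roughly half as many distance evaluations, which fits the timings above, where the single-argument call takes a little more than half the time of the two-argument call.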

Aside: a quick Google search finds parallelDist (Parallel Distance Matrix Computation) on CRAN. The geodesic distance is one of its options.
