How to efficiently calculate distance between pair of coordinates using data.table :=
Question
I want to find the most efficient (fastest) method to calculate the distances between pairs of lat/long coordinates.
A not-so-efficient solution has been presented (here) using sapply and spDistsN1 {sp}. I believe this could be made much faster by using spDistsN1 {sp} inside data.table with the := operator, but I haven't been able to do that. Any suggestions?
Here is a reproducible example:
# load libraries
library(data.table)
library(dplyr)
library(sp)
library(rgeos)
library(UScensus2000tract)
# load data and create an Origin-Destination matrix
data("oregon.tract")
# get centroids as a data.frame
centroids <- as.data.frame(gCentroid(oregon.tract,byid=TRUE))
# Convert row names into first column
setDT(centroids, keep.rownames = TRUE)[]
# create Origin-destination matrix
orig <- centroids[1:754, ]
dest <- centroids[2:755, ]
odmatrix <- bind_cols(orig,dest)
colnames(odmatrix) <- c("origi_id", "long_orig", "lat_orig", "dest_id", "long_dest", "lat_dest")
My failed attempt using data.table:
odmatrix[ , dist_km := spDistsN1(as.matrix(long_orig, lat_orig), as.matrix(long_dest, lat_dest), longlat=T)]
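A likely reason the call above fails: as.matrix(long_orig, lat_orig) ignores its second argument and returns a one-column matrix, while spDistsN1 measures from a set of points to a single reference point rather than row by row. The sketch below uses rounded coordinates taken from the head(odmatrix) output and builds two-column matrices with cbind(). The sp call is guarded and assumes the installed sp version provides the diagonal argument of spDists, which returns one distance per row pair.

```r
# Rounded coordinates from the first three rows of head(odmatrix)
long_orig <- c(-123.51, -123.67, -123.95)
lat_orig  <- c(45.982, 46.113, 46.179)
long_dest <- c(-123.67, -123.95, -123.79)
lat_dest  <- c(46.113, 46.179, 46.187)

# as.matrix() drops the second vector: the result has a single column
bad <- as.matrix(long_orig, lat_orig)
dim(bad)

# cbind() gives the two-column (longitude, latitude) layout that sp expects
good_orig <- cbind(long_orig, lat_orig)
good_dest <- cbind(long_dest, lat_dest)
dim(good_orig)

# Assumption: sp is installed and spDists() supports diagonal = TRUE,
# which pairs row i of the first matrix with row i of the second
if (requireNamespace("sp", quietly = TRUE)) {
  dist_km <- sp::spDists(good_orig, good_dest, longlat = TRUE, diagonal = TRUE)
  print(dist_km)
}
```

Under the same assumption, the two cbind() calls can be placed directly on the right-hand side of := inside odmatrix[ , dist_km := ...].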
Here is a solution that works (but probably less efficiently):
odmatrix$dist_km <- sapply(1:nrow(odmatrix),function(i)
spDistsN1(as.matrix(odmatrix[i,2:3]),as.matrix(odmatrix[i,5:6]),longlat=T))
head(odmatrix)
> origi_id long_orig lat_orig dest_id long_dest lat_dest dist_km
> (chr) (dbl) (dbl) (chr) (dbl) (dbl) (dbl)
> 1 oregon_0 -123.51 45.982 oregon_1 -123.67 46.113 19.0909
> 2 oregon_1 -123.67 46.113 oregon_2 -123.95 46.179 22.1689
> 3 oregon_2 -123.95 46.179 oregon_3 -123.79 46.187 11.9014
> 4 oregon_3 -123.79 46.187 oregon_4 -123.83 46.181 3.2123
> 5 oregon_4 -123.83 46.181 oregon_5 -123.85 46.182 1.4054
> 6 oregon_5 -123.85 46.182 oregon_6 -123.18 46.066 53.0709
Recommended answer
I wrote my own version of geosphere::distHaversine so that it would more naturally fit into a data.table := call, and it might be of use here:
dt.haversine <- function(lat_from, lon_from, lat_to, lon_to, r = 6378137){
radians <- pi/180
lat_to <- lat_to * radians
lat_from <- lat_from * radians
lon_to <- lon_to * radians
lon_from <- lon_from * radians
dLat <- (lat_to - lat_from)
dLon <- (lon_to - lon_from)
a <- (sin(dLat/2)^2) + (cos(lat_from) * cos(lat_to)) * (sin(dLon/2)^2)
return(2 * atan2(sqrt(a), sqrt(1 - a)) * r)
}
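As a quick sanity check, here is a sketch that calls the function on a case with a known answer: two points on the equator one degree of longitude apart should be r * pi / 180 metres apart. The function body is repeated in compact form only so the snippet runs on its own, and the test coordinates are chosen purely for illustration. Because r is given in metres, the result is in metres and needs dividing by 1000 before comparing it with dist_km from the question.

```r
# Compact copy of dt.haversine from above, so this snippet is self-contained
dt.haversine <- function(lat_from, lon_from, lat_to, lon_to, r = 6378137) {
  radians <- pi / 180
  lat_from <- lat_from * radians
  lat_to   <- lat_to * radians
  dLat <- lat_to - lat_from
  dLon <- (lon_to - lon_from) * radians
  a <- sin(dLat / 2)^2 + cos(lat_from) * cos(lat_to) * sin(dLon / 2)^2
  2 * atan2(sqrt(a), sqrt(1 - a)) * r
}

# One degree of longitude along the equator
d_m <- dt.haversine(0, 0, 0, 1)
expected_m <- 6378137 * pi / 180
all.equal(d_m, expected_m)

# The function is vectorised: equal-length inputs give one distance per pair
d_vec <- dt.haversine(c(0, 0), c(0, 0), c(0, 0), c(1, 2))

# Convert metres to kilometres
d_km <- d_vec / 1000
```

The vectorised behaviour is what lets the function sit on the right-hand side of := without any per-row loop.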
Here is how it compares against the original geosphere::distHaversine and geosphere::distGeo:
dt1 <- copy(odmatrix); dt2 <- copy(odmatrix); dt3 <- copy(odmatrix)
library(geosphere)       # provides distHaversine() and distGeo()
library(microbenchmark)
microbenchmark(
dtHaversine = {
dt1[, dist := dt.haversine(lat_orig, long_orig, lat_dest, long_dest)]
} ,
haversine = {
dt2[ , dist := distHaversine(matrix(c(long_orig, lat_orig), ncol = 2),
matrix(c(long_dest, lat_dest), ncol = 2))]
},
geo = {
dt3[ , dist := distGeo(matrix(c(long_orig, lat_orig), ncol = 2),
matrix(c(long_dest, lat_dest), ncol = 2))]
}
)
# Unit: microseconds
# expr min lq mean median uq max neval
# dtHaversine 370.300 396.6210 434.5841 411.4305 463.9965 906.797 100
# haversine 651.974 681.1745 776.6127 706.2760 731.3480 1505.765 100
# geo 647.699 679.8285 743.4914 706.0465 742.1605 1272.310 100
Naturally, because the two techniques (geo and haversine) calculate distances differently, the results will differ slightly.
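To get a feel for the size of that difference, the following sketch compares the two geosphere functions on a single pair of points, using the rounded coordinates of the first row of head(odmatrix). It assumes geosphere is installed (the call is guarded), and rel_diff is a name introduced here for illustration only.

```r
# geosphere expects points as (longitude, latitude)
p1 <- c(-123.51, 45.982)
p2 <- c(-123.67, 46.113)

if (requireNamespace("geosphere", quietly = TRUE)) {
  d_hav <- geosphere::distHaversine(p1, p2)  # great-circle distance on a sphere, metres
  d_geo <- geosphere::distGeo(p1, p2)        # geodesic distance on the WGS84 ellipsoid, metres
  rel_diff <- abs(d_hav - d_geo) / d_geo     # relative difference between the two methods
  print(c(haversine = d_hav, geo = d_geo, rel_diff = rel_diff))
}
```

The haversine formula treats the Earth as a sphere while distGeo works on an ellipsoid, so which one to prefer depends on how much accuracy the application needs relative to speed.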