Using the geosphere distm function on a data.table to calculate distances
Question
I've created a data.table that has 6 columns. Each row compares two locations: Location 1 and Location 2. I'm trying to use the distm function to calculate the distance between the two locations on each row, creating a 7th column. The distm function in the geosphere package requires two separate vectors, one for each lat/long pair, to calculate against. My code below does not work, so I'm trying to figure out how to provide vectors to the function.
LOC_1_ID LOC1_LAT_CORD LOC1_LONG_CORD LOC_2_ID LOC2_LAT_CORD LOC2_LONG_CORD
1 35.68440 -80.48090 70624 34.86752 -82.46632
6 35.49770 -80.62870 70624 34.86752 -82.46632
10 35.66042 -80.50053 70624 34.86752 -82.46632
Assuming res holds the data.table, the code below does not work.
res[,DISTANCE := distm(c(LOC1_LAT_CORD, LOC1_LONG_CORD),c(LOC2_LAT_CORD, LOC2_LONG_CORD), fun=distHaversine)*0.000621371]
If I pull out each vector separately, the function works fine.
loc1 <- res[LOC_1_ID == 1, .(LOC1_LAT_CORD, LOC1_LONG_CORD)]
loc2 <- res[LOC_2_ID == 70624, .(LOC2_LAT_CORD, LOC2_LONG_CORD)]
distm(loc1, loc2, fun = distHaversine)
Really, my question is how to apply a function to selected columns within a data.table when that function requires vectors as parameters.
Recommended answer
The distm function generates a distance matrix for a set of points. Are you sure this is the function you want if you're just comparing the points on each row and adding one column?
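To make the difference concrete, here is a minimal sketch using the three sample rows from the question. The variable names p1, p2, m and row_wise are illustrative only. It assumes the geosphere package is installed, and it passes points as (longitude, latitude), which is the order geosphere expects. Note that c(LOC1_LAT_CORD, LOC1_LONG_CORD) inside a data.table call concatenates both columns into one long vector rather than a set of points, which is likely why the original call fails.

```r
library(geosphere)

# Points as two-column matrices in (longitude, latitude) order
p1 <- cbind(lon = c(-80.48090, -80.62870, -80.50053),
            lat = c(35.68440, 35.49770, 35.66042))
p2 <- cbind(lon = rep(-82.46632, 3),
            lat = rep(34.86752, 3))

# distm compares every point in p1 with every point in p2,
# so three rows against three rows yields a 3 x 3 matrix
m <- distm(p1, p2, fun = distHaversine)
dim(m)

# distHaversine pairs the points row by row and returns one value per row
row_wise <- distHaversine(p1, p2)

# The row-by-row distances are the diagonal of the full matrix
all.equal(diag(m), row_wise)
```

For a per-row column, computing the full matrix and keeping only its diagonal does n * n work to obtain n values, so the row-wise functions are the better fit.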
It sounds like you actually want either distHaversine or distGeo.
library(data.table)
library(geosphere)
dt <- read.table(text = "LOC_1_ID LOC1_LAT_CORD LOC1_LONG_CORD LOC_2_ID LOC2_LAT_CORD LOC2_LONG_CORD
1 35.68440 -80.48090 70624 34.86752 -82.46632
6 35.49770 -80.62870 70624 34.86752 -82.46632
10 35.66042 -80.50053 70624 34.86752 -82.46632", header = TRUE)
setDT(dt)
# geosphere expects points as (longitude, latitude); the result is in metres
dt[, distance_hav := distHaversine(matrix(c(LOC1_LONG_CORD, LOC1_LAT_CORD), ncol = 2),
                                   matrix(c(LOC2_LONG_CORD, LOC2_LAT_CORD), ncol = 2))]
# LOC_1_ID LOC1_LAT_CORD LOC1_LONG_CORD LOC_2_ID LOC2_LAT_CORD LOC2_LONG_CORD distance_hav
# 1: 1 35.68440 -80.48090 70624 34.86752 -82.46632 202046.3
# 2: 6 35.49770 -80.62870 70624 34.86752 -82.46632 181310.0
# 3: 10 35.66042 -80.50053 70624 34.86752 -82.46632 199282.1
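The question multiplies the result by 0.000621371 to get miles, and the answer also mentions distGeo as an alternative. The sketch below combines the two, assuming geosphere and data.table are installed. The column names distance_geo and distance_miles and the constant metres_per_mile are illustrative, not part of the original answer.

```r
library(data.table)
library(geosphere)

# Sample rows from the question; the single Location 2 is recycled across rows
dt <- data.table(
  LOC1_LAT_CORD  = c(35.68440, 35.49770, 35.66042),
  LOC1_LONG_CORD = c(-80.48090, -80.62870, -80.50053),
  LOC2_LAT_CORD  = 34.86752,
  LOC2_LONG_CORD = -82.46632
)

# One international mile is exactly 1609.344 metres
metres_per_mile <- 1609.344

# distGeo measures along the WGS84 ellipsoid and returns metres;
# points are passed as (longitude, latitude)
dt[, distance_geo := distGeo(cbind(LOC1_LONG_CORD, LOC1_LAT_CORD),
                             cbind(LOC2_LONG_CORD, LOC2_LAT_CORD))]

# Convert to miles in a separate step so the unit of each column stays clear
dt[, distance_miles := distance_geo / metres_per_mile]
```

distGeo and distHaversine differ slightly because one models the Earth as an ellipsoid and the other as a sphere, so small differences between the two columns are expected.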
Update: Another answer gives a more efficient version of distHaversine for use in data.table.
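The code from that other answer is not reproduced here. As a rough sketch of the idea, the haversine formula can be written directly on column vectors, which skips building matrices for every call. The helper name haversine_m is hypothetical and not part of geosphere; the default radius of 6378137 metres is chosen to match geosphere's default for distHaversine.

```r
library(data.table)

# Hypothetical helper (not from geosphere): vectorised haversine distance in metres.
# All coordinates are in decimal degrees; r is the sphere radius in metres.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  a <- pmin(a, 1)  # guard against rounding pushing a slightly above 1
  2 * r * atan2(sqrt(a), sqrt(1 - a))
}

# Sample rows from the question
dt <- data.table(
  LOC1_LAT_CORD  = c(35.68440, 35.49770, 35.66042),
  LOC1_LONG_CORD = c(-80.48090, -80.62870, -80.50053),
  LOC2_LAT_CORD  = 34.86752,
  LOC2_LONG_CORD = -82.46632
)

# Columns are passed straight in as vectors, so no matrix is constructed
dt[, distance_hav := haversine_m(LOC1_LONG_CORD, LOC1_LAT_CORD,
                                 LOC2_LONG_CORD, LOC2_LAT_CORD)]
```

Whether this is noticeably faster depends on the number of rows; on a table this small the difference is negligible, so it is worth benchmarking on real data before switching.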