Calculating the distance between points in different data frames
I am trying to find the distance between points in two different data frames given that they have the same value in one of their columns.
I figure the first step is to join or relate the data in the two data frames. For example, there are data frames A and B, which both contain lat/long information and share the column Name. Note that for a given Name the lat/long information is different in each data frame. That's why I want to calculate the distance between them.
I envision the final function being something like: if A$Name == B$Name, then use their corresponding lat/long data to calculate the distance between them.
Any thoughts?
Example data:
A <- data.frame(Lat=1:4,Long=1:4,Name=c("a","b","c","d"))
B <- data.frame(Lat=5:8,Long=5:8,Name=c("a","b","c","d"))
Now I want to relate A and B so that I can ask the ultimate question: if A$Name == B$Name, what is the distance between them using their corresponding lat/long data?
I should also note that I will not be able to use a straightforward Euclidean distance, because the points occur in water and the path between them needs to stay in the water (or be bounded by some area). Any help with that would be appreciated as well.
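For calculating the distance between lat/long points, you can use the distm function from the geosphere package, which returns a matrix of distances between all pairs of points and takes the distance formula as its fun argument. Several formulas are available: distCosine, distHaversine, distVincentySphere and distVincentyEllipsoid. The last one is considered the most accurate (according to the package author). Each of these functions can also be called directly on two sets of points to get row-by-row distances, as in the code below.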
library(geosphere)
A <- data.frame(Lat=1:4, Long=1:4, Name=c("a","b","c","d"))
B <- data.frame(Lat=5:8, Long=5:8, Name=c("a","b","c","d"))
A$distance <- distVincentyEllipsoid(A[,c('Long','Lat')], B[,c('Long','Lat')])
This gives:
> A
Lat Long Name distance
1 1 1 a 627129.5
2 2 2 b 626801.7
3 3 3 c 626380.6
4 4 4 d 625866.6
Note that you have to include the lat/long columns in the order of first longitude and then latitude.
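To make the longitude-first convention concrete, here is a minimal base R sketch of the haversine formula. The helper haversine_m is hypothetical and is not geosphere's implementation; it treats the earth as a sphere, and the default radius of 6378137 m is an assumption, so its results will differ slightly from distVincentyEllipsoid.

```r
# Hypothetical helper (not part of geosphere): haversine great-circle distance.
# p1 and p2 are two-column matrices or data frames ordered (longitude, latitude),
# in degrees. r is the sphere radius in metres (6378137 is an assumed value).
haversine_m <- function(p1, p2, r = 6378137) {
  to_rad <- pi / 180
  lon1 <- p1[, 1] * to_rad
  lat1 <- p1[, 2] * to_rad
  lon2 <- p2[, 1] * to_rad
  lat2 <- p2[, 2] * to_rad
  h <- sin((lat2 - lat1) / 2)^2 +
    cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
  # pmin() guards against rounding pushing sqrt(h) just above 1
  2 * r * asin(pmin(1, sqrt(h)))
}

A <- data.frame(Lat = 1:4, Long = 1:4, Name = c("a", "b", "c", "d"))
B <- data.frame(Lat = 5:8, Long = 5:8, Name = c("a", "b", "c", "d"))

# Longitude first, then latitude; rows are paired by position, not by Name.
d <- haversine_m(A[, c("Long", "Lat")], B[, c("Long", "Lat")])
```

Swapping the two columns would silently treat latitudes as longitudes, which is why the column order matters.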
Although the row-by-row calculation above works perfectly on this simple example, it pairs points by row position, so in larger datasets where the names are not in the same order it will give wrong results. In that case you can use data.table and set the keys, which sorts both tables by Name so that you can match the points and calculate the distance (as @MichaelChirico did in his answer):
library(data.table)
A <- data.table(Lat=1:4, Long=1:4, Name=c("a","b","c","d"), key="Name")
B <- data.table(Lat=8:5, Long=8:5, Name=c("d","c","b","a"), key="Name")
A[B,distance:=distVincentyEllipsoid(A[,.(Long,Lat)], B[,.(Long,Lat)])]
As you can see, this gives the same (correct) result as the previous method:
> A
Lat Long Name distance
1: 1 1 a 627129.5
2: 2 2 b 626801.7
3: 3 3 c 626380.6
4: 4 4 d 625866.6
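As a rough base R alternative, merge() pairs rows by Name regardless of row order and drops names that appear in only one data frame. This is a sketch rather than the method used above: the suffixes .A and .B and the variable m are illustrative choices, and the distance call is guarded so the block still runs when geosphere is not installed.

```r
A <- data.frame(Lat = 1:4, Long = 1:4, Name = c("a", "b", "c", "d"))
B <- data.frame(Lat = 8:5, Long = 8:5, Name = c("d", "c", "b", "a"))

# Inner join on Name; the coordinate columns get the suffixes .A and .B.
m <- merge(A, B, by = "Name", suffixes = c(".A", ".B"))

# Each row of m now holds both positions for one Name.
if (requireNamespace("geosphere", quietly = TRUE)) {
  m$distance <- geosphere::distVincentyEllipsoid(
    m[, c("Long.A", "Lat.A")],  # longitude first, then latitude
    m[, c("Long.B", "Lat.B")]
  )
}
```

Because the pairing happens in the join itself, this does not depend on either data frame being sorted.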
To see what key="Name" does, compare the following two data.tables:
B1 <- data.table(Lat=8:5, Long=8:5, Name=c("d","c","b","a"), key="Name")
B2 <- data.table(Lat=8:5, Long=8:5, Name=c("d","c","b","a"))
See also this answer for a more elaborate example.