Calculating the distance between points in different data frames


Question

I am trying to find the distance between points in two different data frames given that they have the same value in one of their columns.

I figure the first step is to join or relate the data in the two data frames. For example, there are data frames A and B, which both have lat/long information in them and share the column Name. Note that for a given Name the lat/long information is different in each data frame. That's why I want to calculate the distance between them.

I envision the final function being something like: if A$Name == B$Name, then use their corresponding lat/long data to calculate the distance between them.

Any thoughts?

Example data:

A <- data.frame(Lat=1:4,Long=1:4,Name=c("a","b","c","d"))
B <- data.frame(Lat=5:8,Long=5:8,Name=c("a","b","c","d"))

Now I want to relate A and B so that I can ask the ultimate question: if A$Name == B$Name, what is the distance between them using their corresponding lat/long data?

I should also note that I will not be able to use a straightforward Euclidean distance, because the points occur in water and the path between them needs to stay in the water (or be bounded by some area). Any help with that would be appreciated as well.

Solution

To calculate the distance between lat/long points, you can use the distm function from the geosphere package. Its fun argument lets you choose among several formulas for calculating the distance: distCosine, distHaversine, distVincentySphere and distVincentyEllipsoid. The last one is considered the most accurate (according to the package author). Each of these functions can also be called directly on two sets of points, which returns the row-by-row distances in meters:
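As a rough sketch of how distm itself differs from calling a formula function directly: distm returns the full matrix of distances between every point in A and every point in B, so the same-name distances sit on the diagonal only when both data frames list the names in the same order. The object names m and same_name are just illustrative.

```r
library(geosphere)

A <- data.frame(Lat = 1:4, Long = 1:4, Name = c("a", "b", "c", "d"))
B <- data.frame(Lat = 5:8, Long = 5:8, Name = c("a", "b", "c", "d"))

# distm() returns a matrix: entry [i, j] is the distance
# from row i of A to row j of B (longitude first, then latitude)
m <- distm(A[, c("Long", "Lat")], B[, c("Long", "Lat")],
           fun = distVincentyEllipsoid)
dimnames(m) <- list(A$Name, B$Name)

# Because A and B list the names in the same order here,
# the same-name distances are the diagonal of the matrix
same_name <- diag(m)
```

If only the same-name distances are needed, calling the formula function directly avoids computing the full matrix.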

library(geosphere)

A <- data.frame(Lat=1:4, Long=1:4, Name=c("a","b","c","d"))
B <- data.frame(Lat=5:8, Long=5:8, Name=c("a","b","c","d"))

A$distance <- distVincentyEllipsoid(A[,c('Long','Lat')], B[,c('Long','Lat')])

This gives:

> A
  Lat Long Name distance
1   1    1    a 627129.5
2   2    2    b 626801.7
3   3    3    c 626380.6
4   4    4    d 625866.6

Note that you have to pass the coordinate columns in the order longitude first, then latitude.


Although this works perfectly on this simple example, in larger datasets where the names are not in the same order, this will lead to problems. In that case you can use data.table and set the keys so you can match the points and calculate the distance (as @MichaelChirico did in his answer):

library(data.table)
A <- data.table(Lat=1:4, Long=1:4, Name=c("a","b","c","d"), key="Name")
B <- data.table(Lat=8:5, Long=8:5, Name=c("d","c","b","a"), key="Name")

A[B,distance:=distVincentyEllipsoid(A[,.(Long,Lat)], B[,.(Long,Lat)])]

As you can see, this gives the correct result, i.e. the same one as the previous method:

> A
   Lat Long Name distance
1:   1    1    a 627129.5
2:   2    2    b 626801.7
3:   3    3    c 626380.6
4:   4    4    d 625866.6
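The keyed version above relies on both tables containing exactly the same names. As a hedged alternative sketch (not part of the original answer), an update join with on = "Name" matches the rows explicitly: inside the join, Long and Lat refer to A's columns and i.Long and i.Lat to B's. The example data, with "c" missing from B, is made up for illustration.

```r
library(data.table)
library(geosphere)

A <- data.table(Lat = 1:4, Long = 1:4, Name = c("a", "b", "c", "d"))
# B is in a different order and has no point named "c"
B <- data.table(Lat = c(8, 6, 5), Long = c(8, 6, 5), Name = c("d", "b", "a"))

# Update join: for every row of A that has a match in B,
# Long/Lat come from A and i.Long/i.Lat from the matching row of B
A[B, on = "Name",
  distance := distVincentyEllipsoid(cbind(Long, Lat), cbind(i.Long, i.Lat))]
```

Rows of A without a partner in B (here "c") are left with NA in the distance column, so missing matches stay visible instead of shifting the results.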


To see what key="Name" does (it sorts the table by Name), compare the following two data.tables:

B1 <- data.table(Lat=8:5, Long=8:5, Name=c("d","c","b","a"), key="Name")
B2 <- data.table(Lat=8:5, Long=8:5, Name=c("d","c","b","a"))
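One way to inspect the difference, as a small sketch: key() reports the key column(s) of a data.table, and printing the Name column shows the row order of each table.

```r
library(data.table)

B1 <- data.table(Lat = 8:5, Long = 8:5, Name = c("d", "c", "b", "a"), key = "Name")
B2 <- data.table(Lat = 8:5, Long = 8:5, Name = c("d", "c", "b", "a"))

key(B1)   # "Name"
key(B2)   # NULL, because no key was set

B1$Name   # rows were re-sorted by Name when the key was set
B2$Name   # rows keep the order in which they were entered
```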


See also this answer for a more elaborate example.
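For readers who want to stay in base R, the matching step can also be sketched with merge(), which joins the two data frames on Name before the distance is computed. This is an illustrative alternative rather than the method of the original answer; the suffixes .A and .B and the name AB are arbitrary choices.

```r
library(geosphere)

A <- data.frame(Lat = 1:4, Long = 1:4, Name = c("a", "b", "c", "d"))
B <- data.frame(Lat = 8:5, Long = 8:5, Name = c("d", "c", "b", "a"))

# Inner join on Name; the clashing Lat/Long columns get the suffixes .A and .B
AB <- merge(A, B, by = "Name", suffixes = c(".A", ".B"))

# Each row now holds both points for one name, so the distances line up
AB$distance <- distVincentyEllipsoid(AB[, c("Long.A", "Lat.A")],
                                     AB[, c("Long.B", "Lat.B")])
```

Because merge() defaults to an inner join, names that occur in only one of the data frames are dropped from AB.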
