left_join based on closest LAT_LON in R

Question

I am trying to find the ID of the closest LAT_LON in a data.frame with reference to my original data.frame. I have already figured this out by merging both data.frames on a unique identifier and then calculating the distance with the distHaversine function from geosphere. Now I want to take this a step further: join the data.frames without the unique identifier and find the ID of the nearest LAT-LON. I used the following code after merging:

v3 <- v2 %>% mutate(CTD = distHaversine(cbind(LON.x, LAT.x), cbind(LON.y, LAT.y)))

Data:

loc <- data.frame(station = c('Baker Street', 'Bank'),
                  lat = c(51.522236, 51.5134047),
                  lng = c(-0.157080, -0.08905843),
                  postcode = c('NW1', 'EC3V'))

stop <- data.frame(station = c('Angel', 'Barbican', 'Barons Court', 'Bayswater'),
                   lat = c(51.53253, 51.520865, 51.490281, 51.51224),
                   lng = c(-0.10579, -0.097758, -0.214340, -0.187569),
                   postcode = c('EC1V', 'EC1A', 'W14', 'W2'))

As a final result I would like something like this:

df <- data.frame(loc = c('Baker Street', 'Bank', 'Baker Street', 'Bank',
                         'Baker Street', 'Bank', 'Baker Street', 'Bank'),
                 stop = c('Angel', 'Barbican', 'Barons Court', 'Bayswater',
                          'Angel', 'Barbican', 'Barons Court', 'Bayswater'),
                 dist = c('x', 'x', 'x', 'x', 'x', 'x', 'x', 'x'),  # placeholder for the computed distance
                 lat = c(51.53253, 51.520865, 51.490281, 51.51224, 51.53253, 51.520865, 51.490281, 51.51224),
                 lng = c(-0.10579, -0.097758, -0.214340, -0.187569, -0.10579, -0.097758, -0.214340, -0.187569),
                 postcode = c('EC1V', 'EC1A', 'W14', 'W2', 'EC1V', 'EC1A', 'W14', 'W2'))

Any help is appreciated. Thanks.

Recommended answer

As the distances between the objects are small, we can speed up the computation by using the Euclidean distance between the coordinates. As we are not near the equator, a degree of longitude covers less ground than a degree of latitude, so the lng coordinates are squished a bit; we can make the comparison slightly better by rescaling lng before measuring distances.

# A degree of longitude shrinks by cos(latitude); use one shared factor for both tables
lng_scale <- cos(mean(c(stop$lat, loc$lat), na.rm = TRUE) / 180 * pi)
cor_stop <- stop[, c("lat", "lng")]
cor_stop$lng <- cor_stop$lng * lng_scale
cor_loc <- loc[, c("lat", "lng")]
cor_loc$lng <- cor_loc$lng * lng_scale
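
How much longitude is compressed at this latitude can be measured directly with geosphere. The snippet below is only a rough check, not part of the original answer: `lat0` is an illustrative latitude chosen to be close to the London data, and the variable names are made up for the example. It compares the ground length of one degree of longitude with one degree of latitude; the ratio comes out very close to cos(latitude), which is the factor that makes lng and lat differences comparable.

```r
library(geosphere)

lat0 <- 51.5  # assumed representative latitude for the example data

# Ground distance (metres) covered by one degree of longitude at lat0
d_lng <- distHaversine(c(0, lat0), c(1, lat0))
# Ground distance (metres) covered by one degree of latitude at lat0
d_lat <- distHaversine(c(0, lat0), c(0, lat0 + 1))

ratio <- d_lng / d_lat
ratio                   # compression of longitude relative to latitude
cos(lat0 / 180 * pi)    # should be nearly identical to the ratio above
```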

We can then calculate the closest stop for each location using the FNN package, which uses a tree-based search to quickly find the k nearest neighbours. This should scale to big data sets (I have used it for data sets with millions of records):

library(FNN)
matches <- knnx.index(cor_stop, cor_loc, k = 1)
matches



##      [,1]
## [1,]    4
## [2,]    2

We can then construct the end result:

res <- loc
res$stop_station  <- stop$station[matches[,1]]
res$stop_lat      <- stop$lat[matches[,1]]
res$stop_lng      <- stop$lng[matches[,1]]
res$stop_postcode <- stop$postcode[matches[,1]]

And calculate the actual distance:

library(geosphere)
res$dist <- distHaversine(res[, c("lng", "lat")], res[, c("stop_lng", "stop_lat")])
res



##          station      lat         lng postcode stop_station stop_lat  stop_lng
## 1 Baker Street 51.52224 -0.15708000      NW1    Bayswater 51.51224 -0.187569
## 2         Bank 51.51340 -0.08905843     EC3V     Barbican 51.52087 -0.097758
##   stop_postcode     dist
## 1            W2 2387.231
## 2          EC1A 1026.091
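
For inputs as small as the example data, the match can be cross-checked by brute force. The sketch below is an addition rather than part of the original answer: it uses `geosphere::distm` to build the full location-by-stop Haversine distance matrix and picks the minimum in each row. The names `d`, `nearest` and `brute` are made up for illustration. This costs one distance per pair, so it is only sensible when both tables are small.

```r
library(geosphere)

loc <- data.frame(station = c('Baker Street', 'Bank'),
                  lat = c(51.522236, 51.5134047),
                  lng = c(-0.157080, -0.08905843))
stop <- data.frame(station = c('Angel', 'Barbican', 'Barons Court', 'Bayswater'),
                   lat = c(51.53253, 51.520865, 51.490281, 51.51224),
                   lng = c(-0.10579, -0.097758, -0.214340, -0.187569))

# Full distance matrix in metres: one row per location, one column per stop
d <- distm(cbind(loc$lng, loc$lat), cbind(stop$lng, stop$lat), fun = distHaversine)
dimnames(d) <- list(loc$station, stop$station)

# Column index of the closest stop for each location
nearest <- unname(apply(d, 1, which.min))

brute <- data.frame(station = loc$station,
                    stop_station = stop$station[nearest],
                    dist = d[cbind(seq_len(nrow(loc)), nearest)])
brute
```

The matrix `d` also holds the distance for every location/stop pair, which is the information the question's desired table asks for.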

If you are unsure whether the closest point in lat-long is also the closest point 'as the crow flies', you could use this method to first select the K closest points in lat-long, then calculate the actual distances for those points, and finally select the closest one.
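
A minimal sketch of that two-stage idea, under some assumptions: `k = 3` candidates is an arbitrary choice, the cos(latitude) rescaling of lng is applied with one shared factor, and the names `cand`, `cand_dist` and `refined` are made up for the example. `knnx.index` returns one row per location with the indices of its k candidate stops; the Haversine distance is then computed only for those candidates.

```r
library(FNN)
library(geosphere)

loc <- data.frame(station = c('Baker Street', 'Bank'),
                  lat = c(51.522236, 51.5134047),
                  lng = c(-0.157080, -0.08905843))
stop <- data.frame(station = c('Angel', 'Barbican', 'Barons Court', 'Bayswater'),
                   lat = c(51.53253, 51.520865, 51.490281, 51.51224),
                   lng = c(-0.10579, -0.097758, -0.214340, -0.187569))

k <- 3  # assumed number of candidates to keep per location

# Stage 1: k nearest stops in (lat, rescaled lng) space
lng_scale <- cos(mean(c(loc$lat, stop$lat)) / 180 * pi)
cand <- knnx.index(cbind(stop$lat, stop$lng * lng_scale),
                   cbind(loc$lat, loc$lng * lng_scale), k = k)

# Stage 2: Haversine distance (metres) from each location to each candidate
cand_dist <- vapply(seq_len(k), function(j) {
  distHaversine(cbind(loc$lng, loc$lat),
                cbind(stop$lng[cand[, j]], stop$lat[cand[, j]]))
}, numeric(nrow(loc)))
cand_dist <- matrix(cand_dist, nrow = nrow(loc))  # keep matrix shape for a single location

# Pick the candidate with the smallest Haversine distance in each row
rows <- seq_len(nrow(loc))
best_col <- max.col(-cand_dist, ties.method = "first")
best <- cand[cbind(rows, best_col)]

refined <- data.frame(station = loc$station,
                      stop_station = stop$station[best],
                      dist = cand_dist[cbind(rows, best_col)])
refined
```

On the example data this selects the same stops as the k = 1 search above; the extra stage matters mainly when points are spread over a wide range of latitudes.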
