Find the nearest cities to each location in a data frame

Problem description

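The data frame below contains the longitude, latitude, state and city of several locations. For each city in the data frame I want to find its three nearest cities. For example, in the data frame below the cities nearest to Albuquerque are Oklahoma City and Colorado Springs, so Albuquerque's nearest cities should be stored in a separate data frame called nearest_Al. (I don't know how to produce this result, so I built that data frame by hand to illustrate what I want.)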

dataframe<-data.frame(long=c("-106.61291","-81.97224","-84.42770","-72.68604","-97.60056","-104.70261"),
  lat=c("35.04333","33.37378","33.64073","41.93887","35.39305","38.80171"),
  state=c("NM","GA","GA","TX","OK","CO"),
  city=c("Albuquerque","Augusta","Atlanta","Windsor Locks","Oklahoma City","Colarado Springs")
)

nearest_Al<-data.frame(long=c("-97.60056","-104.70261"),
                      lat=c("35.39305","38.80171"),
                      state=c("OK","CO"),
                      city=c("Oklahoma City","Colarado Springs")
)

I need to do the same thing for a data frame with 500k rows and about 100 locations.

Thanks in advance!

Recommended answer

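Here is one approach. dataframe2 is the final output: its Near_City column lists, for each city in the city column, the three nearest cities (one row per neighbour, so each city appears three times).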

library(dplyr)  # mutate()
library(sp)     # coordinates<-
library(sf)     # st_as_sf(), st_crs<-, st_distance()

# Create example data frame
dataframe<-data.frame(long=c("-106.61291","-81.97224","-84.42770","-72.68604","-97.60056","-104.70261"),
                      lat=c("35.04333","33.37378","33.64073","41.93887","35.39305","38.80171"),
                      state=c("NM","GA","GA","TX","OK","CO"),
                      city=c("Albuquerque","Augusta","Atlanta","Windsor Locks","Oklahoma City","Colarado Springs"),
                      stringsAsFactors = FALSE
)

# Create spatial point data frame object
dataframe_sp <- dataframe %>%
  mutate(long = as.numeric(long), lat = as.numeric(lat))
coordinates(dataframe_sp) <- ~long + lat

# Convert to sf object
dataframe_sf <- st_as_sf(dataframe_sp)

# Set projection
st_crs(dataframe_sf) <- 4326

# Pairwise distances between all cities (in metres, since the CRS is WGS84 lon/lat)
dist_m <- st_distance(dataframe_sf, dataframe_sf)

# Select the three closest cities
# apply() returns one column per city, holding row indices sorted by distance.
# The first entry of each column is the city itself (distance 0), so keep entries 2 to 4.
index <- apply(dist_m, 1, order)
index <- index[2:4, ]

# Repeat each city's row three times, once per neighbour
dataframe2 <- dataframe[rep(1:nrow(dataframe), each = 3), ]

# Look up the neighbours' names; as.vector(index) reads column by column (city by city),
# which matches the row order of dataframe2. Store them in the Near_City column.
dataframe2$Near_City <- dataframe[as.vector(index), ]$city
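
The same idea (sort each city's distances and skip the city itself) can also be sketched in base R without the sp/sf stack. This is a minimal sketch under stated assumptions: distances use the haversine formula on a sphere of radius 6371 km, which differs slightly from the distances sf computes, and `haversine_km` and `nearest_k` are hypothetical helper names, not functions from any package. It returns a long data frame that also records each neighbour's rank and distance.

```r
# Haversine great-circle distance in km between (lon1, lat1) and (lon2, lat2).
# Assumption: spherical Earth with mean radius 6371 km. Vectorised.
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))  # pmin() guards against rounding above 1
}

# Hypothetical helper: for every city, its k nearest other cities (k < nrow(df)).
nearest_k <- function(df, k = 3) {
  lon <- as.numeric(df$long)
  lat <- as.numeric(df$lat)
  # n x n distance matrix; outer() evaluates all pairs in one vectorised call
  d <- outer(seq_along(lon), seq_along(lon),
             function(i, j) haversine_km(lon[i], lat[i], lon[j], lat[j]))
  diag(d) <- Inf  # a city is never its own neighbour
  res <- lapply(seq_len(nrow(df)), function(i) {
    idx <- order(d[i, ])[seq_len(k)]
    data.frame(city = df$city[i], rank = seq_len(k),
               Near_City = df$city[idx], dist_km = d[i, idx],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, res)
}

cities <- data.frame(
  long  = c("-106.61291", "-81.97224", "-84.42770", "-72.68604", "-97.60056", "-104.70261"),
  lat   = c("35.04333", "33.37378", "33.64073", "41.93887", "35.39305", "38.80171"),
  state = c("NM", "GA", "GA", "TX", "OK", "CO"),
  city  = c("Albuquerque", "Augusta", "Atlanta", "Windsor Locks", "Oklahoma City", "Colarado Springs"),
  stringsAsFactors = FALSE
)

result <- nearest_k(cities, k = 3)
subset(result, city == "Albuquerque")
```

Setting the diagonal to `Inf` instead of dropping the first sorted entry means a city never appears in its own neighbour list, even if two rows share identical coordinates.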

Update

We can go one step further and build the output the OP asked for.

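# Rows of the neighbouring cities, tagged with the city they are nearest to
dataframe3 <- dataframe[as.vector(index), ]
dataframe3$TargetCity <- dataframe2$city

# One data frame per target city
nearest_city_list <- split(dataframe3, f = dataframe3$TargetCity)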
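
Each target city is now an element of the list nearest_city_list, and its neighbours can be retrieved by indexing the list with the city's name. For example, the result for Albuquerque: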

nearest_city_list[["Albuquerque"]]
        long      lat state             city  TargetCity
6 -104.70261 38.80171    CO Colarado Springs Albuquerque
5  -97.60056 35.39305    OK    Oklahoma City Albuquerque
3  -84.42770 33.64073    GA          Atlanta Albuquerque
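
The question also mentions 500k rows and about 100 locations. Assuming that means 500k points that each need their three nearest among roughly 100 reference locations (the question does not spell this out), a full pairwise matrix like dist_m would be far too large, but a points × locations matrix can be built chunk by chunk. The sketch below does that in base R with haversine distances; `nearest_refs`, `chunk_size` and the randomly generated `pts`/`refs` data are hypothetical stand-ins for the real data.

```r
# Haversine great-circle distance in km (spherical Earth, radius 6371 km); vectorised.
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  a <- sin((lat2 - lat1) * to_rad / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin((lon2 - lon1) * to_rad / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: for each point, the indices (into the reference
# locations) of its k nearest references, as an integer matrix with one row
# per point. Points are processed in chunks so only a
# chunk_size x length(ref_lon) distance matrix is held in memory at a time.
nearest_refs <- function(pts_lon, pts_lat, ref_lon, ref_lat, k = 3,
                         chunk_size = 10000) {
  stopifnot(k <= length(ref_lon))
  n <- length(pts_lon)
  out <- matrix(NA_integer_, nrow = n, ncol = k)
  for (start in seq(1, n, by = chunk_size)) {
    rows <- start:min(start + chunk_size - 1, n)
    lon <- pts_lon[rows]
    lat <- pts_lat[rows]
    # Distance from every point in the chunk to every reference location
    d <- outer(seq_along(rows), seq_along(ref_lon), function(i, j)
      haversine_km(lon[i], lat[i], ref_lon[j], ref_lat[j]))
    # Sort each row and keep the column indices of the k smallest distances
    out[rows, ] <- t(apply(d, 1, function(x) order(x)[seq_len(k)]))
  }
  out
}

# Hypothetical example data: 100 reference locations and 20,000 points
set.seed(42)
refs <- data.frame(lon = runif(100, -120, -70), lat = runif(100, 25, 48))
pts  <- data.frame(lon = runif(20000, -120, -70), lat = runif(20000, 25, 48))

idx <- nearest_refs(pts$lon, pts$lat, refs$lon, refs$lat, k = 3)
# The nearest reference location of each point: refs[idx[, 1], ]
```

With the default chunk size, each chunk's distance matrix is 10,000 × 100 doubles (about 8 MB). If the points also appear among the reference locations, each will match itself first; in that case request k + 1 and drop the first column.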
