How to assign several names to lat-lon observations


Question

I have two data frames: df1 contains observations with lat-lon coordinates; df2 contains names with lat-lon coordinates. I want to create a new variable df1$names that holds, for each observation, the names from df2 that lie within a specified distance of that observation.

Some example data for df1:

df1 <- structure(list(lat = c(52.768, 53.155, 53.238, 53.253, 53.312, 53.21, 53.21, 53.109, 53.376, 53.317, 52.972, 53.337, 53.208, 53.278, 53.316, 53.288, 53.341, 52.945, 53.317, 53.249), lon = c(6.873, 6.82, 6.81, 6.82, 6.84, 6.748, 6.743, 6.855, 6.742, 6.808, 6.588, 6.743, 6.752, 6.845, 6.638, 6.872, 6.713, 6.57, 6.735, 6.917), cat = c(2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 3L, 2L, 2L, 2L, 2L, 2L), diff = c(6.97305555555555, 3.39815972222222, 14.2874305555556, -0.759791666666667, 34.448275462963, 4.38783564814815, 0.142430555555556, 0.698599537037037, 1.22914351851852, 7.0008912037037, 1.3349537037037, 8.67978009259259, 1.6090162037037,    25.9466782407407, 9.45068287037037, 4.76284722222222, 1.79163194444444, 16.8280787037037, 1.01336805555556, 3.51240740740741)), .Names = c("lat", "lon", "cat", "diff"), row.names = c(125L, 705L, 435L, 682L, 186L, 783L, 250L, 517L, 547L, 369L, 618L, 280L, 839L, 614L, 371L, 786L, 542L, 100L, 667L, 785L), class = "data.frame")

Some example data for df2:

df2 <- structure(list(latlonloc = structure(c(6L, 3L, 4L, 2L, 5L, 1L), .Label = c("Boelenslaan", "Borgercompagnie", "Froombosch", "Garrelsweer", "Stitswerd", "Tinallinge"), class = "factor"), lat = c(53.356789, 53.193886, 53.311237, 53.111339, 53.360848, 53.162031), lon = c(6.53493, 6.780792, 6.768608, 6.82354, 6.599604, 6.143804)), .Names = c("latlonloc", "lat", "lon"), class = "data.frame", row.names = c(NA, -6L))

Creating a distance matrix with the geosphere package:

library(geosphere)
mat <- distm(df1[,c('lon','lat')], df2[,c('lon','lat')], fun=distHaversine)

The resulting distances are in meters (at least I think they are; otherwise something is wrong with the distance matrix).
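One way to check the unit is to compute a distance whose size is already known. The sketch below uses a hand-written haversine function in base R; `haversine_m` is a hypothetical helper, not part of geosphere, and the radius of 6378137 m is an assumption that I believe matches the default used by distHaversine. One degree of latitude should come out at roughly 111 km, so a value near 111,000 indicates meters.

```r
# Hypothetical helper (not from geosphere): haversine distance between two
# lon/lat points given in degrees. The result is in the same unit as r.
# r = 6378137 (meters) is an assumed Earth radius.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# One degree of latitude along the prime meridian
one_degree <- haversine_m(0, 0, 0, 1)
print(one_degree)
```

If a few entries of mat are of the same order of magnitude as the values this helper returns for the same coordinate pairs, the matrix is in meters.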

The specified distance is calculated as ((df1$cat)^2)*1000. I tried df1$names <- df2$latlonloc[apply(distmat, 1, which(distmat < ((df1$cat)^2)*1000 ))], but got an error message:

Error in match.fun(FUN) : 
  'which(distmat < ((df1$cat)^2) * 1000)' is not a function, character or symbol

This is probably not the correct approach, but what I need is this:

df1$names <- #code or function which gives me a string of names which are within a specified distance of the observation

How can I create a string with the names that are within a specified distance of each observation?

Recommended answer

You need to operate on each row of df1 (or mat) to figure out, for each row, how far away each object in df2 is. From that, you can pick the ones that meet your distance criterion.

I think you're getting a little confused about the use of apply and about the use of which. For which to work here, you need to apply it to each row of mat, whereas your current code applies it to the entire matrix. Also note that apply is awkward to use here because you're comparing each row of mat against the corresponding element of the vector ((df1$cat)^2)*1000. So I will instead show you examples using sapply and lapply. You could also use mapply here, but I think the sapply/lapply syntax is clearer.
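The error in the question comes from the third argument of apply, which has to be a function rather than the result of calling one. A minimal sketch on a made-up 2 x 3 matrix (the names `m`, `bad` and `good` are only for illustration):

```r
# Toy matrix, filled column by column: row 1 is 1, 3, 5 and row 2 is 2, 4, 6
m <- matrix(1:6, nrow = 2)

# which(m > 1) is evaluated first and yields an integer vector, so apply()
# has no function to call and stops with the same kind of error as in the question
bad <- tryCatch(
  apply(m, 1, which(m > 1)),
  error = function(e) conditionMessage(e)
)
print(bad)

# Wrapping the test in an anonymous function lets apply() call it once per row
good <- apply(m, 1, function(row) which(row > 1))
print(good)
```

Because the rows can yield different numbers of matches, apply returns a list here, which is the same shape of result as Example 1.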

To address your desired output, I show two examples. The first returns a list containing, for each row in df1, the names of items in df2 that are within the distance threshold. This won't easily go back into your original df1 as a variable, because each element in the list can contain multiple names. The second example pastes those names together into a single comma-separated character string to create the new variable you're looking for.

Example 1:

out1 <- lapply(1:nrow(df1), function(x) {
    df2[which(mat[x,] < (((df1$cat)^2)*1000)[x]),'latlonloc']
})

Result:

> str(out1)
List of 20
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 2
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 4
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 6 4 5
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 
 $ : Factor w/ 6 levels "Boelenslaan",..: 4
 $ : Factor w/ 6 levels "Boelenslaan",..: 

Example 2:

out2 <- sapply(1:nrow(df1), function(x) {
    paste(df2[which(mat[x,] < (((df1$cat)^2)*1000)[x]),'latlonloc'], collapse=',')
})

Result:

> out2
 [1] ""                                 ""                                
 [3] ""                                 ""                                
 [5] ""                                 ""                                
 [7] ""                                 "Borgercompagnie"                 
 [9] ""                                 "Garrelsweer"                     
[11] ""                                 ""                                
[13] ""                                 ""                                
[15] "Tinallinge,Garrelsweer,Stitswerd" ""                                
[17] ""                                 ""                                
[19] "Garrelsweer"                      ""

I think the second of these is probably closest to what you're going for.
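As a possible alternative to indexing row by row, the comparison can be done on the whole matrix at once. This is a different technique from the sapply version above and relies on R's recycling rule: when an n x m matrix is compared with a vector of length n, element [i, j] is compared with element i of the vector. The sketch uses made-up toy data; `mat_toy`, `cat_toy`, `names_toy` and `df_toy` are hypothetical stand-ins for mat, df1$cat, df2$latlonloc and df1.

```r
# Toy distance matrix in meters: 3 observations (rows) x 2 named places (columns)
mat_toy <- matrix(c(500, 1500, 3000,    # distances to place "A"
                    800, 2500, 9500),   # distances to place "B"
                  nrow = 3)
cat_toy <- c(1, 1, 3)
names_toy <- c("A", "B")

# One threshold per observation: 1000, 1000, 9000
threshold <- cat_toy^2 * 1000

# The vector is recycled down the columns, so row i is compared with threshold[i]
within <- mat_toy < threshold

# Collapse the matching names of each row into one comma-separated string
names_out <- apply(within, 1, function(hit) paste(names_toy[hit], collapse = ","))

df_toy <- data.frame(cat = cat_toy)
df_toy$names <- names_out
print(df_toy)
```

With the real data this would amount to comparing mat with (df1$cat)^2 * 1000, which only lines up correctly as long as mat has exactly one row per row of df1.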
