How to assign several names to lat-lon observations
Problem description
I have two dataframes: df1 contains observations with lat-lon coordinates; df2 has names with lat-lon coordinates. I want to create a new variable df1$names which holds, for each observation, the names in df2 that are within a specified distance of that observation.
Some example data for df1:
df1 <- structure(list(lat = c(52.768, 53.155, 53.238, 53.253, 53.312, 53.21, 53.21, 53.109, 53.376, 53.317, 52.972, 53.337, 53.208, 53.278, 53.316, 53.288, 53.341, 52.945, 53.317, 53.249), lon = c(6.873, 6.82, 6.81, 6.82, 6.84, 6.748, 6.743, 6.855, 6.742, 6.808, 6.588, 6.743, 6.752, 6.845, 6.638, 6.872, 6.713, 6.57, 6.735, 6.917), cat = c(2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 3L, 2L, 2L, 2L, 2L, 2L), diff = c(6.97305555555555, 3.39815972222222, 14.2874305555556, -0.759791666666667, 34.448275462963, 4.38783564814815, 0.142430555555556, 0.698599537037037, 1.22914351851852, 7.0008912037037, 1.3349537037037, 8.67978009259259, 1.6090162037037, 25.9466782407407, 9.45068287037037, 4.76284722222222, 1.79163194444444, 16.8280787037037, 1.01336805555556, 3.51240740740741)), .Names = c("lat", "lon", "cat", "diff"), row.names = c(125L, 705L, 435L, 682L, 186L, 783L, 250L, 517L, 547L, 369L, 618L, 280L, 839L, 614L, 371L, 786L, 542L, 100L, 667L, 785L), class = "data.frame")
Some example data for df2:
df2 <- structure(list(latlonloc = structure(c(6L, 3L, 4L, 2L, 5L, 1L), .Label = c("Boelenslaan", "Borgercompagnie", "Froombosch", "Garrelsweer", "Stitswerd", "Tinallinge"), class = "factor"), lat = c(53.356789, 53.193886, 53.311237, 53.111339, 53.360848, 53.162031), lon = c(6.53493, 6.780792, 6.768608, 6.82354, 6.599604, 6.143804)), .Names = c("latlonloc", "lat", "lon"), class = "data.frame", row.names = c(NA, -6L))
Creating a distance matrix with the geosphere package:
library(geosphere)
# Distance matrix: one row per df1 observation, one column per df2 name
mat <- distm(df1[,c('lon','lat')], df2[,c('lon','lat')], fun=distHaversine)
The resulting distances are in meters (at least I think they are, else something is wrong with the distance matrix).
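For what it's worth, geosphere's distHaversine() documents a default Earth radius of r = 6378137 meters, so the entries of mat should be in meters. Below is a minimal base-R sketch of the same haversine formula that can be used to sanity-check one entry of the matrix. The helper name haversine_m is hypothetical and not part of geosphere.

```r
# Hypothetical helper: great-circle distance via the haversine formula.
# Inputs are in decimal degrees; the result is in the units of r
# (r = 6378137 m mirrors geosphere's documented default radius).
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * atan2(sqrt(a), sqrt(1 - a))
}

# One degree of longitude along the equator is r * pi / 180, about 111.3 km
d_equator <- haversine_m(0, 0, 1, 0)

# First observation of df1 against the first name in df2 (Tinallinge);
# if mat is in meters, this should match mat[1, 1]
d_obs1 <- haversine_m(6.873, 52.768, 6.53493, 53.356789)
```

If d_obs1 and mat[1, 1] agree (roughly 69 km here), the matrix is in meters.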
The specified distance (in meters) is calculated as ((df1$cat)^2)*1000. I tried df1$names <- df2$latlonloc[apply(distmat, 1, which(distmat < ((df1$cat)^2)*1000 ))], but I get an error message:
Error in match.fun(FUN) :
'which(distmat < ((df1$cat)^2) * 1000)' is not a function, character or symbol
This is probably not the correct approach, but what I need is this:
df1$names <- #code or function which gives me a string of names which are within a specified distance of the observation
How can I create a string with the names that are within a specified distance of the observations?
Recommended answer
You need to operate on each row of df1 (or mat) in order to figure out, for each row, how far away each object in df2 is. From that, you can pick the ones that meet your distance criterion.
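There is also a vectorized route to the same row-wise comparison, offered here as an alternative and not as part of the original answer. R stores matrices column by column. Comparing the 20 × 6 matrix mat with a length-20 threshold vector therefore recycles that vector down each column, so row i is tested against the i-th threshold. The small sketch below uses a made-up 3 × 2 distance matrix; mat_demo, places, cat_demo, thr, within and hits are all hypothetical names.

```r
# Hypothetical distances in meters: 3 observations (rows) x 2 places (columns)
mat_demo <- matrix(c(500, 5000, 7000,    # column 1: distances to PlaceA
                     3000, 800, 2000),   # column 2: distances to PlaceB
                   nrow = 3)
places <- c("PlaceA", "PlaceB")
cat_demo <- c(1L, 2L, 3L)

# Per-observation threshold in meters, as in the question: cat^2 * 1000
thr <- cat_demo^2 * 1000   # 1000, 4000, 9000

# The length-3 vector recycles down each column, so row i is compared with thr[i]
within <- mat_demo < thr

# Names of the places within range, one character vector per observation
hits <- lapply(seq_len(nrow(within)), function(i) places[within[i, ]])
```

Here hits is list("PlaceA", "PlaceB", c("PlaceA", "PlaceB")). This relies on the threshold vector having exactly one element per row of the matrix.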
I think you're getting a little confused about the use of apply and about the use of which. The third argument of apply must be a function, which is why passing the expression which(distmat < ...) triggers the match.fun error. To really have which work for you, you need to apply it to each row of mat, whereas your current code applies it to the entire mat matrix. Also note that it is hard to use apply here because you're comparing each row of mat against the corresponding element of the vector defined by ((df1$cat)^2)*1000. So, I will instead show you examples using sapply and lapply. You could also use mapply here, but I think the sapply/lapply syntax is clearer.
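For comparison, here is a hedged sketch of the mapply variant mentioned above. mapply walks over the row indices and the thresholds in parallel, so the threshold no longer needs to be picked out with [x]. The toy objects mat_demo, places_demo and thr_demo are hypothetical stand-ins for mat, df2$latlonloc and ((df1$cat)^2)*1000.

```r
# Hypothetical distances in meters: 3 observations (rows) x 2 places (columns)
mat_demo <- matrix(c(500, 5000, 7000,    # column 1: distances to PlaceA
                     3000, 800, 2000),   # column 2: distances to PlaceB
                   nrow = 3)
places_demo <- c("PlaceA", "PlaceB")
thr_demo <- c(1L, 2L, 3L)^2 * 1000       # 1000, 4000, 9000 meters

# mapply pairs each row index i with its own threshold t
out_mapply <- mapply(function(i, t) {
  paste(places_demo[mat_demo[i, ] < t], collapse = ",")
}, seq_len(nrow(mat_demo)), thr_demo, USE.NAMES = FALSE)
```

The result is the same kind of comma-separated character vector as Example 2, here c("PlaceA", "PlaceB", "PlaceA,PlaceB").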
To address your desired output, I show two examples. The first returns a list containing, for each row in df1, the names of the items in df2 that are within the distance threshold. This won't easily go back into your original df1 as a variable because each element in the list can contain multiple names. The second pastes those names together into a single comma-separated character string, which gives you the new variable you're looking for.
Example 1:
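# For each row x of df1, keep the df2 names whose distance in mat[x,]
# is below that row's threshold ((df1$cat)^2)*1000 (meters)
out1 <- lapply(seq_len(nrow(df1)), function(x) {
  df2[which(mat[x,] < (((df1$cat)^2)*1000)[x]),'latlonloc']
})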
Result:
> str(out1)
List of 20
$ : Factor w/ 6 levels "Boelenslaan",..:
$ : Factor w/ 6 levels "Boelenslaan",..:
$ : Factor w/ 6 levels "Boelenslaan",..:
$ : Factor w/ 6 levels "Boelenslaan",..:
$ : Factor w/ 6 levels "Boelenslaan",..:
$ : Factor w/ 6 levels "Boelenslaan",..:
$ : Factor w/ 6 levels "Boelenslaan",..:
$ : Factor w/ 6 levels "Boelenslaan",..: 2
$ : Factor w/ 6 levels "Boelenslaan",..:
$ : Factor w/ 6 levels "Boelenslaan",..: 4
$ : Factor w/ 6 levels "Boelenslaan",..:
$ : Factor w/ 6 levels "Boelenslaan",..:
$ : Factor w/ 6 levels "Boelenslaan",..:
$ : Factor w/ 6 levels "Boelenslaan",..:
$ : Factor w/ 6 levels "Boelenslaan",..: 6 4 5
$ : Factor w/ 6 levels "Boelenslaan",..:
$ : Factor w/ 6 levels "Boelenslaan",..:
$ : Factor w/ 6 levels "Boelenslaan",..:
$ : Factor w/ 6 levels "Boelenslaan",..: 4
$ : Factor w/ 6 levels "Boelenslaan",..:
Example 2:
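# Same selection as Example 1, but collapse the matching names into one
# comma-separated string per row ("" when nothing is within range)
out2 <- sapply(seq_len(nrow(df1)), function(x) {
  paste(df2[which(mat[x,] < (((df1$cat)^2)*1000)[x]),'latlonloc'], collapse=',')
})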
Result:
> out2
[1] "" ""
[3] "" ""
[5] "" ""
[7] "" "Borgercompagnie"
[9] "" "Garrelsweer"
[11] "" ""
[13] "" ""
[15] "Tinallinge,Garrelsweer,Stitswerd" ""
[17] "" ""
[19] "Garrelsweer" ""
I think the second of these is probably closest to what you're going for.
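To finish the original goal, the character vector from Example 2 has one element per row of df1, so it can be assigned directly as df1$names <- out2. The sketch below demonstrates this on a hypothetical three-row stand-in, df1_demo, built from rows 1, 2 and 15 of df1. out2_demo copies those rows' values from the result above. Converting empty strings to NA is an optional extra that makes "no nearby names" easier to filter.

```r
# Hypothetical stand-ins: rows 1, 2 and 15 of df1 and their out2 values
df1_demo <- data.frame(lat = c(52.768, 53.155, 53.316),
                       lon = c(6.873, 6.82, 6.638),
                       cat = c(2L, 1L, 3L))
out2_demo <- c("", "", "Tinallinge,Garrelsweer,Stitswerd")

# One string per row, so it can be stored as an ordinary column
df1_demo$names <- out2_demo

# Optional: mark rows with no nearby names as NA instead of ""
df1_demo$names[df1_demo$names == ""] <- NA

# Optional: split a row's string back into individual names when needed
names_row3 <- strsplit(df1_demo$names[3], ",")[[1]]
```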