If/else if: pick first matching record within set distance only after first condition is not met in R

Problem description

I would like to pick the closest previous owner within a set distance, but only after the first search condition isn't met. The locations are called reflo (reference location), and each one has corresponding x and y coordinates (called locx and locy, respectively).

Conditions:

  • if lifetime_census$reflo == owners$reflo.x[i], then the condition is met (that record's previous_id is the previous owner)
  • if lifetime_census$reflo != owners$reflo.x[i], then find the closest record (within 30 meters)
  • if there is no record within 30 meters, then assign NA

Previous owners (>20,000) are stored in a dataset called lifetime_census. Here is a sample of the data:

id     previous_id  reflo  locx  locy  lifespan
16161  5587         -310   -3    10    1810
16848  5101         Q1     17.3  0.8   55
21815  6077         M2     13    1.8   979
23938  6130         -49    -4    9     374
29615  7307         B.1    2.5   1     1130

I then have an owners dataset (here is a sample):

squirrel_id  spr_census  reflo.x  spring_locx  spring_locy
6391         2005        M3       13           2.5
6130         2005        -310     -3           10
23586        2019        B9       2            9

To illustrate what I am trying to achieve:

squirrel_id  spr_census  reflo.x  spring_locx  spring_locy  previous_owner
6391         2004        M3       13           2.5          6077
6130         2005        -310     -3           10           5587
23586        2019        B9       2            9            NA
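
To make the examples reproducible, the two sample input tables can be typed in as data frames. This is a sketch transcribed from the tables above; reflo and reflo.x are kept as character (not factor) so that labels such as "-310" and "M3" can be compared directly with ==.

```r
# Sample of the previous owners (transcribed from the question's table)
lifetime_census <- data.frame(
  id          = c(16161, 16848, 21815, 23938, 29615),
  previous_id = c(5587, 5101, 6077, 6130, 7307),
  reflo       = c("-310", "Q1", "M2", "-49", "B.1"),
  locx        = c(-3, 17.3, 13, -4, 2.5),
  locy        = c(10, 0.8, 1.8, 9, 1),
  lifespan    = c(1810, 55, 979, 374, 1130),
  stringsAsFactors = FALSE   # keep location labels as character
)

# Sample of the current owners (transcribed from the question's table)
owners <- data.frame(
  squirrel_id = c(6391, 6130, 23586),
  spr_census  = c(2005, 2005, 2019),
  reflo.x     = c("M3", "-310", "B9"),
  spring_locx = c(13, -3, 2),
  spring_locy = c(2.5, 10, 9),
  stringsAsFactors = FALSE
)

str(owners)
```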

What I have currently tried is this:

n <- length(owners$squirrel_id)
distance <- 30 # This can be easily changed to bigger or smaller values

for (i in 1:n) {
  last_owner <- subset(lifetime_census,
    lifetime_census$reflo == owners$reflo.x[i] &  # using the exact location
    ((30 * owners$spring_locx[i] - 30 * lifetime_census$locx)^2 +
     (30 * owners$spring_locy[i] - 30 * lifetime_census$locy)^2 <= (distance)^2))  # this sets the search limit

  owners[i, "previous_owner"] <- last_owner$previous_id[i]
}
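
In the attempt above, both tests sit inside one subset() call joined by &, so a record has to pass the exact-location test and the distance test at the same time, and last_owner$previous_id[i] takes the i-th row of the subset rather than its first. The 30 * ... terms appear to convert grid units to meters; that is an inference from the code, not something the question states. Under that assumption, the distance test on its own can be sketched with a hypothetical helper dist_m():

```r
# ASSUMPTION: one locx/locy unit = 30 m, inferred from the 30 * ... terms
# in the attempt above; change meters_per_unit if the grid differs.
meters_per_unit <- 30

# Hypothetical helper: Euclidean distance converted to meters. Scaling the
# unit distance by 30 equals sqrt((30*x1 - 30*x2)^2 + (30*y1 - 30*y2)^2).
dist_m <- function(x1, y1, x2, y2, scale = meters_per_unit) {
  scale * sqrt((x2 - x1)^2 + (y2 - y1)^2)
}

# Owner 6391 (M3 at 13, 2.5) vs census record M2 (13, 1.8): 0.7 units apart
d_m3_m2 <- dist_m(13, 2.5, 13, 1.8)
# Owner 23586 (B9 at 2, 9) vs census record -310 (-3, 10): sqrt(26) units apart
d_b9_310 <- dist_m(2, 9, -3, 10)

d_m3_m2 <= 30   # TRUE, about 21 m
d_b9_310 <= 30  # FALSE, about 153 m
```

Comparing the distance itself with distance is equivalent to the squared comparison in the attempt, since both sides are non-negative.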

I can't figure out how to make the loop check the conditions one after the other and then pick the record.

Any ideas?

Recommended answer

I would suggest something like this (assuming the units of locx and the other coordinates are the same as those of distance):

distance = 30

# Euclidean distance from (x1, y1) to (x2, y2); vectorised over x2, y2
distance_xy = function (x1, y1, x2, y2) {
  sqrt((x2 - x1)^2 + (y2 - y1)^2)
}

owners$previous_owner = NA  # start with NA for every owner

for (i in seq_len(nrow(owners))) {
  if (owners$reflo.x[i] %in% lifetime_census$reflo) {
    # condition 1: exact reflo match
    owners$previous_owner[i] = lifetime_census[lifetime_census$reflo == owners$reflo.x[i], ]$previous_id
  } else {
    # condition 2: closest record within `distance`
    dt = distance_xy(owners$spring_locx[i], owners$spring_locy[i], lifetime_census$locx, lifetime_census$locy)
    if (any(dt <= distance)) {
      owners$previous_owner[i] = lifetime_census[order(dt), ]$previous_id[1L]
    } else {
      # condition 3: nobody within range
      owners$previous_owner[i] = NA
    }
  }
}

This gives:

   squirrel_id spr_census reflo.x spring_locx spring_locy previous_owner
1        6391       2005      M3          13         2.5           6077
2        6130       2005    -310          -3        10.0           5587
3       23586       2019      B9           2         9.0           5587
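
With distance = 30 and unscaled coordinates, owner 23586 is matched to 5587 here although the desired table shows NA: record -310 is sqrt(26), about 5.1, coordinate units away, which is well under 30. If one coordinate unit is 30 m (inferred from the 30 * ... terms in the question's attempt, so treat it as an assumption), scaling the distances before comparing them reproduces the desired previous_owner column. A sketch, where meters_per_unit is a hypothetical parameter and only the columns used are typed in:

```r
lifetime_census <- data.frame(
  previous_id = c(5587, 5101, 6077, 6130, 7307),
  reflo       = c("-310", "Q1", "M2", "-49", "B.1"),
  locx        = c(-3, 17.3, 13, -4, 2.5),
  locy        = c(10, 0.8, 1.8, 9, 1),
  stringsAsFactors = FALSE
)
owners <- data.frame(
  reflo.x     = c("M3", "-310", "B9"),
  spring_locx = c(13, -3, 2),
  spring_locy = c(2.5, 10, 9),
  stringsAsFactors = FALSE
)

distance <- 30          # search radius in meters
meters_per_unit <- 30   # ASSUMPTION: one locx/locy unit = 30 m

distance_xy <- function(x1, y1, x2, y2) {
  sqrt((x2 - x1)^2 + (y2 - y1)^2)
}

owners$previous_owner <- NA_real_
for (i in seq_len(nrow(owners))) {
  hit <- lifetime_census$reflo == owners$reflo.x[i]
  if (any(hit)) {
    # exact reflo match: take the first matching record
    owners$previous_owner[i] <- lifetime_census$previous_id[hit][1L]
    next
  }
  # no exact match: distances in meters to every census record
  dt <- meters_per_unit * distance_xy(owners$spring_locx[i], owners$spring_locy[i],
                                      lifetime_census$locx, lifetime_census$locy)
  if (any(dt <= distance)) {
    owners$previous_owner[i] <- lifetime_census$previous_id[which.min(dt)]
  }
  # otherwise previous_owner stays NA
}

owners$previous_owner  # 6077 5587 NA
```

which.min() returns the index of the first smallest distance, which does the same job as order(dt)[1L] without sorting the whole vector.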

Note that this will not work correctly if more than one record matches reflo: R only warns that the replacement is too long and keeps the first match.
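
One way to guarantee a single value per owner, assuming that taking the first matching census row is acceptable, is to do the exact-match step for all owners at once with base R's match(), which returns the position of the first match, or NA when there is none. The last census row below is a hypothetical duplicate of -310, added only to show that a duplicate does not break it:

```r
lifetime_census <- data.frame(
  previous_id = c(5587, 5101, 6077, 6130, 7307, 1111),
  # last row: HYPOTHETICAL duplicate of "-310", for illustration only
  reflo       = c("-310", "Q1", "M2", "-49", "B.1", "-310"),
  stringsAsFactors = FALSE
)
owners <- data.frame(reflo.x = c("M3", "-310", "B9"), stringsAsFactors = FALSE)

# Index of the FIRST row whose reflo equals each reflo.x (NA if no match),
# so there is always exactly one value per owner.
idx <- match(owners$reflo.x, lifetime_census$reflo)
exact_owner <- lifetime_census$previous_id[idx]

exact_owner  # NA 5587 NA
```

Owners that still have NA in exact_owner are the ones that need the distance search.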

Adding an alternative based on a comment below.

if-else statements can get confusing once you start adding conditions. This is another way of getting the same result while avoiding the nested structure above:

for (i in seq_len(nrow(owners))) {

  # if we find the reflo
  if (owners$reflo.x[i] %in% lifetime_census$reflo) {
    owners$previous_owner[i] = lifetime_census[lifetime_census$reflo == owners$reflo.x[i], ]$previous_id
    next
  }

  # if we got here, we didn't find the reflo, so compute distances:
  dt = distance_xy(owners$spring_locx[i], owners$spring_locy[i], lifetime_census$locx, lifetime_census$locy)

  # if we find anyone within distance, get the closest one
  if (any(dt <= distance)) {
    owners$previous_owner[i] = lifetime_census[order(dt), ]$previous_id[1L]
    next
  }

  # if we got here, there was nobody within range, so set NA and move on:
  owners$previous_owner[i] = NA
}

The code does exactly the same thing, but by taking advantage of the for loop and next it is possible to remove every else and the whole nested structure.
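
The same early-exit idea also works inside a function, where return() plays the role of next. A sketch with a hypothetical helper find_previous_owner(), applied to each owner with vapply() and using unscaled distances as in the answer (only the columns used are typed in):

```r
lifetime_census <- data.frame(
  previous_id = c(5587, 5101, 6077, 6130, 7307),
  reflo       = c("-310", "Q1", "M2", "-49", "B.1"),
  locx        = c(-3, 17.3, 13, -4, 2.5),
  locy        = c(10, 0.8, 1.8, 9, 1),
  stringsAsFactors = FALSE
)
owners <- data.frame(
  reflo.x     = c("M3", "-310", "B9"),
  spring_locx = c(13, -3, 2),
  spring_locy = c(2.5, 10, 9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: previous owner for ONE owner; each return() exits
# early, the same way next skips to the following loop iteration.
find_previous_owner <- function(reflo, x, y, census, distance = 30) {
  hit <- census$reflo == reflo
  if (any(hit)) return(census$previous_id[hit][1L])        # exact reflo match
  dt <- sqrt((census$locx - x)^2 + (census$locy - y)^2)
  if (any(dt <= distance)) return(census$previous_id[which.min(dt)])  # closest in range
  NA_real_                                                  # nobody in range
}

owners$previous_owner <- vapply(
  seq_len(nrow(owners)),
  function(i) find_previous_owner(owners$reflo.x[i], owners$spring_locx[i],
                                  owners$spring_locy[i], lifetime_census),
  numeric(1)  # each call must return a single number
)

owners$previous_owner  # 6077 5587 5587, same as the loop above
```

vapply() checks that every call returns exactly one number, so an accidental multi-row match would raise an error instead of being silently truncated.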
