Subsetting based on co-occurrence within a time window

Question

I am having trouble subsetting data based on attributes held in different columns. Here is a dummy data set with the species (SP), the area where it was found (Area), and the observation time (Time, already POSIXct).

SP Time Area
B 07:22 1
F 09:22 4
A 09:22 1
C 08:17 3
D 09:20 1
E 06:55 4
D 09:03 1
E 09:12 2
F 09:45 1
B 09:15 1
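
For anyone who wants to run the examples, the table above can be rebuilt as a data frame along these lines. This is only a sketch: the name dat and the date 2013-09-09 follow the answer's code and printed output, while the UTC time zone is an assumption, since the question shows only clock times.

```r
# Dummy data from the question. The UTC time zone is an assumption;
# the date matches the one printed in the answer's output.
dat <- data.frame(
  SP   = c("B", "F", "A", "C", "D", "E", "D", "E", "F", "B"),
  Time = as.POSIXct(
    paste("2013-09-09", c("07:22", "09:22", "09:22", "08:17", "09:20",
                          "06:55", "09:03", "09:12", "09:45", "09:15")),
    format = "%Y-%m-%d %H:%M", tz = "UTC"
  ),
  Area = c(1, 4, 1, 3, 1, 4, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

str(dat)  # inspect column types; Time should be POSIXct
```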

I need to subset the rows that have SP == "A", plus the rows for all other species that occur in the same area (in this case 1) within 30 minutes before or after the A observation, returning this:

SP Time Area
A 09:22 1
D 09:20 1
D 09:03 1 
F 09:45 1
B 09:15 1

I can't get past the conditional statement for this 1-hour window. Should I use a for loop here, or is there a simpler way of subsetting this? Many thanks in advance.

Recommended answer

Your initial result, with a single A row, can be reproduced like so, assuming your data is called dat:

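# Keep rows that are in the same area as the single "A" row
# and no more than 30 minutes away from it in time.
with(dat, dat[
  (SP == "A" | Area == Area[SP == "A"]) &
    abs(difftime(Time, Time[SP == "A"], units = "mins")) <= 30,
])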

Result:

   SP                Time Area
3   A 2013-09-09 09:22:00    1
5   D 2013-09-09 09:20:00    1
7   D 2013-09-09 09:03:00    1
9   F 2013-09-09 09:45:00    1
10  B 2013-09-09 09:15:00    1
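
To see why those five rows pass, it can help to look at the time condition on its own. The sketch below computes each row's signed offset from the A observation in minutes; the column names offset_min and in_window are illustrative, and the data frame is rebuilt with an assumed UTC time zone.

```r
# Dummy data from the question (UTC time zone is an assumption).
dat <- data.frame(
  SP   = c("B", "F", "A", "C", "D", "E", "D", "E", "F", "B"),
  Time = as.POSIXct(
    paste("2013-09-09", c("07:22", "09:22", "09:22", "08:17", "09:20",
                          "06:55", "09:03", "09:12", "09:45", "09:15")),
    format = "%Y-%m-%d %H:%M", tz = "UTC"
  ),
  Area = c(1, 4, 1, 3, 1, 4, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Time of the single "A" observation
a_time <- dat$Time[dat$SP == "A"]

# Signed offset of every row from that observation, in minutes
offset_min <- as.numeric(difftime(dat$Time, a_time, units = "mins"))

# Show the offsets next to the data, with a flag for the 30-minute window
data.frame(dat, offset_min, in_window = abs(offset_min) <= 30)
```

Seven rows fall inside the window on time alone; the area condition then removes the two that are not in area 1.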

To account for multiple occurrences of A, things get a touch more complex:

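# Keep rows whose area matches that of any "A" row and whose time is
# within 30 minutes of any "A" row.
with(dat, dat[
  (SP == "A" | Area %in% Area[SP == "A"]) &
    apply(
      sapply(Time[SP == "A"],
             function(x) abs(difftime(Time, x, units = "mins")) <= 30),
      1, any
    ),
])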

Though I'm sure a simplification is possible here somewhere.
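
One possible simplification, offered as a sketch rather than as the answer's own method, is to compare every row against every A occurrence with outer(). This also pairs each A's area with that same A's time, whereas the multi-A expression above checks the area and the time against the A rows independently, so a row can match one A on area and a different A on time. The function name near_target, its parameters, and the extra A row in the usage example are hypothetical.

```r
# Dummy data from the question (UTC time zone is an assumption).
dat <- data.frame(
  SP   = c("B", "F", "A", "C", "D", "E", "D", "E", "F", "B"),
  Time = as.POSIXct(
    paste("2013-09-09", c("07:22", "09:22", "09:22", "08:17", "09:20",
                          "06:55", "09:03", "09:12", "09:45", "09:15")),
    format = "%Y-%m-%d %H:%M", tz = "UTC"
  ),
  Area = c(1, 4, 1, 3, 1, 4, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows that share an area with some occurrence
# of `target` AND lie within `window_min` minutes of that same occurrence.
near_target <- function(dat, target = "A", window_min = 30) {
  a <- dat[dat$SP == target, ]
  # n x k logical matrix: does row i share an area with target occurrence j?
  same_area <- outer(dat$Area, a$Area, "==")
  # n x k matrix of absolute time differences in minutes
  # (as.numeric() on POSIXct gives seconds since the epoch)
  dt_min <- abs(outer(as.numeric(dat$Time), as.numeric(a$Time), "-")) / 60
  keep <- rowSums(same_area & dt_min <= window_min) > 0
  dat[keep, ]
}

# Single A: same five rows as the original result
res <- near_target(dat)
res

# Hypothetical second A in area 4 at 07:00, added only for illustration
dat2 <- rbind(
  dat,
  data.frame(SP = "A",
             Time = as.POSIXct("2013-09-09 07:00",
                               format = "%Y-%m-%d %H:%M", tz = "UTC"),
             Area = 4, stringsAsFactors = FALSE)
)
res2 <- near_target(dat2)
res2
```

With the extra row, E at 06:55 in area 4 is picked up, while F at 09:22 in area 4 is not, because the only A close to it in time is in area 1. Whether that pairing is what the question intends is a judgment call; if area and time should be matched independently, the expression above is the one to use.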
