Conditional subsetting by POSIXct interval and another field containing interval


Question

Given a dataset Dat with species (SP), area (AR), and time (TM, stored as POSIXct), I want to subset the records of all individuals that were present together with species A: within half an hour before or after A was recorded, and in the same area or one of the two adjacent areas (AR - 1 and AR + 1). For example, if species A was present at 1:00 in area 4, I wish to subset all species present from 12:30 to 1:30 on the same day in areas 3, 4 and 5. As an example:

SP         TM      AR
B  1-jan-03 07:22  1
F  1-jan-03 09:22  4
A  1-jan-03 09:22  1
C  1-jan-03 08:17  3
D  1-jan-03 09:20  1
E  1-jan-03 06:55  4
D  1-jan-03 09:03  1
E  1-jan-03 09:12  2
F  1-jan-03 09:45  1
B  3-jan-03 09:15  1
A  3-jan-03 10:30  5
F  3-jan-03 07:30  5
F  3-jan-03 10:20  6
D  3-jan-03 10:05  4

The desired result for this dummy table would be:

SP         TM      AR
A  1-jan-03 09:22  1
D  1-jan-03 09:20  1
D  1-jan-03 09:03  1
E  1-jan-03 09:12  2
F  1-jan-03 09:45  1
A  3-jan-03 10:30  5
F  3-jan-03 10:20  6
D  3-jan-03 10:05  4 

Note: Species A appears repeatedly throughout the dataset, in any area from 1 to 81 and at any time. In a previous pair of posts I split this question in two so that I could learn how to combine the code, but my specification of the problem was flawed. Many thanks to the users Thelatemail and Jason, who provided helpful answers in "Subsetting based on co-occurrence within a time window" and "Subsetting neighboring fileds". The feedback was:

with(dat, dat[
  (
    SP == "A" |
      Area %in% c(Area[SP == "A"] - 1, Area[SP == "A"], Area[SP == "A"] + 1)
  ) &
    apply(
      sapply(Time[SP == "A"],
             function(x) abs(difftime(Time, x, units = "mins")) <= 30),
      1, any
    ),
])

This worked partially: it subsets within the time window, but not by area. I think this is caused by issues with POSIXct and the subsetting commands, since different times are included in a time window. Would another apply function be necessary to separate the area interval? Any help is much appreciated.

Recommended answer

A possible solution, very much inspired by @thelatemail's and @Justin's previous, nice answers, but this one accounts for time in the boolean expression for space (see my comments to this question).
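To make the difference concrete, here is a minimal sketch on three rows taken from the example table (rows 2, 3 and 11), with the timestamp column named `time` and the area column named `AR`. The UTC time zone and the variable names `near_area`, `near_time`, `independent` and `paired` are assumptions for illustration only. Testing area and time separately against all A records keeps F in area 4, because it is close in time to one A record and close in area to a different one; pairing both tests per A record drops it.

```r
# Rows 2, 3 and 11 of the example table (time zone is an assumption).
dat <- data.frame(
  SP   = c("F", "A", "A"),
  time = as.POSIXct(c("2003-01-01 09:22", "2003-01-01 09:22", "2003-01-03 10:30"),
                    format = "%Y-%m-%d %H:%M", tz = "UTC"),
  AR   = c(4, 1, 5),
  stringsAsFactors = FALSE
)
a <- dat$SP == "A"

# Area and time tested independently against *all* A records.
near_area <- a | dat$AR %in% c(dat$AR[a] - 1, dat$AR[a], dat$AR[a] + 1)
near_time <- apply(
  sapply(dat$time[a],
         function(x) abs(difftime(dat$time, x, units = "mins")) <= 30),
  1, any)
independent <- near_area & near_time

# Area and time tested together, one column per A record.
paired <- sapply(which(a), function(i)
  abs(dat$AR - dat$AR[i]) <= 1 &
    abs(difftime(dat$time, dat$time[i], units = "mins")) <= 30)
keep <- rowSums(paired) > 0

independent  # TRUE TRUE TRUE   (F in area 4 is kept)
keep         # FALSE TRUE TRUE  (F in area 4 is dropped)
```

F at 09:22 in area 4 is within 30 minutes of the A record in area 1 and adjacent to the A record in area 5, but those are two different A records two days apart, so only the paired test rejects it.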

Using sapply, we 'loop' over each time of registration of Species A (time[SP == "A"]) and create a boolean matrix mm with one column per registration of A. Each row corresponds to one record in the data and holds the result of the combined space-and-time test of that record against each registration of A.

mm <- with(dat,
           sapply(time[SP == "A"], function(x)
             abs(AR - AR[SP == "A" & time == x]) <= 1 &
                    abs(difftime(time, x, units = "mins")) <= 30))

# select rows from data where at least one column in mm is TRUE    
dat[rowSums(mm) > 0, ]

# SP                time AR
# 3   A 2003-01-01 09:22:00  1
# 5   D 2003-01-01 09:20:00  1
# 7   D 2003-01-01 09:03:00  1
# 8   E 2003-01-01 09:12:00  2
# 9   F 2003-01-01 09:45:00  1
# 11  A 2003-01-03 10:30:00  5
# 13  F 2003-01-03 10:20:00  6
# 14  D 2003-01-03 10:05:00  4
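
For anyone who wants to rerun this, the following is a self-contained sketch rather than the answerer's exact code. The data frame is reconstructed from the question's table, with TM named `time` as in the code above; the UTC time zone and the name `a_rows` are assumptions. It loops over the row indices of A instead of its timestamps, since matching on `time == x` could pick up more than one area if two A records ever shared a timestamp, and it uses `vapply` so that the result stays a matrix even when there is only one A record.

```r
# Example table from the question (time zone is an assumption).
dat <- data.frame(
  SP = c("B", "F", "A", "C", "D", "E", "D", "E", "F", "B", "A", "F", "F", "D"),
  time = as.POSIXct(c(
    "2003-01-01 07:22", "2003-01-01 09:22", "2003-01-01 09:22",
    "2003-01-01 08:17", "2003-01-01 09:20", "2003-01-01 06:55",
    "2003-01-01 09:03", "2003-01-01 09:12", "2003-01-01 09:45",
    "2003-01-03 09:15", "2003-01-03 10:30", "2003-01-03 07:30",
    "2003-01-03 10:20", "2003-01-03 10:05"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"),
  AR = c(1, 4, 1, 3, 1, 4, 1, 2, 1, 1, 5, 5, 6, 4),
  stringsAsFactors = FALSE
)

# Row indices of the A records; each one gets its own column in mm.
a_rows <- which(dat$SP == "A")

# For every A record: within one area AND within 30 minutes of that record.
mm <- vapply(a_rows, function(i)
  abs(dat$AR - dat$AR[i]) <= 1 &
    abs(difftime(dat$time, dat$time[i], units = "mins")) <= 30,
  logical(nrow(dat)))

# Keep rows that pass the combined test for at least one A record.
res <- dat[rowSums(mm) > 0, ]
rownames(res)  # "3" "5" "7" "8" "9" "11" "13" "14"
```

On the example table this selects the same eight rows as the output shown above.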
