Conditional subsetting by POSIXct interval and another field containing interval

Problem description

Given a dataset dat with species (SP), area (AR), and time (TM, stored as POSIXct), I want to subset the records of individuals that were present alongside species A: within half an hour before or after A was recorded, and in the same area or one of the two adjacent areas (area + 1 and area - 1). For example, if species A was present at 1:00 in area 4, I want all species recorded between 12:30 and 1:30 on the same day in areas 3, 4 and 5. As an example:

SP         TM      AR
B  1-jan-03 07:22  1
F  1-jan-03 09:22  4
A  1-jan-03 09:22  1
C  1-jan-03 08:17  3
D  1-jan-03 09:20  1
E  1-jan-03 06:55  4
D  1-jan-03 09:03  1
E  1-jan-03 09:12  2
F  1-jan-03 09:45  1
B  3-jan-03 09:15  1
A  3-jan-03 10:30  5
F  3-jan-03 07:30  5
F  3-jan-03 10:20  6
D  3-jan-03 10:05  4

The desired result for this dummy table would be:

SP         TM      AR
A  1-jan-03 09:22  1
D  1-jan-03 09:20  1
D  1-jan-03 09:03  1
E  1-jan-03 09:12  2
F  1-jan-03 09:45  1
A  3-jan-03 10:30  5
F  3-jan-03 10:20  6
D  3-jan-03 10:05  4 

Note: Species A appears repeatedly throughout the dataset, in any area from 1 to 81 and at any time. In a previous pair of posts I split this question in two so that I could learn how to combine the code, but my specification of the problem was flawed. Many thanks to the users Thelatemail and Jason, who provided helpful answers. The earlier posts were "Subsetting based on co-occurrence within a time window" and "Subsetting neighboring fileds". The feedback was:

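# Area and Time here presumably correspond to AR and TM in the table above
with(dat, dat[
  (
    # area test: A itself, or any area within +/- 1 of ANY record of A
    SP == "A" |
      Area %in% c(Area[SP == "A"] - 1, Area[SP == "A"], Area[SP == "A"] + 1)
  ) &
    # time test: within 30 minutes of ANY record of A
    apply(
      sapply(Time[SP == "A"],
             function(x) abs(difftime(Time, x, units = "mins")) <= 30),
      1, any
    ),
])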

This worked only partially: it subsets within the time window, but not by area. I suspect the cause is how POSIXct interacts with the subsetting, since several different times fall inside one time window. Would another apply function be needed to handle the area interval? Any help is much appreciated.

Recommended answer

A possible solution, very much inspired by the previous nice answers from @thelatemail and @Justin, but one that accounts for time in the boolean expression for space (see my comments on this question).
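To see concretely why the earlier feedback lets a row through in the wrong area, here is a minimal sketch on the dummy data. It assumes the columns are named SP, time and AR, enters the timestamps as ISO strings in UTC for convenience, and uses the purely illustrative names a_rows, pooled_area, near_time and pooled.

```r
# Dummy data from the question; ISO timestamps and UTC are assumptions
# made here so the example does not depend on the system locale.
dat <- data.frame(
  SP = c("B", "F", "A", "C", "D", "E", "D", "E", "F", "B", "A", "F", "F", "D"),
  time = as.POSIXct(
    c("2003-01-01 07:22", "2003-01-01 09:22", "2003-01-01 09:22",
      "2003-01-01 08:17", "2003-01-01 09:20", "2003-01-01 06:55",
      "2003-01-01 09:03", "2003-01-01 09:12", "2003-01-01 09:45",
      "2003-01-03 09:15", "2003-01-03 10:30", "2003-01-03 07:30",
      "2003-01-03 10:20", "2003-01-03 10:05"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"),
  AR = c(1, 4, 1, 3, 1, 4, 1, 2, 1, 1, 5, 5, 6, 4),
  stringsAsFactors = FALSE
)

a_rows <- which(dat$SP == "A")

# Area test pooled over ALL records of A: areas 0, 1, 2, 4, 5, 6 qualify
pooled_area <- dat$AR %in% c(dat$AR[a_rows] - 1, dat$AR[a_rows], dat$AR[a_rows] + 1)

# Time test pooled over ALL records of A: within 30 minutes of any of them
near_time <- vapply(seq_len(nrow(dat)), function(i) {
  any(abs(as.numeric(difftime(dat$time[i], dat$time[a_rows], units = "mins"))) <= 30)
}, logical(1))

pooled <- (dat$SP == "A" | pooled_area) & near_time
which(pooled)
# 2 3 5 7 8 9 11 13 14
```

Row 2 (F at 09:22 in area 4) is kept even though it is not wanted: its time matches the record of A on 1 January, while its area is adjacent only to the record of A on 3 January. The two tests pass against different records of A, which is why space and time have to be tested against the same record.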

Using sapply, we 'loop' over each time at which species A was registered (time[SP == "A"]) and create a boolean matrix mm with one column per registration of A and one row per record in dat. Each cell is the combined space-and-time test of that record against that registration of A.

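# one column per registration of A, one row per record in dat
mm <- with(dat,
           sapply(time[SP == "A"], function(x)
             # within +/- 1 area of the A registered at time x ...
             abs(AR - AR[SP == "A" & time == x]) <= 1 &
               # ... and within 30 minutes of that same registration
               abs(difftime(time, x, units = "mins")) <= 30))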

# select rows from data where at least one column in mm is TRUE    
dat[rowSums(mm) > 0, ]

# SP                time AR
# 3   A 2003-01-01 09:22:00  1
# 5   D 2003-01-01 09:20:00  1
# 7   D 2003-01-01 09:03:00  1
# 8   E 2003-01-01 09:12:00  2
# 9   F 2003-01-01 09:45:00  1
# 11  A 2003-01-03 10:30:00  5
# 13  F 2003-01-03 10:20:00  6
# 14  D 2003-01-03 10:05:00  4
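One caveat, given that species A appears repeatedly in the real data: AR[SP == "A" & time == x] assumes A has a single area per timestamp. If A were recorded at the same minute in two areas, that expression would return two values and be recycled against AR. A minimal sketch of a variant that loops over the row indices of A instead of over its times is shown below; it is an adaptation rather than the code above, it rebuilds the dummy data with ISO timestamps in UTC as an assumption, and a_rows and keep are illustrative names.

```r
# Dummy data from the question (ISO timestamps and UTC are assumptions)
dat <- data.frame(
  SP = c("B", "F", "A", "C", "D", "E", "D", "E", "F", "B", "A", "F", "F", "D"),
  time = as.POSIXct(
    c("2003-01-01 07:22", "2003-01-01 09:22", "2003-01-01 09:22",
      "2003-01-01 08:17", "2003-01-01 09:20", "2003-01-01 06:55",
      "2003-01-01 09:03", "2003-01-01 09:12", "2003-01-01 09:45",
      "2003-01-03 09:15", "2003-01-03 10:30", "2003-01-03 07:30",
      "2003-01-03 10:20", "2003-01-03 10:05"),
    format = "%Y-%m-%d %H:%M", tz = "UTC"),
  AR = c(1, 4, 1, 3, 1, 4, 1, 2, 1, 1, 5, 5, 6, 4),
  stringsAsFactors = FALSE
)

# Row numbers of every record of species A
a_rows <- which(dat$SP == "A")

# One column per record of A; each column tests area and time
# against that single record, so duplicate timestamps are unambiguous
mm <- vapply(a_rows, function(i) {
  abs(dat$AR - dat$AR[i]) <= 1 &
    abs(as.numeric(difftime(dat$time, dat$time[i], units = "mins"))) <= 30
}, logical(nrow(dat)))

# Keep rows that pass both tests for at least one record of A
keep <- rowSums(mm) > 0
which(keep)
# 3 5 7 8 9 11 13 14
```

On the dummy table this selects the same rows as the output above. With up to 81 areas and many records of A, mm has one column per record of A, so memory grows with nrow(dat) times the number of A records.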
