Select row prior to first occurrence of an event by group
Question
I have a series of observations that describe if and when an animal is spotted in a specific area. The following sample table identifies, by day, whether a certain animal is seen (status == 1) or not (status == 0).
   id       date status
1   1 2014-06-20      1
2   1 2014-06-21      1
3   1 2014-06-22      1
4   1 2014-06-23      1
5   1 2014-06-24      0
6   2 2014-06-20      1
7   2 2014-06-21      1
8   2 2014-06-22      0
9   2 2014-06-23      1
10  2 2014-06-24      1
11  3 2014-06-20      1
12  3 2014-06-21      1
13  3 2014-06-22      0
14  3 2014-06-23      1
15  3 2014-06-24      0
16  4 2014-06-20      1
17  4 2014-06-21      0
18  4 2014-06-22      0
19  4 2014-06-23      0
20  4 2014-06-24      1
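The sample table can be rebuilt as a data frame named df, which is the object the data.table code in this question starts from. This is only a minimal sketch: the post does not show how df was created, so storing date as a Date column and status as a numeric column are assumptions.

```r
# Rebuild the sample table: 4 animals (id), 5 consecutive days each.
# Column types are assumed; the original post does not state them.
df <- data.frame(
  id     = rep(1:4, each = 5),
  date   = rep(as.Date("2014-06-20") + 0:4, times = 4),
  status = c(1, 1, 1, 1, 0,   # id 1
             1, 1, 0, 1, 1,   # id 2
             1, 1, 0, 1, 0,   # id 3
             1, 0, 0, 0, 1)   # id 4
)

print(df)
```

Rows are ordered by id and then by date, which is the ordering the row-based approach in this thread relies on.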
Using the data.table package, I can identify the first day an animal is no longer seen in the area:
library(data.table)
dt <- as.data.table(df)
dt[status == 0, .SD[1], by = id]
   id       date status
1:  1 2014-06-24      0
2:  2 2014-06-22      0
3:  3 2014-06-22      0
4:  4 2014-06-21      0
While the above table is useful, I would like to know how to adapt this call to find the row just before the first occurrence of an animal's absence. In other words, I want to know the last day that each animal is in the area before temporarily leaving.
My actual data set bins these presence/absence observations into different time lengths depending on the situation (e.g. presence/absence by 3-hour intervals, 6-hour intervals, etc.). Because the interval varies, it would be easier to access the previous row than to subtract the time interval from each value. My desired output would be the following:
   id       date status
1:  1 2014-06-23      1
2:  2 2014-06-21      1
3:  3 2014-06-21      1
4:  4 2014-06-20      1
Please feel free to use base code or other packages (e.g. dplyr) to answer this question; I am always up for something new. Thank you for your time!
Recommended answer
Try the following:
dt[dt[status == 0, .I[1] - 1, by = id]$V1]
#   id       date status
#1:  1 2014-06-23      1
#2:  2 2014-06-21      1
#3:  3 2014-06-21      1
#4:  4 2014-06-20      1
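To see why this works, the one-liner can be split into two steps. The sketch below rebuilds the sample data (the column types are an assumption) and uses the intermediate names first_absent and prev_rows, which are illustrative only and do not appear in the original answer. Inside the grouped call, .I holds row numbers of the full table, so .I[1] is the position of the first status == 0 row for each id.

```r
library(data.table)

# Sample data from the question; column types are assumed.
dt <- data.table(
  id     = rep(1:4, each = 5),
  date   = rep(as.Date("2014-06-20") + 0:4, times = 4),
  status = c(1, 1, 1, 1, 0,  1, 1, 0, 1, 1,  1, 1, 0, 1, 0,  1, 0, 0, 0, 1)
)

# Step 1: for each id, the row number (in the whole table) of the
# first row where status == 0. The result column is named V1.
first_absent <- dt[status == 0, .I[1], by = id]

# Step 2: step back one row, then use those row numbers to subset dt.
prev_rows <- first_absent$V1 - 1L
result <- dt[prev_rows]

print(first_absent)
print(result)
```

For the sample data, the first absences sit at rows 5, 8, 13 and 17, so the rows selected are 4, 7, 12 and 16.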
Incidentally, this method (using .I instead of .SD) will also be much faster. See this post for more on that.
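One caveat about stepping back by one row: it assumes every group has at least one status == 1 row before its first absence. If a group starts with status == 0, .I[1] - 1 lands on the last row of the preceding group. The following is a hedged sketch of a guarded variant, not part of the original answer; the names dt2, naive, idx and guarded are illustrative, and the modified data is made up purely to trigger the edge case.

```r
library(data.table)

# Sample data from the question; column types are assumed.
dt <- data.table(
  id     = rep(1:4, each = 5),
  date   = rep(as.Date("2014-06-20") + 0:4, times = 4),
  status = c(1, 1, 1, 1, 0,  1, 1, 0, 1, 1,  1, 1, 0, 1, 0,  1, 0, 0, 0, 1)
)

# Hypothetical edge case: animal 2 is already absent on its first day.
dt2 <- copy(dt)
dt2[6, status := 0]

# Stepping back one row now returns row 5, which belongs to id 1.
naive <- dt2[dt2[status == 0, .I[1] - 1, by = id]$V1]

# Guarded variant: find the first absence within each group and keep the
# previous row only if it exists in the same group.
idx <- dt2[, {
  first0 <- which(status == 0)[1]   # position within the group, NA if none
  if (!is.na(first0) && first0 > 1L) .I[first0 - 1L] else integer(0)
}, by = id]$V1
guarded <- dt2[idx]

print(naive)
print(guarded)
```

In this variant, groups that start with an absence, or that never have one, are simply left out of the result. Whether they should be dropped or reported with NA depends on the analysis.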