Select row prior to first occurrence of an event by group


Question

I have a series of observations that describe if and when an animal is spotted in a specific area. The following sample table shows, by day, whether a given animal was seen (status == 1) or not (status == 0).

   id       date status
1   1 2014-06-20      1
2   1 2014-06-21      1
3   1 2014-06-22      1
4   1 2014-06-23      1
5   1 2014-06-24      0
6   2 2014-06-20      1
7   2 2014-06-21      1
8   2 2014-06-22      0
9   2 2014-06-23      1
10  2 2014-06-24      1
11  3 2014-06-20      1
12  3 2014-06-21      1
13  3 2014-06-22      0
14  3 2014-06-23      1
15  3 2014-06-24      0
16  4 2014-06-20      1
17  4 2014-06-21      0
18  4 2014-06-22      0
19  4 2014-06-23      0
20  4 2014-06-24      1
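
The question never shows how df is created, so here is a minimal sketch that rebuilds the table above in base R. Storing date as a Date (rather than a character or POSIXct column) is an assumption made for illustration.

```r
# Rebuild the sample table as a data.frame named df.
# Assumption: date is stored as a Date; the original class is not shown.
df <- data.frame(
  id     = rep(1:4, each = 5),
  date   = rep(as.Date("2014-06-20") + 0:4, times = 4),
  status = c(1, 1, 1, 1, 0,   # id 1
             1, 1, 0, 1, 1,   # id 2
             1, 1, 0, 1, 0,   # id 3
             1, 0, 0, 0, 1)   # id 4
)
```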

Using the data.table package, I can identify the first day on which each animal is no longer seen in the area:

library(data.table)
dt <- as.data.table(df)  # df is the sample table above

# First status == 0 row within each id
dt[status == 0, .SD[1], by = id]
#   id       date status
#1:  1 2014-06-24      0
#2:  2 2014-06-22      0
#3:  3 2014-06-22      0
#4:  4 2014-06-21      0

While the above table is useful, I would like to know how to modify the query to find the row immediately before the first occurrence of each animal's absence. In other words, I want to know the last day that each animal is in the area before it temporarily leaves.

My actual data set bins these presence/absence observations into intervals of different lengths depending on the situation (e.g. presence/absence by 3-hour intervals, 6-hour intervals, etc.). Because the interval varies, it would be easier to access the previous row than to subtract a time interval from each value. My desired output is the following:

   id       date status
1:  1 2014-06-23      1
2:  2 2014-06-21      1
3:  3 2014-06-21      1
4:  4 2014-06-20      1

Please feel free to use base R or other packages (e.g. dplyr) to answer this question; I am always up for something new. Thank you for your time!

Recommended answer

Try the following:

dt[dt[status == 0, .I[1] - 1, by = id]$V1]
#   id       date status
#1:  1 2014-06-23      1
#2:  2 2014-06-21      1
#3:  3 2014-06-21      1
#4:  4 2014-06-20      1
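
To see why the one-liner works, it can help to split it into two steps. The sketch below rebuilds the sample table (the Date class of the date column and the names first_zero and idx are assumptions made for illustration). It also adds a more defensive variant that is not part of the answer: if an id's very first row already has status == 0, then .I[1] - 1 would point at the last row of the previous group (or at row 0), so the variant returns no row for such an id.

```r
library(data.table)

# Sample table from the question; storing date as Date is an assumption
dt <- data.table(
  id     = rep(1:4, each = 5),
  date   = rep(as.Date("2014-06-20") + 0:4, times = 4),
  status = c(1, 1, 1, 1, 0,  1, 1, 0, 1, 1,  1, 1, 0, 1, 0,  1, 0, 0, 0, 1)
)

# Step 1: row number, within the full table, of the first status == 0 row per id
first_zero <- dt[status == 0, .I[1], by = id]$V1   # 5 8 13 17 for this data

# Step 2: step back one row and subset the table by row number
dt[first_zero - 1]

# Defensive variant (illustrative): look inside each group and return no row
# when the group has no status == 0 row or when its first row is already 0
idx <- dt[, {
  z <- which(status == 0)
  if (length(z) > 0 && z[1] > 1) .I[z[1] - 1] else integer(0)
}, by = id]$V1
dt[idx]
```

Both versions assume the rows are already ordered by time within each id, as they are in the sample table.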

Incidentally, this method (using .I instead of .SD) will also be much faster. See this post for more on that.
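
For contrast, here is a sketch of what a .SD-based version of the same query might look like next to the .I version. It is only meant to show the difference in shape, not to measure speed, and it assumes every id has a status == 0 row somewhere after its first row, as in the sample data. The names via_sd and via_i are made up for illustration.

```r
library(data.table)

# Sample table from the question; storing date as Date is an assumption
dt <- data.table(
  id     = rep(1:4, each = 5),
  date   = rep(as.Date("2014-06-20") + 0:4, times = 4),
  status = c(1, 1, 1, 1, 0,  1, 1, 0, 1, 1,  1, 1, 0, 1, 0,  1, 0, 0, 0, 1)
)

# .SD version: for every id, subset that group's sub-table .SD to one row
via_sd <- dt[, .SD[which(status == 0)[1] - 1], by = id]

# .I version from the answer: collect row numbers only, then subset dt once
via_i <- dt[dt[status == 0, .I[1] - 1, by = id]$V1]
```

On the sample data both calls return the same four rows.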
