Filter rows by a time threshold

Problem description

I have a dataset organized this way:

ID   Species       DateTime
P1   A             2015-03-16 18:42:00
P2   A             2015-03-16 19:34:00
P3   A             2015-03-16 19:58:00
P4   A             2015-03-16 21:02:00
P5   B             2015-03-16 21:18:00
P6   A             2015-03-16 21:19:00
P7   A             2015-03-16 21:33:00
P8   B             2015-03-16 21:35:00
P9   B             2015-03-16 23:43:00
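
To reproduce the example in R, the sample can be rebuilt as a data frame. This is a minimal sketch that is not part of the original question: the name df matches the answer's code, Species is made a factor to match the answer's printed output, and the "UTC" time zone is an assumption because the question does not state one.

```r
# Rebuild the question's sample. Assumption: timestamps are in UTC.
df <- data.frame(
  ID = paste0("P", 1:9),
  Species = factor(c("A", "A", "A", "A", "B", "A", "A", "B", "B")),
  DateTime = as.POSIXct(
    c("2015-03-16 18:42:00", "2015-03-16 19:34:00", "2015-03-16 19:58:00",
      "2015-03-16 21:02:00", "2015-03-16 21:18:00", "2015-03-16 21:19:00",
      "2015-03-16 21:33:00", "2015-03-16 21:35:00", "2015-03-16 23:43:00"),
    format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Inspect the column classes: ID is character, Species a factor,
# DateTime POSIXct.
str(df)
```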

Using R, I want to select independent pictures for each species in this dataset, that is, pictures taken at least 1 hour apart from each other.

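In this example, for species A I only want to keep P1, P3 and P4. P2 is excluded because it falls within the 1-hour window that starts at P1 (18:42 to 19:42). P3 is kept because its DateTime (19:58) falls after 19:42, and it opens the next window, which lasts until 20:58. P4 (21:02) falls after that, so it is kept and opens a window until 22:02, which excludes P6 and P7. In other words, each window starts at the last kept picture, not at the previous row. For species B, only P5 and P9 are kept (P8, at 21:35, falls within the window opened by P5).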
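
Measuring the window from the last kept picture matters: a rule that only compares each row with the previous row drops P3, because P3 is only 24 minutes after P2. The following base-R illustration compares the two rules on the species A times from the table. The variable names t_A, ids_A, naive and greedy are hypothetical, and the UTC time zone is an assumption.

```r
# Species A times from the question's table (assumption: UTC).
t_A <- as.POSIXct(c("2015-03-16 18:42:00", "2015-03-16 19:34:00",
                    "2015-03-16 19:58:00", "2015-03-16 21:02:00",
                    "2015-03-16 21:19:00", "2015-03-16 21:33:00"),
                  tz = "UTC")
ids_A <- c("P1", "P2", "P3", "P4", "P6", "P7")

# Naive rule: keep a row if it is at least 60 min after the PREVIOUS row.
gap_prev <- c(Inf, as.numeric(diff(t_A), units = "mins"))
naive <- ids_A[gap_prev >= 60]

# Rule from the question: keep a row if it is at least 60 min after the
# LAST KEPT row; each kept row starts the next 1-hour window.
secs <- as.numeric(t_A)   # seconds since the epoch
keep <- logical(length(secs))
last_kept <- -Inf         # so the first row is always kept
for (i in seq_along(secs)) {
  if (secs[i] >= last_kept + 60 * 60) {
    keep[i] <- TRUE
    last_kept <- secs[i]
  }
}
greedy <- ids_A[keep]

naive   # [1] "P1" "P4"
greedy  # [1] "P1" "P3" "P4"
```

Only the second rule reproduces the expected selection for species A.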

Therefore, after this filter, my dataset would look like this:

ID   Species       DateTime
P1   A             2015-03-16 18:42:00
P3   A             2015-03-16 19:58:00
P4   A             2015-03-16 21:02:00
P5   B             2015-03-16 21:18:00
P9   B             2015-03-16 23:43:00

Does anyone know how to do this in R?

Recommended answer

There may be a more elegant way to do it, but this works:

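library(dplyr)

# Greedy filter: keep a picture only if it is at least 1 hour after the
# last picture that was kept. Expects `dt` sorted in ascending order.
isHourApart <- function(dt) {
    secs <- as.numeric(dt)          # POSIXct -> seconds since the epoch
    keeps <- logical(length(secs))  # preallocated, all FALSE
    last_kept <- -Inf               # so the first picture is always kept
    for (i in seq_along(secs)) {
        if (secs[i] >= last_kept + 60 * 60) {
            last_kept <- secs[i]    # this picture opens a new 1-hour window
            keeps[i] <- TRUE
        }
    }
    keeps
}


df %>%
    arrange(Species, DateTime) %>%  # the loop relies on chronological order
    group_by(Species) %>%
    filter(isHourApart(DateTime))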

Output:
# A tibble: 5 x 3
# Groups:   Species [2]
  ID    Species DateTime           
  <chr> <fct>   <dttm>             
1 P1    A       2015-03-16 18:42:00
2 P3    A       2015-03-16 19:58:00
3 P4    A       2015-03-16 21:02:00
4 P5    B       2015-03-16 21:18:00
5 P9    B       2015-03-16 23:43:00

Note that the DateTime column is of class POSIXct, so the comparison is done in seconds and 60 * 60 corresponds to one hour.
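
Because a POSIXct value is stored as seconds since the epoch, the same greedy rule can also be written in base R without dplyr or an explicit loop. This is a minimal sketch under stated assumptions: keep_independent is a hypothetical helper, Reduce() carries the time of the last kept picture forward, the data are rebuilt from the question's table with an assumed UTC time zone, and rows are sorted by DateTime within each Species before the rule is applied.

```r
# Rebuild the sample (assumption: times are in UTC).
df <- data.frame(
  ID = paste0("P", 1:9),
  Species = factor(c("A", "A", "A", "A", "B", "A", "A", "B", "B")),
  DateTime = as.POSIXct(
    paste("2015-03-16", c("18:42", "19:34", "19:58", "21:02", "21:18",
                          "21:19", "21:33", "21:35", "23:43")),
    format = "%Y-%m-%d %H:%M", tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for each time that is at least `gap` seconds
# after the last kept time. Expects `dt` sorted in ascending order.
keep_independent <- function(dt, gap = 60 * 60) {
  secs <- as.numeric(dt)  # POSIXct -> seconds since the epoch
  # Element i is the last kept time before row i; the extra final
  # element is the state after the last row.
  last_kept <- Reduce(function(last, t) if (t >= last + gap) t else last,
                      secs, accumulate = TRUE, init = -Inf)
  secs >= head(last_kept, -1) + gap
}

# Sort chronologically within each species, then apply the rule per group.
df <- df[order(df$Species, df$DateTime), ]
keep <- unsplit(lapply(split(df$DateTime, df$Species), keep_independent),
                df$Species)
result <- df[keep, ]
result$ID
# [1] "P1" "P3" "P4" "P5" "P9"
```

With init = -Inf and accumulate = TRUE, Reduce() returns one more element than there are rows, so head(..., -1) lines each row up with the last kept time before it. As with the dplyr answer, the result is only correct if the rows are in chronological order within each species.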