Start and end dates of time periods defined by a column in a data frame


Question

I have a database of hourly data organized in rows, and I would like to reshape it so as to obtain the start and end times of the periods in which the data meet a certain criterion.

Consider the following example: one column holds sequential hourly times, and the second column holds dummy variable data.

Yrs=  data.frame(Date=seq(as.POSIXct("2019-02-04 01:00:00",tz="UTC"), as.POSIXct("2019-02-04 23:00:00",tz="UTC"), by="hour"))
Yrs$Var=c(1:12,1:11)

I would like to obtain the start and end dates of each period in which the variable is between, say, 3 and 7 (inclusive).

Expected result:

StartDate               EndDate
2019-02-04 03:00:00     2019-02-04 07:00:00
2019-02-04 15:00:00     2019-02-04 19:00:00

I figure I can create a new column flagging the rows where the criterion is met, but I do not know how to get the start and end of those consecutive periods:

Yrs$Period= ifelse(Yrs$Var >= 3 & Yrs$Var <=7, 1, 0)

I found the reverse of this problem in the question "Given start date and end date, reshape/expand data for each day between (each day on a row)", but I am struggling to adapt it. Any help will be greatly appreciated.

Recommended answer

Perhaps something like this:

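library(data.table)
# Start a new group whenever Var stops increasing (here, when it resets from 12 to 1),
# then take the dates where Var equals 3 and 7 within each group.
setDT(Yrs)[, .(StartDate=Date[Var==3L], EndDate=Date[Var==7L]), 
    by=.(c(0L, cumsum(diff(Var) < 1L)))][, -1L]   # [, -1L] drops the grouping column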

Output:

             StartDate             EndDate
1: 2019-02-04 03:00:00 2019-02-04 07:00:00
2: 2019-02-04 15:00:00 2019-02-04 19:00:00
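
The answer above picks the rows where Var is exactly 3 and exactly 7, so it relies on both values appearing in every cycle. If the criterion is a general range, one possible alternative is to run-length encode the logical flag the question already proposes and read off the first and last row of each TRUE run. The following is a minimal base R sketch under that assumption; the names in_range, runs, run_start, run_end and periods are illustrative and do not come from the original post.

```r
# Rebuild the example data from the question.
Yrs <- data.frame(
  Date = seq(as.POSIXct("2019-02-04 01:00:00", tz = "UTC"),
             as.POSIXct("2019-02-04 23:00:00", tz = "UTC"),
             by = "hour")
)
Yrs$Var <- c(1:12, 1:11)

# Flag the rows that meet the criterion (3 <= Var <= 7).
in_range <- Yrs$Var >= 3 & Yrs$Var <= 7

# Run-length encode the flag: each run is a block of consecutive equal values.
runs <- rle(in_range)
run_end <- cumsum(runs$lengths)            # last row index of each run
run_start <- run_end - runs$lengths + 1L   # first row index of each run

# Keep only the runs in which the criterion holds.
periods <- data.frame(
  StartDate = Yrs$Date[run_start[runs$values]],
  EndDate   = Yrs$Date[run_end[runs$values]]
)
periods
```

On the example data this should give the same two periods as the expected result. Because it works on the flag rather than on specific values, it should also cope with a variable that skips over 3 or 7 while staying inside the range.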
