R: Subset a data frame based on times that are within a certain number of minutes of an observation window

Problem description

Let's say I have a data frame with start and end time columns, a measurement column and a time of measurement column, like so:

     start         end    value                time
   9:01:00     9:02:00     30.6  2013-03-25 9:05:00
   9:01:00     9:02:00     30.8  2013-03-25 9:15:00
   9:46:00     9:46:00     28.2  2013-03-25 9:43:00
   9:46:00     9:46:00     28.9  2013-03-25 9:53:00
  10:54:00    10:59:00     13.4 2013-03-25 10:56:00
  10:54:00    10:59:00     13.8 2013-03-25 11:56:00

How might one subset this data frame to include only rows for which the time column falls between the start and end times, or within ten minutes before the start time or ten minutes after the end time? I'm choosing ten minutes arbitrarily, and would like to know how to do this for any amount of time before the start time and after the end time.

The resulting data frame would be as follows:

     start         end    value                time
   9:01:00     9:02:00     30.6  2013-03-25 9:05:00
   9:46:00     9:46:00     28.2  2013-03-25 9:43:00
   9:46:00     9:46:00     28.9  2013-03-25 9:53:00
  10:54:00    10:59:00     13.4 2013-03-25 10:56:00

Is there a way to do this other than subtracting/adding x minutes from/to the start/end column entries, and then subsetting based on whether or not the time column falls within these expanded windows?

Currently, I have converted my time columns into POSIXlt format. Unfortunately, this gives today's date to the times in the start and end columns.
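To illustrate the date problem: when R parses a bare clock time it fills in the current date, so start/end end up on a different day from the measurements. A minimal sketch, assuming UTC and made-up variable names (`t_only`, `meas`, `start`), of the behaviour and of one way around it, pasting the measurement's date onto the clock time before parsing:

```r
# Parsing a bare clock time: R supplies the current date for the missing fields
t_only <- as.POSIXct("09:01:00", format = "%H:%M:%S", tz = "UTC")
format(t_only, "%H:%M:%S")  # "09:01:00", but on today's date

# One fix: paste the measurement's date onto the clock time before parsing
meas  <- as.POSIXct("2013-03-25 09:05:00", tz = "UTC")
start <- as.POSIXct(paste(format(meas, "%Y-%m-%d"), "09:01:00"), tz = "UTC")
print(start)
# [1] "2013-03-25 09:01:00 UTC"

# Now the difference is just the time-of-day offset
difftime(meas, start, units = "mins")
# Time difference of 4 mins
```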

Here is the dput for the first data frame:

structure(list(start = structure(list(sec = c(0, 0, 0, 0, 0, 
0), min = c(1L, 1L, 46L, 46L, 54L, 54L), hour = c(9L, 9L, 9L, 
9L, 10L, 10L), mday = c(7L, 7L, 7L, 7L, 7L, 7L), mon = c(7L, 
7L, 7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L, 
218L, 218L, 218L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("sec", 
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt")), end = structure(list(sec = c(0, 
0, 0, 0, 0, 0), min = c(2L, 2L, 46L, 46L, 59L, 59L), hour = c(9L, 
9L, 9L, 9L, 10L, 10L), mday = c(7L, 7L, 7L, 7L, 7L, 7L), mon = c(7L, 
7L, 7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L, 
218L, 218L, 218L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("sec", 
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt")), value = c(30.6, 30.8, 28.2, 
28.9, 13.4, 13.8), time = structure(list(sec = c(0, 0, 0, 0, 
0, 0), min = c(5L, 15L, 43L, 53L, 56L, 56L), hour = c(9L, 9L, 
9L, 9L, 10L, 11L), mday = c(25L, 25L, 25L, 25L, 25L, 25L), mon = c(2L, 
2L, 2L, 2L, 2L, 2L), year = c(113L, 113L, 113L, 113L, 113L, 113L
), wday = c(1L, 1L, 1L, 1L, 1L, 1L), yday = c(83L, 83L, 83L, 
83L, 83L, 83L), isdst = c(1L, 1L, 1L, 1L, 1L, 1L)), .Names = c("sec", 
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
), class = c("POSIXlt", "POSIXt"))), .Names = c("start", "end", 
"value", "time"), row.names = c(NA, -6L), class = "data.frame")

Here is the dput for the second data frame (the desired result):

structure(list(start = structure(list(sec = c(0, 0, 0, 0), min = c(1L, 
46L, 46L, 54L), hour = c(9L, 9L, 9L, 10L), mday = c(7L, 7L, 7L, 
7L), mon = c(7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L, 218L), 
    isdst = c(1L, 1L, 1L, 1L)), .Names = c("sec", "min", "hour", 
"mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXlt", 
"POSIXt")), end = structure(list(sec = c(0, 0, 0, 0), min = c(2L, 
46L, 46L, 59L), hour = c(9L, 9L, 9L, 10L), mday = c(7L, 7L, 7L, 
7L), mon = c(7L, 7L, 7L, 7L), year = c(113L, 113L, 113L, 113L
), wday = c(3L, 3L, 3L, 3L), yday = c(218L, 218L, 218L, 218L), 
    isdst = c(1L, 1L, 1L, 1L)), .Names = c("sec", "min", "hour", 
"mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXlt", 
"POSIXt")), value = c(30.6, 28.2, 28.9, 13.4), time = structure(list(
    sec = c(0, 0, 0, 0), min = c(5L, 43L, 53L, 56L), hour = c(9L, 
    9L, 9L, 10L), mday = c(25L, 25L, 25L, 25L), mon = c(2L, 2L, 
    2L, 2L), year = c(113L, 113L, 113L, 113L), wday = c(1L, 1L, 
    1L, 1L), yday = c(83L, 83L, 83L, 83L), isdst = c(1L, 1L, 
    1L, 1L)), .Names = c("sec", "min", "hour", "mday", "mon", 
"year", "wday", "yday", "isdst"), class = c("POSIXlt", "POSIXt"
))), .Names = c("start", "end", "value", "time"), row.names = c(NA, 
-4L), class = "data.frame")
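For readers who would rather not paste the POSIXlt structures, an equivalent of the first data frame can be built from plain strings; POSIXct columns are generally simpler to handle inside a data frame than POSIXlt. A sketch, assuming UTC and assuming start and end should carry the measurement date, 2013-03-25 (the dput above gives them 2013-08-07 instead); `mk` is a hypothetical helper name:

```r
# Hypothetical helper: attach the measurement date to "HH:MM:SS" strings
mk <- function(hms, day = "2013-03-25") as.POSIXct(paste(day, hms), tz = "UTC")

dat <- data.frame(
  start = mk(c("09:01:00", "09:01:00", "09:46:00", "09:46:00", "10:54:00", "10:54:00")),
  end   = mk(c("09:02:00", "09:02:00", "09:46:00", "09:46:00", "10:59:00", "10:59:00")),
  value = c(30.6, 30.8, 28.2, 28.9, 13.4, 13.8),
  time  = mk(c("09:05:00", "09:15:00", "09:43:00", "09:53:00", "10:56:00", "11:56:00"))
)
str(dat)
```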

Solution

Building on @EliGurarie's answer:

# dat <- .... (see the original question)

Convert the start and end times to POSIXct on the same date as the measurement time, then do the maths:

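# Calendar date of each measurement, e.g. "2013-03-25"
datestem <- as.character(as.Date(dat$time))
# Re-attach that date to the clock times in start/end, replacing today's date
dat$start <- as.POSIXct(paste(datestem, format(dat$start, "%H:%M:%S")))
dat$end <- as.POSIXct(paste(datestem, format(dat$end, "%H:%M:%S")))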

# Keep rows where time lies in [start - 10 min, end + 10 min]
dat[
     with(
      dat,
      difftime(time, start, units="mins") >= -10 &
      difftime(time, end, units="mins") <= 10
     ),
   ]
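Since the question asks for any padding, not just ten minutes, the same test (written as a comparison against a widened window, which is equivalent) can be wrapped in a function that takes the padding as a parameter. A sketch assuming all three time columns are POSIXct on a common date, as after the conversion above; `subset_near_window` and `mk` are hypothetical names, and the toy data re-creates the first data frame in UTC:

```r
# Hypothetical helper: keep rows whose time lies in [start - mins, end + mins].
# Assumes start, end and time are POSIXct values sharing the same date.
subset_near_window <- function(df, mins) {
  lower <- df$start - mins * 60  # POSIXct arithmetic works in seconds
  upper <- df$end + mins * 60
  df[df$time >= lower & df$time <= upper, ]
}

# Toy copy of the first data frame (UTC, all on 2013-03-25)
mk <- function(hms, day = "2013-03-25") as.POSIXct(paste(day, hms), tz = "UTC")
dat <- data.frame(
  start = mk(c("09:01:00", "09:01:00", "09:46:00", "09:46:00", "10:54:00", "10:54:00")),
  end   = mk(c("09:02:00", "09:02:00", "09:46:00", "09:46:00", "10:59:00", "10:59:00")),
  value = c(30.6, 30.8, 28.2, 28.9, 13.4, 13.8),
  time  = mk(c("09:05:00", "09:15:00", "09:43:00", "09:53:00", "10:56:00", "11:56:00"))
)

res10 <- subset_near_window(dat, 10)
res10$value  # 30.6 28.2 28.9 13.4, matching the expected result
res60 <- subset_near_window(dat, 60)
nrow(res60)  # 6: with an hour of padding every row qualifies
```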

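Alternatively, use a bit of rounding and some intermediate variables. Because whole days are rounded away, this version also works when start and end still carry the wrong date, as long as each true offset is under 12 hours: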

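min10 <- 10/(60*24)   # ten minutes as a fraction of a day
# time - start and time - end, in days
ds <- as.numeric(difftime(dat$time, dat$start, units="days"))
ds <- ds - round(ds)  # drop whole days, keep the time-of-day offset
de <- as.numeric(difftime(dat$time, dat$end, units="days"))
de <- de - round(de)

dat[ds >= -min10 & de <= min10,]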
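Why the rounding works: two clock times on different days differ by a whole number of days plus their time-of-day offset, so subtracting the rounded day count leaves only the offset (provided it is under 12 hours and, in a time zone with daylight saving, both dates have the same DST state). A small check in UTC, which sidesteps DST; the dates mirror the question's data and the variable names are illustrative:

```r
# Start time carrying the wrong date (2013-08-07, as in the dput)
start_wrong <- as.POSIXct("2013-08-07 09:01:00", tz = "UTC")
# Measurement time on the real date
meas <- as.POSIXct("2013-03-25 09:05:00", tz = "UTC")

d <- as.numeric(difftime(meas, start_wrong, units = "days"))
print(d)                      # roughly -135: whole days plus a small offset
offset_days <- d - round(d)   # drop the whole days
offset_mins <- offset_days * 24 * 60
print(round(offset_mins, 6))  # [1] 4
```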
