Create a time interval of 15 minutes from minutely data in R?


Problem description


I have some data which is formatted in the following way:

time     count 
00:00    17
00:01    62
00:02    41

The data runs from 00:00 to 23:59, with one count per minute. I'd like to group the data into 15-minute intervals, such as:

time           count
00:00-00:15    148   
00:16-00:30    284

I have tried to do it manually, but this is exhausting, so I am sure there has to be a function or something to do it easily, but I haven't figured out yet how to do it.

I'd really appreciate some help!!

Thank you very much!

Solution

For data that's in POSIXct format, you can use the cut function to create 15-minute groupings, and then aggregate by those groups. The code below shows how to do this in base R and with the dplyr and data.table packages.

First, create some fake data:

set.seed(4984)
dat = data.frame(time=seq(as.POSIXct("2016-05-01"), as.POSIXct("2016-05-01") + 60*99, by=60),
                 count=sample(1:50, 100, replace=TRUE))

Base R

Cut the data into 15-minute groups:

dat$by15 = cut(dat$time, breaks="15 min")

                   time count                by15
1   2016-05-01 00:00:00    22 2016-05-01 00:00:00
2   2016-05-01 00:01:00    11 2016-05-01 00:00:00
3   2016-05-01 00:02:00    31 2016-05-01 00:00:00
...
98  2016-05-01 01:37:00    20 2016-05-01 01:30:00
99  2016-05-01 01:38:00    29 2016-05-01 01:30:00
100 2016-05-01 01:39:00    37 2016-05-01 01:30:00

Now aggregate by the new grouping column, using sum as the aggregation function:

dat.summary = aggregate(count ~ by15, FUN=sum, data=dat)

                 by15 count
1 2016-05-01 00:00:00   312
2 2016-05-01 00:15:00   395
3 2016-05-01 00:30:00   341
4 2016-05-01 00:45:00   318
5 2016-05-01 01:00:00   349
6 2016-05-01 01:15:00   397
7 2016-05-01 01:30:00   341
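
The question's data stores the time as "HH:MM" strings rather than as POSIXct, so it needs a conversion step before cut can be applied. The following is a minimal sketch of one way to do that. The data frame raw, the column stamp, and the placeholder date 2016-05-01 are assumptions made for illustration, and a fixed time zone is used to avoid daylight-saving surprises.

```r
# Hypothetical input mimicking the question: one row per minute of a day,
# with the time stored as an "HH:MM" string and a dummy count of 1.
minutes = 0:1439
raw = data.frame(time = sprintf("%02d:%02d", minutes %/% 60, minutes %% 60),
                 count = 1L,
                 stringsAsFactors = FALSE)

# Attach an arbitrary placeholder date and parse the result to POSIXct.
raw$stamp = as.POSIXct(paste("2016-05-01", raw$time),
                       format = "%Y-%m-%d %H:%M", tz = "UTC")

# Same two steps as above: cut into 15-minute groups, then sum per group.
raw$by15 = cut(raw$stamp, breaks = "15 min")
raw.summary = aggregate(count ~ by15, FUN = sum, data = raw)
```

Note that cut builds left-closed groups here, so 00:15 falls into the second group (00:15 to 00:29), not the first.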

dplyr

library(dplyr)

dat.summary = dat %>% group_by(by15=cut(time, "15 min")) %>%
  summarise(count=sum(count))

data.table

library(data.table)

dat.summary = setDT(dat)[ , list(count=sum(count)), by=cut(time, "15 min")]

UPDATE: To answer the comment, for this case the end point of each grouping interval is as.POSIXct(as.character(dat$by15)) + 60*15 - 1. In other words, the end point of the grouping interval is 15 minutes minus one second after the start of the interval. We add 60*15 - 1 because POSIXct is denominated in seconds. The as.POSIXct(as.character(...)) wrapper is needed because cut returns a factor; it converts the labels back to date-times so that we can do arithmetic on them.

If you want the end point to the nearest minute before the next interval (instead of the nearest second), you could use as.POSIXct(as.character(dat$by15)) + 60*14.
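
As a rough sketch of how these end points could be attached to the summary table, the code below rebuilds the fake data and adds start, end, and label columns. The helper parse_start and the column names start, end, end_minute, and range are hypothetical. The helper tries several formats defensively, on the assumption that some R versions may print a midnight label without its time-of-day part.

```r
set.seed(4984)
dat = data.frame(time = seq(as.POSIXct("2016-05-01", tz = "UTC"), by = 60, length.out = 100),
                 count = sample(1:50, 100, replace = TRUE))
dat$by15 = cut(dat$time, breaks = "15 min")
dat.summary = aggregate(count ~ by15, FUN = sum, data = dat)

# Hypothetical helper: turn cut() labels back into POSIXct, falling back to
# shorter formats for any label that the full format cannot parse.
parse_start = function(lab) {
  lab = as.character(lab)
  out = as.POSIXct(lab, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  for (f in c("%Y-%m-%d %H:%M", "%Y-%m-%d")) {
    miss = is.na(out)
    out[miss] = as.POSIXct(lab[miss], format = f, tz = "UTC")
  }
  out
}

dat.summary$start = parse_start(dat.summary$by15)
dat.summary$end = dat.summary$start + 60 * 15 - 1    # last second of the interval
dat.summary$end_minute = dat.summary$start + 60 * 14 # last whole minute of the interval

# A compact "HH:MM-HH:MM" label, similar to the layout asked for in the question.
dat.summary$range = paste(format(dat.summary$start, "%H:%M"),
                          format(dat.summary$end_minute, "%H:%M"), sep = "-")
```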

If you don't know the break interval, for example, because you chose the number of breaks and let R pick the interval, you could find the number of seconds to add by doing max(unique(diff(as.POSIXct(as.character(dat$by15))))) - 1.
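
A sketch of that last case follows, using a variant of the expression that works on the factor levels and on plain numeric seconds. The number of breaks (4), the column byN, and the helper parse_start are assumptions for illustration. Differencing numeric seconds rather than date-times is a deliberate choice here, since it avoids depending on which unit (seconds or minutes) R picks for a time difference.

```r
set.seed(4984)
dat = data.frame(time = seq(as.POSIXct("2016-05-01", tz = "UTC"), by = 60, length.out = 100),
                 count = sample(1:50, 100, replace = TRUE))

# Let R choose the interval by asking for a number of groups (4 is arbitrary).
dat$byN = cut(dat$time, breaks = 4)

# Hypothetical helper: parse cut() labels, trying shorter formats as a fallback.
parse_start = function(lab) {
  lab = as.character(lab)
  out = as.POSIXct(lab, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  for (f in c("%Y-%m-%d %H:%M", "%Y-%m-%d")) {
    miss = is.na(out)
    out[miss] = as.POSIXct(lab[miss], format = f, tz = "UTC")
  }
  out
}

starts = parse_start(levels(dat$byN))

# Largest gap between consecutive group labels, in seconds.
step = max(diff(as.numeric(starts)))

# Seconds to add to a group's start to reach its end point.
offset = step - 1
ends = starts + offset
```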
