Create a time interval of 15 minutes from minutely data in R?

Problem description

I have some data which is formatted in the following way:

time     count 
00:00    17
00:01    62
00:02    41

So I have times from 00:00 to 23:59, with one count per minute. I'd like to group the data into 15-minute intervals such that:

time           count
00:00-00:15    148   
00:16-00:30    284

I have tried to do it manually, but this is exhausting, so I am sure there has to be a function or something to do it easily, but I haven't figured out how yet.

I'd really appreciate some help!!

Thank you very much!

Solution

For data that's in POSIXct format, you can use the cut function to create 15-minute groupings, and then aggregate by those groups. The code below shows how to do this in base R and with the dplyr and data.table packages.

First, create some fake data:

set.seed(4984)
dat = data.frame(time=seq(as.POSIXct("2016-05-01"), as.POSIXct("2016-05-01") + 60*99, by=60),
                 count=sample(1:50, 100, replace=TRUE))
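
The fake data above is already POSIXct, whereas the question's times are plain "HH:MM" text. A minimal sketch of the conversion step, assuming the times are character strings and that any single calendar date can be attached to them (the date 2016-05-01, the UTC time zone, and the name raw are illustrative choices, not taken from the question):

```r
# Hypothetical stand-in for the question's data: one row per minute of a day,
# with the time stored as "HH:MM" text and a placeholder count of 1.
raw <- data.frame(
  time  = sprintf("%02d:%02d", rep(0:23, each = 60), rep(0:59, times = 24)),
  count = 1L,
  stringsAsFactors = FALSE
)

# cut() with breaks = "15 min" needs a date-time, so paste an arbitrary date
# (an assumption) in front of each time and parse the result as POSIXct.
raw$time <- as.POSIXct(paste("2016-05-01", raw$time),
                       format = "%Y-%m-%d %H:%M", tz = "UTC")

str(raw$time)
```

After this step, raw could be used in place of dat in the code that follows.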

Base R

Use cut to split the data into 15-minute groups:

dat$by15 = cut(dat$time, breaks="15 min")

                   time count                by15
1   2016-05-01 00:00:00    22 2016-05-01 00:00:00
2   2016-05-01 00:01:00    11 2016-05-01 00:00:00
3   2016-05-01 00:02:00    31 2016-05-01 00:00:00
...
98  2016-05-01 01:37:00    20 2016-05-01 01:30:00
99  2016-05-01 01:38:00    29 2016-05-01 01:30:00
100 2016-05-01 01:39:00    37 2016-05-01 01:30:00

Now aggregate by the new grouping column, using sum as the aggregation function:

dat.summary = aggregate(count ~ by15, FUN=sum, data=dat)

                 by15 count
1 2016-05-01 00:00:00   312
2 2016-05-01 00:15:00   395
3 2016-05-01 00:30:00   341
4 2016-05-01 00:45:00   318
5 2016-05-01 01:00:00   349
6 2016-05-01 01:15:00   397
7 2016-05-01 01:30:00   341

dplyr

library(dplyr)

dat.summary = dat %>% group_by(by15=cut(time, "15 min")) %>%
  summarise(count=sum(count))

data.table

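library(data.table)

# setDT() converts dat to a data.table by reference; naming the grouping
# expression gives the result the same by15 column as the versions above
dat.summary = setDT(dat)[ , list(count=sum(count)), by=list(by15=cut(time, "15 min"))]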

UPDATE: To answer the comment, for this case the end point of each grouping interval is as.POSIXct(as.character(dat$by15)) + 60*15 - 1. In other words, the endpoint of the grouping interval is 15 minutes minus one second from the start of the interval. We add 60*15 - 1 because POSIXct is denominated in seconds. The as.POSIXct(as.character(...)) is because cut returns a factor and this just converts it back to date-time so that we can do math on it.

If you want the end point to be the last whole minute before the next interval (instead of the last second), you could do as.POSIXct(as.character(dat$by15)) + 60*14.
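
To make the end-point arithmetic concrete, here is a small sketch that builds start and end times for each row and turns them into range labels in the style of the question. It is a variation rather than the exact expression above: it asks cut for integer bin codes (labels=FALSE) and computes each interval start from them, which avoids the factor-to-text round trip. The UTC time zone and the names width, bin, first_break, start, end_sec, end_min, and label are assumptions for illustration.

```r
# Fake per-minute data as in the answer, with an explicit time zone (assumption).
set.seed(4984)
t0  <- as.POSIXct("2016-05-01 00:00:00", tz = "UTC")
dat <- data.frame(time  = seq(t0, by = 60, length.out = 100),
                  count = sample(1:50, 100, replace = TRUE))

width <- 60 * 15  # interval length in seconds (POSIXct arithmetic is in seconds)

# labels = FALSE makes cut() return the integer index of each 15-minute bin.
bin <- cut(dat$time, breaks = "15 min", labels = FALSE)

# With "15 min" breaks the first bin starts at the earliest time, floored to
# the minute; every later bin starts a whole number of widths after that.
first_break <- as.POSIXct(trunc(min(dat$time), units = "mins"))
start   <- first_break + (bin - 1) * width
end_sec <- start + width - 1    # last second of the interval (60*15 - 1)
end_min <- start + width - 60   # last whole minute of the interval (60*14)

# Range labels in the style of the question, then the sum of counts per label.
dat$label <- paste(format(start, "%H:%M"), format(end_min, "%H:%M"), sep = "-")
dat.summary <- aggregate(count ~ label, data = dat, FUN = sum)
print(dat.summary)
```

Within a single day these labels sort alphabetically in chronological order, so aggregate returns the rows in time order; data spanning several days would need the date in the label as well.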

If you don't know the break interval, for example, because you chose the number of breaks and let R pick the interval, you could find the number of seconds to add by doing max(unique(diff(as.POSIXct(as.character(dat$by15))))) - 1.
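
A sketch of that last case, assuming the same kind of fake per-minute data and a request for 4 breaks (an arbitrary number chosen for illustration). Rather than converting factor labels back to date-times, this variation takes integer bin codes from cut and measures the gap between the first observation of consecutive bins; the names bin, first_in_bin, width, and offset are hypothetical.

```r
# 100 minutes of per-minute timestamps, with an explicit time zone (assumption).
t0   <- as.POSIXct("2016-05-01 00:00:00", tz = "UTC")
time <- seq(t0, by = 60, length.out = 100)

# Let R choose the break points: 4 equal-width bins over the observed range.
bin <- cut(time, breaks = 4, labels = FALSE)

# Earliest observation (in seconds since the epoch) within each bin.
first_in_bin <- as.vector(tapply(as.numeric(time), bin, min))

# Spacing between consecutive bin starts, in seconds; subtracting 1 gives the
# offset from a bin's first observation to the last second of that bin.
width  <- max(diff(first_in_bin))
offset <- width - 1
print(c(width = width, offset = offset))
```

Taking the maximum of the differences guards against a shorter gap caused by an empty or partly filled bin, which is the same reason the expression above uses max.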
