R: how to resample intraday data at the group level?

Views: 279 Published: 2017/3/12 12:32:15 r data.table dplyr lubridate


Question

Consider the following data frame:

time <-c('2016-04-13 23:07:45','2016-04-13 23:07:50','2016-04-13 23:08:45','2016-04-13 23:08:45'
         ,'2016-04-13 23:08:45','2016-04-13 23:07:50','2016-04-13 23:07:51')
group <-c('A','A','A','B','B','B','B')
value<- c(5,10,2,2,NA,1,4)
df<-data.frame(time,group,value)

> df
                 time group value
1 2016-04-13 23:07:45     A     5
2 2016-04-13 23:07:50     A    10
3 2016-04-13 23:08:45     A     2
4 2016-04-13 23:08:45     B     2
5 2016-04-13 23:08:45     B    NA
6 2016-04-13 23:07:50     B     1
7 2016-04-13 23:07:51     B     4

I would like to resample this data frame at the 5-second level within each group, and compute the sum of value for each combination of time interval and group.

The interval should be closed on the left and open on the right. For instance, the first line of output should be

2016-04-13 23:07:45     A     5

because the first 5-second interval is [2016-04-13 23:07:45, 2016-04-13 23:07:50[.

How can I do that in either dplyr or data.table? Do I need to import lubridate for the timestamps?

Recommended answer

The following works:
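library(dplyr)
library(lubridate)  # provides ymd_hms() and floor_date()

Group5 <- function(myDf) {
    # Parse the timestamp strings into date-times
    myDf$time <- ymd_hms(myDf$time)
    # Round each timestamp down to the start of its 5-second interval
    myDf$timeGroup <- floor_date(myDf$time, unit = "5 seconds")
    # Sum value within each (group, interval) pair, ignoring NAs
    summarise(myDf %>% group_by(group, timeGroup), sum(value, na.rm = TRUE))
}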

Group5(df)
Source: local data frame [5 x 3]
Groups: group [?]

   group           timeGroup `sum(value, na.rm = TRUE)`
  <fctr>              <dttm>                      <dbl>
1      A 2016-04-13 23:07:45                          5
2      A 2016-04-13 23:07:50                         10
3      A 2016-04-13 23:08:45                          2
4      B 2016-04-13 23:07:50                          5
5      B 2016-04-13 23:08:45                          2

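It makes use of floor_date and ymd_hms from lubridate. Because floor_date rounds down, each interval is closed on the left and open on the right, as requested.

Here is a more exotic example: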
set.seed(500)
time <- ymd_hms('2016-04-13 23:07:45') + sample(-10^3:10^3, 10^5, replace=TRUE)
group <- rep(LETTERS[1:20], each = 5000)
value <- rep(NA, 10^5)
value[sample(10^5, 95000)] <- sample(100, 95000, replace=TRUE)
df2 <- data.frame(time,group,value)

head(df2)
                 time group value
1 2016-04-13 23:18:53     A    53
2 2016-04-13 23:15:15     A    NA
3 2016-04-13 23:23:36     A    40
4 2016-04-13 23:06:40     A    23
5 2016-04-13 23:18:10     A    74
6 2016-04-13 22:57:56     A    65

Calling it, we get:
Group5(df2)
Source: local data frame [8,020 x 3]
Groups: group [?]

    group           timeGroup `sum(value, na.rm = TRUE)`
   <fctr>              <dttm>                      <int>
1       A 2016-04-13 22:51:05                        379
2       A 2016-04-13 22:51:10                        646
3       A 2016-04-13 22:51:15                        391
4       A 2016-04-13 22:51:20                       1118
5       A 2016-04-13 22:51:25                        745
6       A 2016-04-13 22:51:30                        546
7       A 2016-04-13 22:51:35                        884
8       A 2016-04-13 22:51:40                        711
9       A 2016-04-13 22:51:45                        526
10      A 2016-04-13 22:51:50                        484
# ... with 8,010 more rows
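
The question also asks about data.table, which the answer above does not cover. A minimal sketch of the same idea might look like the following. It is an illustration, not part of the original answer: the column name total, the UTC time zone, and the base-R flooring (integer arithmetic on seconds since the epoch instead of floor_date) are all assumptions made here.

```r
library(data.table)

# Same toy data as in the question
df <- data.frame(
  time  = c('2016-04-13 23:07:45', '2016-04-13 23:07:50', '2016-04-13 23:08:45',
            '2016-04-13 23:08:45', '2016-04-13 23:08:45', '2016-04-13 23:07:50',
            '2016-04-13 23:07:51'),
  group = c('A', 'A', 'A', 'B', 'B', 'B', 'B'),
  value = c(5, 10, 2, 2, NA, 1, 4),
  stringsAsFactors = FALSE
)

dt <- as.data.table(df)

# Parse the timestamps; UTC is an assumption so that the result
# does not depend on the local time zone
dt[, time := as.POSIXct(time, tz = "UTC")]

# Round down to a multiple of 5 seconds since the epoch,
# so each bin covers [t, t + 5 seconds)
dt[, timeGroup := as.POSIXct(floor(as.numeric(time) / 5) * 5,
                             origin = "1970-01-01", tz = "UTC")]

# Sum value per (group, timeGroup), ignoring NAs; keyby also sorts the result
res <- dt[, .(total = sum(value, na.rm = TRUE)), keyby = .(group, timeGroup)]
print(res)
```

On the toy data this should give the same five group/interval rows as Group5(df). No lubridate import is needed in this variant, since the flooring is done directly on the numeric timestamps.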

