R: how to resample intraday data at the group level?
Question
Consider the following data frame:
time <-c('2016-04-13 23:07:45','2016-04-13 23:07:50','2016-04-13 23:08:45','2016-04-13 23:08:45'
,'2016-04-13 23:08:45','2016-04-13 23:07:50','2016-04-13 23:07:51')
group <-c('A','A','A','B','B','B','B')
value<- c(5,10,2,2,NA,1,4)
df<-data.frame(time,group,value)
> df
time group value
1 2016-04-13 23:07:45 A 5
2 2016-04-13 23:07:50 A 10
3 2016-04-13 23:08:45 A 2
4 2016-04-13 23:08:45 B 2
5 2016-04-13 23:08:45 B NA
6 2016-04-13 23:07:50 B 1
7 2016-04-13 23:07:51 B 4
I would like to resample this data frame at the 5-second level within each group, and compute the sum of value for each combination of time interval and group.
The interval should be closed on the left and open on the right. For instance, the first line of output should be
2016-04-13 23:07:45 A 5
because the first 5-second interval is [2016-04-13 23:07:45, 2016-04-13 23:07:50[.
How can I do that in either dplyr or data.table? Do I need to import lubridate for the timestamps?
Recommended answer
How about this:
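library(dplyr)
library(lubridate)  # provides ymd_hms() and floor_date()
Group5 <- function(myDf) {
  # Parse the timestamps, then floor each one to the start of its 5-second bin
  myDf$time <- ymd_hms(myDf$time)
  myDf$timeGroup <- floor_date(myDf$time, unit = "5 seconds")
  # Sum value within each (group, bin) pair, ignoring NAs
  summarise(myDf %>% group_by(group, timeGroup), sum(value, na.rm = TRUE))
}
Group5(df)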
Source: local data frame [5 x 3]
Groups: group [?]
group timeGroup `sum(value, na.rm = TRUE)`
<fctr> <dttm> <dbl>
1 A 2016-04-13 23:07:45 5
2 A 2016-04-13 23:07:50 10
3 A 2016-04-13 23:08:45 2
4 B 2016-04-13 23:07:50 5
5 B 2016-04-13 23:08:45 2
It makes use of floor_date and ymd_hms from lubridate, so that package does need to be loaded.
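To make the binning concrete: flooring a timestamp down to a multiple of 5 seconds is what produces intervals that are closed on the left and open on the right. The following is a minimal base R sketch of that arithmetic, not part of the original answer. It assumes the timestamps are in UTC, and the names bin_width and bin_start are illustrative only. For a bin width that divides a minute evenly, such as 5 seconds, it should give the same bins as floor_date.

```r
# Three of the timestamps from the question, parsed as UTC (an assumption)
t <- as.POSIXct(c('2016-04-13 23:07:45', '2016-04-13 23:07:50', '2016-04-13 23:07:51'),
                tz = "UTC")

# Divide the epoch seconds by the bin width, floor, and scale back up:
# every timestamp in [start, start + 5) maps to start
bin_width <- 5
bin_start <- as.POSIXct(floor(as.numeric(t) / bin_width) * bin_width,
                        origin = "1970-01-01", tz = "UTC")

print(format(bin_start, "%H:%M:%S"))
# "23:07:45" "23:07:50" "23:07:50"
```

Note that 23:07:50 starts a new bin rather than closing the previous one, which matches the interval convention requested in the question.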
Here is a more exotic example:
set.seed(500)
time <- ymd_hms('2016-04-13 23:07:45') + sample(-10^3:10^3, 10^5, replace=TRUE)
group <- rep(LETTERS[1:20], each = 5000)
value <- rep(NA, 10^5)
value[sample(10^5, 95000)] <- sample(100, 95000, replace=TRUE)
df2 <- data.frame(time,group,value)
head(df2)
time group value
1 2016-04-13 23:18:53 A 53
2 2016-04-13 23:15:15 A NA
3 2016-04-13 23:23:36 A 40
4 2016-04-13 23:06:40 A 23
5 2016-04-13 23:18:10 A 74
6 2016-04-13 22:57:56 A 65
Calling it, we have:
Group5(df2)
Source: local data frame [8,020 x 3]
Groups: group [?]
group timeGroup `sum(value, na.rm = TRUE)`
<fctr> <dttm> <int>
1 A 2016-04-13 22:51:05 379
2 A 2016-04-13 22:51:10 646
3 A 2016-04-13 22:51:15 391
4 A 2016-04-13 22:51:20 1118
5 A 2016-04-13 22:51:25 745
6 A 2016-04-13 22:51:30 546
7 A 2016-04-13 22:51:35 884
8 A 2016-04-13 22:51:40 711
9 A 2016-04-13 22:51:45 526
10 A 2016-04-13 22:51:50 484
# ... with 8,010 more rows
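Since the question also asks about data.table, the same idea can probably be expressed there as well. This is a hedged sketch rather than part of the original answer: it rebuilds the small sample data from the question, still relies on lubridate for parsing and flooring, and the names dt, res, and total are illustrative only.

```r
library(data.table)
library(lubridate)

# Same sample data as in the question
dt <- data.table(
  time  = c('2016-04-13 23:07:45', '2016-04-13 23:07:50', '2016-04-13 23:08:45',
            '2016-04-13 23:08:45', '2016-04-13 23:08:45', '2016-04-13 23:07:50',
            '2016-04-13 23:07:51'),
  group = c('A', 'A', 'A', 'B', 'B', 'B', 'B'),
  value = c(5, 10, 2, 2, NA, 1, 4)
)

# Parse the timestamps and floor each one to the start of its 5-second bin
dt[, timeGroup := floor_date(ymd_hms(time), unit = "5 seconds")]

# Sum value per (group, bin), ignoring NAs; keyby also sorts the result
res <- dt[, .(total = sum(value, na.rm = TRUE)), keyby = .(group, timeGroup)]
print(res)
```

On the sample data this should give the same five rows as Group5(df) above, with the sum column named total.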