How to average over a time period by hour?


Problem description

I'm new to R and have run into my first difficulties. I have a data set of about 10,000 observations covering 365 days, in which I record occurrences of an event. The occurrences are recorded only for the first 14 days of each month. I would like to fill in the remaining 16 days by averaging, hour by hour, over the recorded occurrences of the corresponding month.

The structure is as follows:

                    day           hours      occurrence
                    2000-01-01     1          5
                    2000-01-01     2          6
                    2000-01-01     3          7
                    ...            ...        ...
                    2000-01-01     23         3
                    2000-01-01     24         2
                    ...            ...        ...
                    2000-01-02     1          4
                    2000-01-02     2          2
                    2000-01-02     3          5
                    ...            ...        ...
                    2000-01-02     23         2
                    2000-01-02     24         1
                    ...
                    ...
                    2000-01-15     1          mean of hour 1 over days 1-14, i.e. (5 + 4 + ...)/14
                    2000-01-15     2          mean of hour 2 over days 1-14, i.e. (6 + 2 + ...)/14
                    2000-01-15     3          mean of hour 3 over days 1-14, i.e. (7 + 5 + ...)/14
                    ...            ...         ...
                    2000-01-15     23         mean of hour 23 over days 1-14
                    2000-01-15     24         mean of hour 24 over days 1-14
                    ...            ...         ...
                    ...            ...         ...
                    2000-01-30
                    2000-01-30
                    2000-01-30
                    2000-01-30
                    ...            ...         ...
                    ...            ...         ...
                    2000-02-01
                    2000-02-01
                    2000-02-01
                    2000-02-01
                    ...            ...         ...
                    ...
                    ...            ...         ...
                    2000-12-24

I tried

               aggregate( occurences ~ hours, mean) 

but the results were meaningless, so I tried

               tapply( X = occurences, INDEX = list(hours), FUN = Mean )

Unfortunately, neither worked as I imagined. I think it's necessary to include the corresponding month in the function, but my abilities seem to be limited.
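
Both attempts above group by hour only, so every month is pooled into one average per hour. In addition, R is case-sensitive, so `Mean` is not the base function `mean`, and the `aggregate()` call has no `data` argument. A minimal base R sketch of grouping by month and hour together is shown here; the data frame `df`, its values, and the derived `month` column are made up for illustration and only mimic the layout in the question.

```r
# Hypothetical toy data in the question's layout (values are invented)
df <- data.frame(
  day        = as.Date(c("2000-01-01", "2000-01-01", "2000-01-02", "2000-01-02",
                         "2000-02-01", "2000-02-01", "2000-02-02", "2000-02-02")),
  hours      = rep(1:2, times = 4),
  occurrence = c(5, 6, 4, 2, 1, 3, 3, 5)
)

# Derive the month from the date so it can serve as a grouping variable
df$month <- as.integer(format(df$day, "%m"))

# Mean occurrence for every (month, hour) combination, as a data frame
hourly_means <- aggregate(occurrence ~ month + hours, data = df, FUN = mean)
hourly_means

# The same means as a month-by-hour matrix
hourly_tab <- tapply(X = df$occurrence, INDEX = list(df$month, df$hours), FUN = mean)
hourly_tab
```

With this toy input, hour 1 of month 1 averages the values 5 and 4, which matches the calculation sketched in the table of the question.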

Recommended answer

You may try this. Note that, to keep the example small, I select data only for days 1-4 and hours 0-1 of each month. Days 1 and 2 of each month have occurrence data; days 3 and 4 are missing it.

library(dplyr)

# create dummy data
set.seed(123) # for reproducibility; exact values depend on the R version (sample() changed in R 3.6.0)

d1 <- data.frame(time = seq(from = as.POSIXct("2000-01-01"), 
                            to = as.POSIXct("2000-02-28"),
                            by = "hour"))
d1 <- d1 %>%
  mutate(hour = as.integer(format(time, "%H")),
         day = as.integer(format(time, "%d")), # <~~ only needed to generate sample data
         month = as.integer(format(time, "%m")),
         occurence = sample(1:10, length(time), replace = TRUE),
         occurence = ifelse(day %in% 1:2, occurence, NA)) %>%  # <~~~ data only for day 1-2
  filter(hour %in% 0:1 & day %in% 1:4) %>%  # <~~~ smaller example: select hour 0-1, day 1-4
  select(-day)

# calculate mean occurrence per month and hour
d2 <- d1 %>%
  group_by(month, hour) %>%
  summarise(mean_occ = round(mean(occurence, na.rm = TRUE), 1))
d2
#   month hour mean_occ
# 1     1    0      5.0
# 2     1    1      8.0
# 3     2    0      5.5
# 4     2    1      6.5


# replace missing occurrence with mean_occ
d3 <- d1 %>%
  left_join(d2, by = c("hour", "month")) %>%
  mutate(occurence2 = ifelse(is.na(occurence), mean_occ, occurence)) %>%
  select(-month, -mean_occ)

d3
#    hour                time occurence occurence2
# 1     0 2000-01-01 00:00:00         3        3.0
# 2     1 2000-01-01 01:00:00         8        8.0
# 3     0 2000-01-02 00:00:00         7        7.0
# 4     1 2000-01-02 01:00:00         8        8.0
# 5     0 2000-01-03 00:00:00        NA        5.0
# 6     1 2000-01-03 01:00:00        NA        8.0
# 7     0 2000-01-04 00:00:00        NA        5.0
# 8     1 2000-01-04 01:00:00        NA        8.0
# 9     0 2000-02-01 00:00:00         4        4.0
# 10    1 2000-02-01 01:00:00         6        6.0
# 11    0 2000-02-02 00:00:00         7        7.0
# 12    1 2000-02-02 01:00:00         7        7.0
# 13    0 2000-02-03 00:00:00        NA        5.5
# 14    1 2000-02-03 01:00:00        NA        6.5
# 15    0 2000-02-04 00:00:00        NA        5.5
# 16    1 2000-02-04 01:00:00        NA        6.5
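
The same idea can be sketched without dplyr and in the layout of the question (hours numbered 1-24, data present only for days 1-14 of each month). This is an illustrative alternative, not the method of the answer above: the data frame `obs`, its random values, and the column `occurrence_filled` are assumptions made up for the example. Base R's `ave()` returns the group statistic aligned with the original rows, so no join is needed.

```r
# Hypothetical data: two months of hourly rows, hours coded 1-24 as in the question
set.seed(1)
days <- seq(as.Date("2000-01-01"), as.Date("2000-02-28"), by = "day")
obs <- data.frame(
  day   = rep(days, each = 24),
  hours = rep(1:24, times = length(days))
)
obs$occurrence <- as.numeric(sample(1:10, nrow(obs), replace = TRUE))

# Keep values only for days 1-14 of each month; later days are missing
obs$occurrence[as.integer(format(obs$day, "%d")) > 14] <- NA

# Grouping variable: month of each row
obs$month <- as.integer(format(obs$day, "%m"))

# Mean of the observed values within each (month, hour) group,
# repeated for every row of that group
group_mean <- ave(obs$occurrence, obs$month, obs$hours,
                  FUN = function(x) mean(x, na.rm = TRUE))

# Keep observed values, fill missing ones with the group mean
obs$occurrence_filled <- ifelse(is.na(obs$occurrence), group_mean, obs$occurrence)

head(obs[obs$day == as.Date("2000-01-15"), ], 3)
```

Whether a plain mean is an appropriate fill for count data depends on the analysis; rounding the filled values, as the answer does with `round(..., 1)`, is a separate choice.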
