相对于时间取平均值 [英] Averaging the value with respect to time

查看:89
本文介绍了相对于时间取平均值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下带有日期时间和相应值的数据集。时间间隔是每10分钟。我需要以15分钟的间隔生成新行。

I have the below dataset with date-time and the corresponding value. The time interval is every 10 mins. I need to generate new rows with 15 mins interval.

例如,对于15:40,值是599,对于15:50,值是594,因此,在这两者之间需要生成一行,即15:45,平均599& 594,即596.5

For example, for 15:40 the value is 599 and for 15:50 the value is 594, so a new row needs to be generated between the two, i.e 15:45 with average of 599 & 594 which is 596.5

即,我需要生成平均值10和20表示16:15的值;和40& 50以获取16:45的值。 00、30的值保持不变

I.e, I need to generate an average between 10 & 20 to get the value for say 16:15; and 40 & 50 to get the value for 16:45. The value for 00, 30 remains the same

Date...Time       RA.CO2
6/15/2017 15:40   599
6/15/2017 15:50   594
6/15/2017 16:00   606
6/15/2017 16:10   594
6/15/2017 16:20   594
6/15/2017 16:30   594
6/15/2017 16:40   594
6/15/2017 16:50   594
6/16/2017 0:00    496.25
6/16/2017 0:10    500
6/16/2017 0:20    496.25
6/16/2017 0:30    496.25
6/16/2017 0:40    600
6/16/2017 0:50    650
6/16/2017 1:00    700 

str(df)
'data.frame':   6092 obs. of  2 variables:
 $ Date...Time: chr  "6/15/2017 15:40" "6/15/2017 15:50" "6/15/2017 16:00" 
 "6/15/2017 16:10" ...
$ RA.CO2     : num  599 594 606 594 594 594 594 594 594 594 ...

输出

 Date...Time          RA.CO2
 6/15/2017 15:45      596.5
 6/15/2017 16:00      606
 6/15/2017 16:15      594
 6/15/2017 16:30      594
 6/15/2017 16:45      594
 6/16/2017 0:00       496.25
 6/16/2017 0:15       498.125
 6/16/2017 0:30       496.25
 6/16/2017 0:45       625
 6/16/2017 1:00       700


推荐答案

我们可以使用 tidyr 扩展数据框架和 imputeTS 通过线性插值来插补缺失值。

We can use tidyr to expand the data frame and imputeTS to impute the missing values by linear interpolation.

library(dplyr)
library(tidyr)
library(lubridate)
library(imputeTS)

dt2 <- dt %>%
  mutate(Date...Time = mdy_hm(Date...Time)) %>%
  mutate(Date = as.Date(Date...Time)) %>%
  group_by(Date) %>%
  complete(Date...Time = seq(min(Date...Time), max(Date...Time), by = "5 min")) %>%
  mutate(RA.CO2 = na.interpolation(RA.CO2)) %>%
  ungroup() %>%
  select(Date...Time, RA.CO2)
dt2
# A tibble: 22 x 2
           Date...Time RA.CO2
                <dttm>  <dbl>
 1 2017-06-15 15:40:00  599.0
 2 2017-06-15 15:45:00  596.5
 3 2017-06-15 15:50:00  594.0
 4 2017-06-15 15:55:00  600.0
 5 2017-06-15 16:00:00  606.0
 6 2017-06-15 16:05:00  600.0
 7 2017-06-15 16:10:00  594.0
 8 2017-06-15 16:15:00  594.0
 9 2017-06-15 16:20:00  594.0
10 2017-06-15 16:25:00  594.0
# ... with 12 more rows

我的输出与您想要的输出不完全相同。这是因为:

My output is not entirely the same as your desired output. This is because:


  1. 目前尚不清楚如何获取 6/16/2017 0中的值: 10

  2. 为什么有时间隔是5分钟,但有时是10分钟?

  3. 为什么要包含最后三行?同样也不清楚填充最后三行值的规则。

  1. It is not clear how do you get the values in 6/16/2017 0:10.
  2. Why sometimes the interval is 5 minutes, but sometimes it is 10 minutes?
  3. Why do you include the last three rows? It is also not clear the rules to fill the values of the last three rows.

不过,我认为我的解决方案为您提供了可能实现此任务的方法。您可能需要自己调整代码以适应那些不清楚的规则。

Nevertheless, I think my solution provides you a possible way to achieve this task. You may need to adjust the code by yourself to fit those unclear rules.

dt <- read.table(text = "Date...Time       RA.CO2
'6/15/2017 15:40'   599
                 '6/15/2017 15:50'   594
                 '6/15/2017 16:00'   606
                 '6/15/2017 16:10'   594
                 '6/15/2017 16:20'   594
                 '6/15/2017 16:30'   594
                 '6/15/2017 16:40'   594
                 '6/15/2017 16:50'   594
                 '6/16/2017 0:00'    496.25
                 '6/16/2017 0:10'    496.25
                 '6/16/2017 0:20'    496.25
                 '6/16/2017 0:30'    496.25",
                 header = TRUE, stringsAsFactors = FALSE)

这篇关于相对于时间取平均值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆