Merge two daily time series after summarising on shifted hours

Problem description

I have a measurement (for instance, solar radiation) recorded hourly and indexed by a datetime variable. I want to sum the measurement values for each day of the year and match the result to another data source that is also at daily scale (say, mean outdoor temperature).

However, the second data source is already aggregated from 8:00am to 8:00am the next day. I know how to summarise my first variable by calendar day, but I need to do it from 8 to 8 so that both measurements match.

An example of my data:

set.seed(1L) # to create reproducible data
hourly = data.frame(datetime = seq(from = lubridate::ymd_hm("2017-01-01 01:00"), 
                                   length.out = 168, by = "hour"),
                    value = rpois(168, 10))
daily = data.frame(datetime = seq(from=as.Date("2017-01-01"), length.out = 31, by="day"),
                   value=rnorm(31))

Solution

Expanding my comment into an answer: it is worth noting that the OP has emphasized the words "aggregated from 8:00am to 8:00am the next day".

Mapping non-aligned 24-hour periods to dates

If a 24-hour period is not aligned with midnight, i.e., it does not extend from 00:00 to 24:00 but starts and ends sometime during the day, it is ambiguous which date is associated with that period.

We can take either

  1. the date of the day on which the period starts,
  2. the date of the day on which the period ends, or
  3. the date of the day which contains the majority of hours of the period.

Just to illustrate the difference:

# timestamps: 9 am, 10pm, 7 am next day 
x <- lubridate::ymd_hm(c("2017-09-12 09:00", "2017-09-12 22:00", "2017-09-13 07:00"))
x

[1] "2017-09-12 09:00:00 UTC" "2017-09-12 22:00:00 UTC" "2017-09-13 07:00:00 UTC"

# map timestamps to date on which period starts by shifting back by 8 hours
x + lubridate::hours(-8L)

[1] "2017-09-12 01:00:00 UTC" "2017-09-12 14:00:00 UTC" "2017-09-12 23:00:00 UTC"

# map timestamps to date on which period ends by advancing by 16 hours
x + lubridate::hours(16L)

[1] "2017-09-13 01:00:00 UTC" "2017-09-13 14:00:00 UTC" "2017-09-13 23:00:00 UTC"
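
The three conventions can be wrapped in one small function. The following is only a base R sketch, not part of the original answer: the helper name `period_date()` and its arguments are hypothetical, it assumes the timestamps are in UTC and that the period starts at a whole hour strictly between 0 and 24, and it arbitrarily assigns a period starting exactly at noon to the end day.

```r
# Hypothetical helper (not from the original answer): label each timestamp
# with the date of the 24-hour period it falls into.
period_date <- function(x, start_hour = 8, label = c("start", "end", "majority")) {
  label <- match.arg(label)
  # Shift back so that the period start lands on midnight, then take the date.
  # tz is passed explicitly so the date is not derived in a different time zone.
  start_day <- as.Date(x - start_hour * 3600, tz = "UTC")
  switch(label,
    start = start_day,
    end = start_day + 1,
    # A period starting before noon has most of its hours on the start day.
    majority = if (start_hour < 12) start_day else start_day + 1
  )
}

# Same three timestamps as above: 9 am, 10 pm, 7 am next day
x <- as.POSIXct(c("2017-09-12 09:00", "2017-09-12 22:00", "2017-09-13 07:00"),
                format = "%Y-%m-%d %H:%M", tz = "UTC")

by_start    <- period_date(x, label = "start")     # all 2017-09-12
by_end      <- period_date(x, label = "end")       # all 2017-09-13
by_majority <- period_date(x, label = "majority")  # all 2017-09-12 for an 8-to-8 period
```

For an 8-to-8 period, 16 of the 24 hours fall on the start day, so the "start" and "majority" conventions give the same date here.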

As there is no other information, let's assume that the daily data were mapped onto the day on which the period starts.

Aggregating and merging

For grouping, aggregating, and merging, data.table is used:

library(data.table)
# aggregate data by shifted timestamp
setDT(hourly)[, .(sum.value = sum(value)), 
              by = .(date = as.Date(datetime + lubridate::hours(-8L)))]

         date sum.value
1: 2016-12-31        68
2: 2017-01-01       232
3: 2017-01-02       222
4: 2017-01-03       227
5: 2017-01-04       228
6: 2017-01-05       231
7: 2017-01-06       260
8: 2017-01-07       144

Note that the new date column, which is used for grouping and aggregating, is created on the fly in the by parameter (one of the reasons why I prefer data.table).
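
The first and last groups in the output above cover fewer than 24 hours, because the sample data start at 01:00 and end at midnight. A possible way to detect this is to count the hours per shifted date alongside the sum. The following is a base R sketch rather than the data.table approach used in this answer; the column name `n.hours` and the object `complete_days` are made up for illustration, and timestamps are assumed to be in UTC.

```r
set.seed(1L)  # same sample data as in the question
hourly <- data.frame(
  datetime = seq(from = as.POSIXct("2017-01-01 01:00",
                                   format = "%Y-%m-%d %H:%M", tz = "UTC"),
                 length.out = 168, by = "hour"),
  value = rpois(168, 10)
)

# Shift back by 8 hours so each 8-to-8 period maps to the date on which it starts
hourly$date <- as.Date(hourly$datetime - 8 * 3600, tz = "UTC")

# Sum per shifted date, plus the number of hourly records behind each sum
agg <- aggregate(value ~ date, data = hourly, FUN = sum)
names(agg)[names(agg) == "value"] <- "sum.value"
agg$n.hours <- aggregate(value ~ date, data = hourly, FUN = length)$value

# Keep only periods that are fully covered by 24 hourly records
complete_days <- agg[agg$n.hours == 24, ]
```

With the sample data, 2016-12-31 holds only 7 hours and 2017-01-07 only 17, so their sums are not comparable with the full days in between.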

Now, the daily data need to be joined. With chaining, this can be combined into one statement:

setDT(hourly)[, .(sum.value = sum(value)), 
              by = .(date = as.Date(datetime + lubridate::hours(-8L)))][
                setDT(daily), on = .(date = datetime), nomatch = 0L]

         date sum.value      value
1: 2017-01-01       232 -0.5080862
2: 2017-01-02       222  0.5236206
3: 2017-01-03       227  1.0177542
4: 2017-01-04       228 -0.2511646
5: 2017-01-05       231 -1.4299934
6: 2017-01-06       260  1.7091210
7: 2017-01-07       144  1.4350696

The parameter nomatch = 0L indicates that we want an inner join here.
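
For comparison, the effect of an inner join can be seen with base R's `merge()`, which performs an inner join by default. This is only an illustrative sketch under the same assumptions as above (UTC timestamps, periods labelled by their start date), not the data.table code from this answer; the object names `inner` and `keep_all_daily` are made up.

```r
set.seed(1L)  # same sample data as in the question
hourly <- data.frame(
  datetime = seq(from = as.POSIXct("2017-01-01 01:00",
                                   format = "%Y-%m-%d %H:%M", tz = "UTC"),
                 length.out = 168, by = "hour"),
  value = rpois(168, 10)
)
daily <- data.frame(
  datetime = seq(from = as.Date("2017-01-01"), length.out = 31, by = "day"),
  value = rnorm(31)
)

# Aggregate the hourly values by the date on which each 8-to-8 period starts
hourly$date <- as.Date(hourly$datetime - 8 * 3600, tz = "UTC")
agg <- aggregate(value ~ date, data = hourly, FUN = sum)
names(agg)[names(agg) == "value"] <- "sum.value"

# Inner join: only dates present in both tables are kept
inner <- merge(agg, daily, by.x = "date", by.y = "datetime")

# Keeping every daily row instead leaves NA where no hourly data exist
keep_all_daily <- merge(agg, daily, by.x = "date", by.y = "datetime", all.y = TRUE)
```

The inner join drops 2016-12-31 (no daily record) and all daily rows after 2017-01-07 (no hourly data), whereas `all.y = TRUE` keeps all 31 daily rows and fills `sum.value` with NA where nothing matches.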
