Calculating hourly averages from a multi-year timeseries

Question

I have a dataset filled with the average windspeed per hour for multiple years. I would like to create an 'average year', in which for each hour the average windspeed for that hour over multiple years is calculated. How can I do this without looping endlessly through the dataset? Ideally, I would like to just loop through the data once, extracting for each row the right month, day, and hour, and adding the windspeed from that row to the right row in a dataframe where the aggregates for each month, day, and hour are gathered. Is it possible to do this without extracting the month, day, and hour, and then looping over the complete average-year data.frame to find the right row?

Some example data:

data.multipleyears <- data.frame(
 DATETIME = c("2001-01-01 01:00:00", "2001-05-03 09:00:00", "2007-01-01 01:00:00", "2008-02-29 12:00:00"),
 Windspeed = c(10, 5, 8, 3)
)

Which I would like to aggregate in a dataframe like this:

average.year <- data.frame(
 DATETIME = c("01-01 00:00:00", "01-01 01:00:00", ..., "12-31 23:00:00"),
 Aggregate.Windspeed = c(100, 80, ...)
)

From there, I can go on calculating the averages, etc. I have probably overlooked some command, but what would be the right syntax for something like this (in pseudocode):

 for (i in 1:nrow(data.multipleyears)) {
  average.year$Aggregate.Windspeed[
   where average.year$DATETIME(month, day, hour) == data.multipleyears$DATETIME[i](month, day, hour)]  <- average.year$Aggregate.Windspeed + data.multipleyears$Windspeed[i]
 }

Or something like that. Help is appreciated!

Answer

I predict that ddply and the plyr package are going to be your best friends :). I created a 30-year dataset with random hourly wind speeds between 1 and 10 m/s:

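begin_date = as.POSIXlt("1990-01-01", tz = "GMT")
# roughly 30 years of hourly timestamps (30 * 365 days, leap days not counted)
dat = data.frame(dt = begin_date + (0:(24*30*365)) * (3600))
dat = within(dat, {
  speed = runif(length(dt), 1, 10)
  # day-month key; tz is passed explicitly so the key is computed in GMT
  unique_day = strftime(dt, "%d-%m", tz = "GMT")
})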
> head(dat)
                   dt unique_day    speed
1 1990-01-01 00:00:00      01-01 7.054124
2 1990-01-01 01:00:00      01-01 2.202591
3 1990-01-01 02:00:00      01-01 4.111633
4 1990-01-01 03:00:00      01-01 2.687808
5 1990-01-01 04:00:00      01-01 8.643168
6 1990-01-01 05:00:00      01-01 5.499421

To calculate the daily normals (30-year averages; the term is widely used in meteorology) over this 30-year period:

library(plyr)
res = ddply(dat, .(unique_day), 
            summarise, mean_speed = mean(speed), .progress = "text")
> head(res)
  unique_day mean_speed
1      01-01   5.314061
2      01-02   5.677753
3      01-03   5.395054
4      01-04   5.236488
5      01-05   5.436896
6      01-06   5.544966

This takes just a few seconds on my humble two-core AMD, so I suspect that going through the data only once is not needed. Several of these ddply calls for different aggregations (month, season, etc.) can be run separately.
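
The ddply call above groups by calendar day, while the question asks for one value per month, day and hour. The same grouping idea should carry over by adding the hour to the key. Below is a minimal sketch under stated assumptions: it uses base R only (tapply instead of plyr), it runs on the question's four example rows, and the column names key, n and Mean.Windspeed are illustrative and not part of the original post.

```r
# The question's example data (DATETIME stored as character strings).
data.multipleyears <- data.frame(
  DATETIME = c("2001-01-01 01:00:00", "2001-05-03 09:00:00",
               "2007-01-01 01:00:00", "2008-02-29 12:00:00"),
  Windspeed = c(10, 5, 8, 3)
)

# Parse the strings once; a fixed time zone keeps the hours unambiguous.
dt <- as.POSIXct(data.multipleyears$DATETIME, tz = "GMT",
                 format = "%Y-%m-%d %H:%M:%S")

# Month-day-hour key such as "01-01 01"; the year is deliberately dropped
# so that the same hour of every year falls into the same group.
data.multipleyears$key <- format(dt, "%m-%d %H", tz = "GMT")

# Grouped sum and count per key; no explicit loop over rows is needed.
sums   <- tapply(data.multipleyears$Windspeed, data.multipleyears$key, sum)
counts <- tapply(data.multipleyears$Windspeed, data.multipleyears$key, length)

# Assemble the "average year" (illustrative column names).
average.year <- data.frame(
  key                 = names(sums),
  Aggregate.Windspeed = as.vector(sums),
  n                   = as.vector(counts),
  Mean.Windspeed      = as.vector(sums / counts),
  row.names           = NULL
)

print(average.year)
```

With this input, the two observations for 1 January at 01:00 (10 and 8) end up in the same group, giving a sum of 18 and a mean of 9. Keeping the count n next to the sum is a design choice: keys such as "02-29 12" only receive values in leap years, so they are averaged over fewer years than the others.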
