Aggregating mean "%H%M" in "week" bins in R


Question


I have been struggling with this for a while. I am new to working with ts (time-series) data and all the related R packages. I have a data frame with several variables, including the time of day (GMT, "%H%M") and the date ("%Y/%m/%e") on which sampling occurred. I want to bin/aggregate my dates into weeks (i.e., %W/%g) and calculate the mean time of day at which sampling occurred during each week.


I was able to apply other functions (FUN) to numeric variables (e.g., weight) by first converting my df into a zoo object and then using aggregate.zoo as follows:

library(zoo)
# calculate the total weight captured each week
x2c <- aggregate(OA_zoo, as.Date(cut(time(OA_zoo), "week")), sum)
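For comparison, base R's cut() method for Date objects does the same weekly binning on a plain data frame, without converting to zoo first. A minimal sketch with made-up dates and weights (the weights data frame and weekly_sum are hypothetical names). By default cut(..., "week") starts weeks on Monday and labels each bin with that Monday's date:

```r
# Made-up daily samples; 2004-09-12 is a Sunday, 2004-09-13 a Monday.
weights <- data.frame(
  date   = as.Date(c("2004-09-08", "2004-09-10", "2004-09-12", "2004-09-13")),
  weight = c(2, 3, 4, 6)
)

# Bin each date into its (Monday-start) week, labelled by the Monday's date.
weights$week <- as.Date(cut(weights$date, "week"))

# Total weight per week, the data-frame analogue of the zoo call above.
weekly_sum <- aggregate(weight ~ week, data = weights, FUN = sum)
weekly_sum
```

The Sunday row stays in the week that began on Monday 2004-09-06, so that week sums to 9 and the next one to 6.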


However, I am not sure how to get around the fact that I am working with a Date format rather than a numeric one, and would appreciate any tips! Also, I have obviously been writing far too much code by handling each variable separately. Is there a way to apply different functions (sum/mean/max/min) to my df when aggregating weekly, using plyr or some other package?
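On the second point, base R's aggregate() can apply several summary functions at once without plyr: give it a FUN that returns a named vector, and list every variable to summarise on the left of the formula with cbind(). A minimal sketch with made-up values; samples, four_stats() and weekly are hypothetical names:

```r
samples <- data.frame(
  date2  = as.Date(c("2004-09-08", "2004-09-10", "2004-09-13", "2004-09-14")),
  weight = c(2, 3, 6, 8),            # made-up values
  depth  = c(942, 960, 984, 1042)
)
# Label each row with the Monday that starts its week.
samples$week <- as.Date(cut(samples$date2, "week"))

# One function returning several named statistics.
four_stats <- function(v) c(sum = sum(v), mean = mean(v), min = min(v), max = max(v))

# Each left-hand variable becomes a matrix column with sum/mean/min/max columns.
weekly <- aggregate(cbind(weight, depth) ~ week, data = samples, FUN = four_stats)
weekly$weight[, "sum"]   # 5 14
weekly$depth[, "mean"]   # 951 1013
```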


EDITS/CLARIFICATIONS
Here's the dput output of a sample of my full dataset. I have data from 2004-2011. What I would like to look at/plot with ggplot2 is the mean/median of TIME (%H%M), aggregated by week, over the whole period (2004-2011). Right now my data are not aggregated by week; they are daily (a random sample).

> dput(godin)
structure(list(depth = c(878, 1200, 1170, 936, 942, 964, 951, 
953, 911, 969, 960, 987, 991, 997, 1024, 978, 1024, 951, 984, 
931, 1006, 929, 973, 986, 935, 989, 1042, 1015, 914, 984), duration = c(0.8, 
2.6, 6.5, 3.2, 4.1, 6.4, 7.2, 5.3, 7.4, 7, 7, 5.5, 7.5, 7.3, 
7.5, 7, 4.2, 3, 5, 5, 9.3, 7.9, 7.3, 7.2, 7, 5.2, 8, 6, 7.5, 
7), Greenland = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 40L, 28L, 0L, 
0L, 34L, 7L, 28L, 0L, 0L, 0L, 27L, 0L, 0L, 0L, 44L, 59L, 0L, 
0L, 0L, 0L, 0L, 0L), date2 = structure(c(12617, 12627, 12631, 
12996, 12669, 13036, 12669, 13036, 12670, 13036, 12670, 13037, 
12671, 13037, 12671, 13037, 12671, 13038, 12672, 13038, 12672, 
13038, 12672, 13039, 12631, 12997, 12673, 13039, 12673, 13039
), class = "Date"), TIME = c("0940", "0145", "0945", "2045", 
"1615", "0310", "2130", "1045", "0625", "1830", "1520", "0630", 
"0035", "1330", "0930", "2215", "2010", "0645", "0155", "1205", 
"0815", "1845", "2115", "0350", "1745", "0410", "0550", "1345", 
"1515", "2115")), .Names = c("depth", "duration", "Greenland", 
"date2", "TIME"), class = "data.frame", row.names = c("6761", 
"9019", "9020", "9021", "9022", "9023", "9024", "9025", "9026", 
"9027", "9028", "9029", "9030", "9031", "9032", "9033", "9034", 
"9035", "9036", "9037", "9038", "9039", "9040", "9041", "9042", 
"9043", "9044", "9045", "9046", "9047"))


Answer


I'd approach it like this: first make a column with a string representing the week:

godin$week <- format(godin$date2, "%Y-W%U")


This will give you something like "2004-W26", which is good enough for aggregate().
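Note that %U numbers weeks starting on Sunday, while %W (mentioned in the question) starts them on Monday. A quick sketch of how the two codes label a Sunday and the following Monday; with either code, days in early January before the first Sunday (or Monday) fall into week 00, so a calendar week that straddles New Year gets split across two labels:

```r
d <- as.Date(c("2004-07-18", "2004-07-19"))  # a Sunday and the following Monday
as.POSIXlt(d)$wday    # 0 1 (0 = Sunday)
format(d, "%Y-W%U")   # Sunday-start weeks: both days share week 29
format(d, "%Y-W%W")   # Monday-start weeks: the Sunday closes week 28
```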


Then you need to turn your character vector that represents HHMM into an actual time, so that you can do time arithmetic on it.

godin$time2 <- as.POSIXct(strptime(godin$TIME, "%H%M"))


NOTE: the above is a bit of a hack. strptime() assumes the current date if none is specified, but that shouldn't get in the way here: all converted times share the same date, so the time part of the mean will be correct. I'll strip off the date later.
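If the date hack feels fragile, one alternative (not part of this answer) is to skip date-times altogether and treat "HHMM" as minutes after midnight, which is plain numeric data that mean() and median() handle directly. A minimal sketch; hhmm_to_min() and min_to_hms() are hypothetical helper names:

```r
# Hypothetical helper: "HHMM" strings -> minutes after midnight (numeric).
hhmm_to_min <- function(x) {
  x <- as.integer(x)            # "0940" -> 940
  (x %/% 100) * 60 + x %% 100   # 940 -> 9 * 60 + 40 = 580
}

# Hypothetical helper: minutes after midnight -> "HH:MM:SS" string.
min_to_hms <- function(m) {
  s <- as.integer(round(m * 60))  # work in whole seconds
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

mins <- hhmm_to_min(c("0945", "1745"))
min_to_hms(mean(mins))  # "13:45:00"
```

Because the minutes are ordinary numbers, any summary function works on them; converting back to a clock string is only needed for display.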


At that point, I think you can simply aggregate:

x2c <- aggregate(time2~week, data=godin, FUN=mean)


Then get rid of the irrelevant (and incorrect) date part:

x2c$time2 <- format(x2c$time2,"%H:%M:%S")

Et voilà:

> x2c
      week    time2
1 2004-W29 09:40:00
2 2004-W30 01:45:00
3 2004-W31 13:45:00
4 2004-W36 12:07:00
5 2004-W37 10:32:30
6 2005-W31 12:27:30
7 2005-W36 10:48:20
8 2005-W37 13:11:06


The lesson here is that it's tricky to push around times with no associated dates in R. I'd love to hear from others who have a better way of doing this.
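One reason a plain mean of clock times is treacherous: times on either side of midnight average to around midday (23:50 and 00:10 give 12:00). A circular mean, which is a different technique from the one in this answer, treats each time as an angle on a 24-hour clock and averages the unit vectors instead. A minimal sketch; circ_mean_min() is a hypothetical name, and the result is meaningless when the times are spread evenly around the clock (the mean vector is then close to zero length):

```r
# Hypothetical helper: circular mean of times given as minutes after midnight.
circ_mean_min <- function(minutes) {
  theta <- 2 * pi * minutes / 1440                   # minutes -> angle in radians
  ang <- atan2(mean(sin(theta)), mean(cos(theta)))   # direction of the mean vector
  (ang * 1440 / (2 * pi)) %% 1440                    # angle -> minutes, wrapped onto the clock
}

mean(c(1430, 10))                          # arithmetic mean: 720, i.e. 12:00
round(circ_mean_min(c(1430, 10))) %% 1440  # circular mean: 0, i.e. 00:00
```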
