Aggregating mean "%H%M" in "week" bins in R

Problem description

I have been struggling with this for a while. I am new to working with ts data and all the related R packages. I have a df with several variables, including the time of day (GMT, "%H%M") and the date ("%Y/%m/%e") at which sampling occurred. I want to bin/aggregate my date data into weeks (i.e., %W/%g) and calculate the mean time of day at which sampling occurred during each week.

I was able to apply other functions (FUN) to numeric variables (e.g., weight) by first transforming my df into a zoo object and then using aggregate.zoo as follows:

library(zoo)
# calculate the total weight captured every week
x2c <- aggregate(OA_zoo, as.Date(cut(time(OA_zoo), "week")), sum)

However, I am not sure how to get around the fact that I am working with a date/time format rather than a number, and I would appreciate any tips! Also, I have obviously been writing way too much code by handling each of my variables separately. Is there a way to apply different functions (sum/mean/max/min) to my df when aggregating weekly, using plyr or some other package?
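For the second part of the question (a different function per column, binned by week), here is a minimal base-R sketch that does not use plyr. It assumes the question's godin column layout and uses a few rows copied from the dput() sample below, with weeks labelled by the "%Y-W%U" format that the answer uses. The rows are split by week label, and each week is collapsed into one summary row. The output column names (depth_mean, duration_max, greenland_sum, n_samples) are made up for illustration.

```r
# A subset of rows from the question's dput() sample (rows 6761, 9019, 9020, 9042, 9022, 9026)
godin <- data.frame(
  depth     = c(878, 1200, 1170, 935, 942, 911),
  duration  = c(0.8, 2.6, 6.5, 7, 4.1, 7.4),
  Greenland = c(0L, 0L, 0L, 0L, 0L, 40L),
  date2     = as.Date(c(12617, 12627, 12631, 12631, 12669, 12670), origin = "1970-01-01"),
  TIME      = c("0940", "0145", "0945", "1745", "1615", "0625"),
  stringsAsFactors = FALSE
)

# Week label: calendar year + Sunday-based week number
godin$week <- format(godin$date2, "%Y-W%U")

# One data frame per week, then one summary row per week,
# applying a different function to each column
by_week <- split(godin, godin$week)
weekly <- do.call(rbind, lapply(names(by_week), function(w) {
  d <- by_week[[w]]
  data.frame(week          = w,
             depth_mean    = mean(d$depth),
             duration_max  = max(d$duration),
             greenland_sum = sum(d$Greenland),
             n_samples     = nrow(d),
             stringsAsFactors = FALSE)
}))
weekly
```

Adding another statistic only requires one more line inside data.frame(). In packages such as plyr or dplyr, the same split/summarise pattern is available with less typing.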

EDITS/CLARIFICATIONS: Here's the dput output of a sample of my full dataset. I have data from 2004-2011. What I would like to look at/plot with ggplot2 is the mean/median of TIME (%H%M), aggregated by week over the whole period (2004-2011). Right now my data are not aggregated by week; they are daily (a random sample).

> dput(godin)
structure(list(depth = c(878, 1200, 1170, 936, 942, 964, 951, 
953, 911, 969, 960, 987, 991, 997, 1024, 978, 1024, 951, 984, 
931, 1006, 929, 973, 986, 935, 989, 1042, 1015, 914, 984), duration = c(0.8, 
2.6, 6.5, 3.2, 4.1, 6.4, 7.2, 5.3, 7.4, 7, 7, 5.5, 7.5, 7.3, 
7.5, 7, 4.2, 3, 5, 5, 9.3, 7.9, 7.3, 7.2, 7, 5.2, 8, 6, 7.5, 
7), Greenland = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 40L, 28L, 0L, 
0L, 34L, 7L, 28L, 0L, 0L, 0L, 27L, 0L, 0L, 0L, 44L, 59L, 0L, 
0L, 0L, 0L, 0L, 0L), date2 = structure(c(12617, 12627, 12631, 
12996, 12669, 13036, 12669, 13036, 12670, 13036, 12670, 13037, 
12671, 13037, 12671, 13037, 12671, 13038, 12672, 13038, 12672, 
13038, 12672, 13039, 12631, 12997, 12673, 13039, 12673, 13039
), class = "Date"), TIME = c("0940", "0145", "0945", "2045", 
"1615", "0310", "2130", "1045", "0625", "1830", "1520", "0630", 
"0035", "1330", "0930", "2215", "2010", "0645", "0155", "1205", 
"0815", "1845", "2115", "0350", "1745", "0410", "0550", "1345", 
"1515", "2115")), .Names = c("depth", "duration", "Greenland", 
"date2", "TIME"), class = "data.frame", row.names = c("6761", 
"9019", "9020", "9021", "9022", "9023", "9024", "9025", "9026", 
"9027", "9028", "9029", "9030", "9031", "9032", "9033", "9034", 
"9035", "9036", "9037", "9038", "9039", "9040", "9041", "9042", 
"9043", "9044", "9045", "9046", "9047"))

Recommended answer

I'd approach it like this: first make a column with a string representing the week:

godin$week <- format(godin$date2, "%Y-W%U")

This will give you something like "2004-W26" (%U numbers the weeks of the year starting on Sunday; %W would start them on Monday), which is good enough for aggregate.
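One pitfall with multi-year data such as 2004-2011: %U restarts at 00 on 1 January, so a Sunday-to-Saturday week that straddles New Year ends up with two different "%Y-W%U" labels. The sketch below uses two consecutive dates chosen for illustration (they are not from the question's data). It compares that label with the ISO 8601 week codes %G/%V, which are output-only format codes (see ?strptime for platform notes), and with the Monday-start cut(..., "week") binning that the question's zoo code already uses.

```r
# Friday 2004-12-31 and Saturday 2005-01-01: same Sunday-to-Saturday week
d <- as.Date(c("2004-12-31", "2005-01-01"))

# Calendar year + Sunday-based week number, as in the answer: the pair is split
format(d, "%Y-W%U")       # "2004-W52" "2005-W00"

# ISO 8601 week-based year + week number (weeks start on Monday)
format(d, "%G-W%V")       # "2004-W53" "2004-W53"

# Monday-start bins labelled by their first day, as in the question's zoo code
as.Date(cut(d, "week"))   # "2004-12-27" "2004-12-27"
```

Either of the last two labels can replace "%Y-W%U" in the answer's `week` column without changing the rest of the code. The cut() version has the added advantage of being a real Date, which plots naturally on a ggplot2 time axis.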

Then you need to turn your character vector representing HHMM into an actual time, so that you can do time arithmetic on it.

godin$time2 <- as.POSIXct(strptime(godin$TIME, "%H%M"))

NOTE: the above is a bit of a hack. strptime() assumes the current date when none is specified, but that shouldn't get in the way here: all converted times share the same date, so the time part of the mean will be correct. I'll strip off the date later.
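If you would rather not rely on the current-date hack, one alternative (a different technique, not what the answer itself does) is to treat each "HHMM" string as a number of minutes after midnight, do the arithmetic on plain numbers, and format the result back afterwards. The helper names hhmm_to_min() and min_to_hms() are hypothetical. The sketch assumes TIME always holds exactly four digits, as in the question's sample. The checks average the two 2004-W31 times and the two 2004-W37 times from the question's data, and they should reproduce the 13:45:00 and 10:32:30 rows of the answer's x2c output.

```r
# Hypothetical helper: "HHMM" strings -> minutes after midnight (numeric)
hhmm_to_min <- function(hhmm) {
  as.numeric(substr(hhmm, 1, 2)) * 60 + as.numeric(substr(hhmm, 3, 4))
}

# Hypothetical helper: minutes after midnight -> "HH:MM:SS", rounded to the nearest second
min_to_hms <- function(m) {
  s <- round(m * 60)  # whole seconds after midnight
  sprintf("%02d:%02d:%02d", s %/% 3600, (s %% 3600) %/% 60, s %% 60)
}

# The two 2004-W31 samples ("0945", "1745") and the two 2004-W37 samples ("0550", "1515")
min_to_hms(mean(hhmm_to_min(c("0945", "1745"))))  # "13:45:00"
min_to_hms(mean(hhmm_to_min(c("0550", "1515"))))  # "10:32:30"
```

Because no date or time zone is involved, this approach cannot be affected by daylight-saving changes on whatever day the code happens to run.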

At that point, I think you can simply aggregate:

x2c <- aggregate(time2~week, data=godin, FUN=mean)

Then get rid of the irrelevant (and erroneous) date part:

x2c$time2 <- format(x2c$time2,"%H:%M:%S")

The result:

> x2c
      week    time2
1 2004-W29 09:40:00
2 2004-W30 01:45:00
3 2004-W31 13:45:00
4 2004-W36 12:07:00
5 2004-W37 10:32:30
6 2005-W31 12:27:30
7 2005-W36 10:48:20
8 2005-W37 13:11:06

The lesson here is that it's tricky to push around times with no associated dates in R. I'd love to hear from others who have a better way of doing this.
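The question asks for the weekly mean or median. Here is a sketch of both with aggregate(), working on minutes after midnight instead of POSIXct values. This is an alternative technique, not the answer's own. It uses a handful of rows copied from the question's dput(), so 2004-W29, W30, W31 and W37 contain the same rows as in the answer's output. 2004-W36 contains only some of its rows, so its values differ. The to_hms() helper is hypothetical; it formats seconds after a fixed GMT epoch so that daylight saving cannot shift the result.

```r
# A subset of the question's dput() sample: dates and "HHMM" times only
godin <- data.frame(
  date2 = as.Date(c(12617, 12627, 12631, 12631, 12669, 12669, 12670, 12673, 12673),
                  origin = "1970-01-01"),
  TIME  = c("0940", "0145", "0945", "1745", "1615", "2130", "0625", "0550", "1515"),
  stringsAsFactors = FALSE
)

godin$week <- format(godin$date2, "%Y-W%U")

# Time of day as minutes after midnight, so mean() and median() are plain arithmetic
godin$minutes <- as.numeric(substr(godin$TIME, 1, 2)) * 60 +
  as.numeric(substr(godin$TIME, 3, 4))

# One aggregate() call per statistic, merged on the week label
weekly <- merge(aggregate(minutes ~ week, data = godin, FUN = mean),
                aggregate(minutes ~ week, data = godin, FUN = median),
                by = "week", suffixes = c("_mean", "_median"))

# Hypothetical helper: minutes -> "HH:MM:SS" via seconds after a fixed GMT epoch
to_hms <- function(m) {
  format(as.POSIXct(round(m * 60), origin = "1970-01-01", tz = "GMT"), "%H:%M:%S")
}
weekly$mean   <- to_hms(weekly$minutes_mean)
weekly$median <- to_hms(weekly$minutes_median)
weekly[, c("week", "mean", "median")]
```

The numeric minutes_mean/minutes_median columns are what you would hand to ggplot2. The formatted strings are only for display. One caveat applies to this sketch and to the POSIXct approach alike: an arithmetic mean of clock times ignores the wrap-around at midnight. For example, "2330" and "0030" average to 12:00:00 rather than 00:00:00.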
