Compute data.frame column averages by date


Problem description


I have a data.frame in R where one column is a list of dates (many of which are duplicates) and the other column is the temperature recorded on that date. The columns in question look like this (the real data has several thousand rows and a few other unnecessary columns):

Date    |    Temp
-----------------
1/2/13     34.4
1/2/13     36.4
1/2/13     34.3
1/4/13     45.6
1/4/13     33.5
1/5/13     45.2

I need to find a way of getting a daily average for temperature. So ideally, I could tell R to loop through the data.frame and for every date that matched, give me an average for the temperature that day. I've been googling and I know loops in R are possible, but I can't wrap my head around this conceptually given what little I know about R code.

I know I can pull out a single column and average it (i.e. mean(data.frame[[2]])) but I'm utterly lost on how to tell R to match that mean to a single value located in the first column.

Additionally, how could I generate an average for every seven calendar days (regardless of how many entries exist for a single day)? So, a seven day rolling average, i.e. if my date range starts at 1/1/13 I'd get an average for all temps taken between 1/1/13 and 1/7/13, and then between 1/8/13 and 1/14/13 and so on...

Any assistance helping me grasp R loops is much appreciated. Thank you!

EDIT

Here's the output of dput(head(my.dataframe)) PLEASE NOTE: I edited down both "date" and "timestamp" because they both go on for several thousand entries otherwise:

structure(list(RECID = 579:584, SITEID = c(101L, 101L, 101L, 
101L, 101L, 101L), MONTH = c(6L, 6L, 6L, 6L, 6L, 6L), DAY = c(7L, 
7L, 7L, 7L, 7L, 7L), DATE = structure(c(34L, 34L, 34L, 34L, 34L, 
34L), .Label = c("10/1/2013", "10/10/2013", "10/11/2013", "10/12/2013", 
"10/2/2013", "10/3/2013", "10/4/2013", "10/5/2013", "10/6/2013", 
"10/7/2013", "10/8/2013", "10/9/2013", "6/10/2013", "6/11/2013","9/9/2013"), class = "factor"), TIMESTAMP = structure(784:789, .Label = c("10/1/2013 0:00", 
"10/1/2013 1:00", "10/1/2013 10:00", "10/1/2013 11:00", "10/1/2013 12:00", 
"10/1/2013 13:00", "10/1/2013 14:00", "10/1/2013 15:00", "10/1/2013 16:00", 
"10/1/2013 17:00", "10/1/2013 18:00", "10/1/2013 19:00", "10/1/2013 2:00"), class = "factor"), TEMP = c(23.376, 23.376, 23.833, 24.146, 
24.219, 24.05), X.C = c(NA, NA, NA, NA, NA, NA)), .Names = c("RECID", 
"SITEID", "MONTH", "DAY", "DATE", "TIMESTAMP", "TEMP", "X.C"), row.names = c(NA, 
6L), class = "data.frame") 

Solution

library(plyr)

ddply(df, .(Date), summarize, daily_mean_Temp = mean(Temp))

This is a simple example of the Split-Apply-Combine paradigm.
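To make the three steps visible, here is a rough base-R sketch of the same computation on the sample rows from the question. The data frame name `df` and the column names `Date` and `Temp` are assumptions taken from the sample table; in the posted dput() output the columns are called `DATE` and `TEMP`.

```r
# Sample rows copied from the question; `df`, `Date` and `Temp` are assumed names.
df <- data.frame(
  Date = c("1/2/13", "1/2/13", "1/2/13", "1/4/13", "1/4/13", "1/5/13"),
  Temp = c(34.4, 36.4, 34.3, 45.6, 33.5, 45.2),
  stringsAsFactors = FALSE
)

# Split: one vector of temperatures per distinct date.
temps_by_date <- split(df$Temp, df$Date)

# Apply: take the mean of each piece.
# Combine: sapply() collects the results into a named numeric vector.
daily_means <- sapply(temps_by_date, mean)
daily_means
```

No explicit loop is needed; tapply(df$Temp, df$Date, mean) should give the same result in a single call.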

Alternative #1: as Ananda Mahto mentions, the dplyr package is a higher-performance rewrite of plyr. He shows the syntax.
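For reference, a minimal sketch of what the dplyr version might look like. This is not copied from the other answer; it assumes dplyr is installed and reuses the assumed names `df`, `Date` and `Temp` from the sample table.

```r
library(dplyr)

# Sample rows copied from the question; `df`, `Date` and `Temp` are assumed names.
df <- data.frame(
  Date = c("1/2/13", "1/2/13", "1/2/13", "1/4/13", "1/4/13", "1/5/13"),
  Temp = c(34.4, 36.4, 34.3, 45.6, 33.5, 45.2),
  stringsAsFactors = FALSE
)

# Group the rows by date, then collapse each group to its mean temperature.
daily_dplyr <- df %>%
  group_by(Date) %>%
  summarise(daily_mean_Temp = mean(Temp))
daily_dplyr
```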

Alternative #2: aggregate() is also functionally equivalent, just has fewer bells-and-whistles than plyr/dplyr.
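A quick sketch of the aggregate() route, which needs no extra packages. As before, `df`, `Date` and `Temp` are assumed names from the sample table; with the real data you would substitute `DATE` and `TEMP`.

```r
# Sample rows copied from the question; `df`, `Date` and `Temp` are assumed names.
df <- data.frame(
  Date = c("1/2/13", "1/2/13", "1/2/13", "1/4/13", "1/4/13", "1/5/13"),
  Temp = c(34.4, 36.4, 34.3, 45.6, 33.5, 45.2),
  stringsAsFactors = FALSE
)

# Formula interface: average Temp within each distinct value of Date.
daily_agg <- aggregate(Temp ~ Date, data = df, FUN = mean)
daily_agg
```

Note that the dates here are plain strings, so the result is sorted alphabetically rather than chronologically. Converting first with as.Date(df$Date, format = "%m/%d/%y") should fix the ordering.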


Additionally, regarding 'generate average for every 7 calendar days': do you mean 'average-by-week-of-year', or 'moving 7-day average (trailing/leading/centered)'?
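Until that is clarified, here is a rough base-R sketch of two possible readings: (A) consecutive, non-overlapping 7-day blocks counted from the first date, which is closest to what the question describes, and (B) a trailing 7-day moving average of the daily means. The names `df`, `Date`, `Temp`, `block` and `trailing_7d` are assumptions, and the last two rows of the sample are made up purely so that a second block exists.

```r
# First six rows are from the question; the last two are hypothetical,
# added only so that the example spans more than one 7-day block.
df <- data.frame(
  Date = c("1/2/13", "1/2/13", "1/2/13", "1/4/13", "1/4/13", "1/5/13",
           "1/9/13", "1/10/13"),
  Temp = c(34.4, 36.4, 34.3, 45.6, 33.5, 45.2, 40.0, 42.0),
  stringsAsFactors = FALSE
)

# Parse the strings as real dates so that date arithmetic works.
df$Date <- as.Date(df$Date, format = "%m/%d/%y")

# (A) Fixed blocks: days 0-6 after the first date form block 0, days 7-13 block 1, ...
start <- min(df$Date)
df$block <- as.integer(df$Date - start) %/% 7L
block_means <- aggregate(Temp ~ block, data = df, FUN = mean)
block_means$block_start <- start + 7L * block_means$block

# (B) Trailing window: for each day, average the daily means of that day
# and the six calendar days before it (missing days are simply skipped).
daily <- aggregate(Temp ~ Date, data = df, FUN = mean)
daily$trailing_7d <- sapply(seq_len(nrow(daily)), function(i) {
  d <- daily$Date[i]
  in_window <- daily$Date >= d - 6 & daily$Date <= d
  mean(daily$Temp[in_window])
})

block_means
daily
```

Reading (A) averages all raw readings in a block, so days with more entries weigh more; reading (B) averages per-day means, so every day counts equally. Which of the two is wanted depends on the answer to the question above.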
