How to calculate the average of a variable by hour in R
Problem description
I'm having trouble when trying to calculate the average temperature by hour.
I have a data frame with date, time (hh:mm:ss a.m./p.m.) and temperature columns. What I need is the mean temperature for each hour, so that I can plot the daily variation of temperature.
I'm new to R, but I tried with what I know: I first converted the times into numbers, then extracted the first two characters, and then calculated the mean, but it didn't work very well. Moreover, I have so many files to analyze that it would be much better to have something more automated and cleaner than the "solution" I found.
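The substring approach described above does work as long as the times are 24-hour strings with a leading zero (with a.m./p.m. times the first two characters alone are ambiguous). Below is a minimal base-R sketch of that approach, wrapped in a function so it can be applied to every file in a folder. It is not the asker's actual code: the helper name `hourly_means`, the CSV layout (columns `date`, `hour`, `temperature`) and the demo files are assumptions for illustration.

```r
# Hypothetical helper: hourly mean temperature for one CSV file with columns
# date, hour ("hh:mm:ss", 24-hour clock, leading zero) and temperature.
hourly_means <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  # "13:03:01" -> 13: the first two characters are the hour
  df$hh <- as.integer(substr(df$hour, 1, 2))
  res <- aggregate(temperature ~ date + hh, data = df, FUN = mean)
  res$file <- basename(path)  # remember which file each row came from
  res
}

# Demo: write two small example files to a temporary directory
dir <- tempfile("temps")
dir.create(dir)
write.csv(data.frame(date = "28/12/2013",
                     hour = c("13:03:01", "13:08:01", "14:03:01"),
                     temperature = c(41.572, 46.059, 50.542)),
          file.path(dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(date = "29/12/2013",
                     hour = c("09:00:01", "09:05:01"),
                     temperature = c(30, 32)),
          file.path(dir, "b.csv"), row.names = FALSE)

# Process every CSV in the folder and stack the results into one data frame
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
all_hourly <- do.call(rbind, lapply(files, hourly_means))
print(all_hourly)
unlink(dir, recursive = TRUE)
```

Pointing `list.files()` at the real data folder instead of the temporary demo directory would process all files in one call.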
I believe there must be a better way to calculate averages by hour in R, so I've been looking for the answer in other posts here. Unfortunately, I couldn't find a clear answer about extracting statistics from time data.
My data looks like this:
date hour temperature
1 28/12/2013 13:03:01 41.572
2 28/12/2013 13:08:01 46.059
3 28/12/2013 13:13:01 48.55
4 28/12/2013 13:18:01 49.546
5 28/12/2013 13:23:01 49.546
6 28/12/2013 13:28:01 49.546
7 28/12/2013 13:33:01 50.044
8 28/12/2013 13:38:01 50.542
9 28/12/2013 13:43:01 50.542
10 28/12/2013 13:48:01 51.04
11 28/12/2013 13:53:01 51.538
12 28/12/2013 13:58:01 51.538
13 28/12/2013 14:03:01 50.542
14 28/12/2013 14:08:01 51.04
15 28/12/2013 14:13:01 51.04
16 28/12/2013 14:18:01 52.534
17 28/12/2013 14:23:01 53.031
18 28/12/2013 14:28:01 53.031
19 28/12/2013 14:33:01 53.031
20 28/12/2013 14:38:01 51.538
21 28/12/2013 14:43:01 53.031
22 28/12/2013 14:48:01 53.529
etc. (24 hours of data)
I would like R to calculate the average per hour, ignoring differences in minutes and seconds (just by hour).
Any suggestions? Thank you very much in advance!
Regards, Maria
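A more robust alternative to cutting strings is to parse date and time into a real timestamp and group on the start of each hour, which also keeps different days apart. This is a minimal base-R sketch using rows copied from the question's sample; it assumes the 24-hour `hh:mm:ss` times that the sample shows (the question also mentions a.m./p.m. times, which would need a different format string, as noted in the comments).

```r
# Sample rows copied from the question
df <- data.frame(
  date = "28/12/2013",
  hour = c("13:03:01", "13:58:01", "14:03:01", "14:48:01"),
  temperature = c(41.572, 51.538, 50.542, 53.529)
)

# Combine date and time into one timestamp (day/month/year, 24-hour clock).
# For "hh:mm:ss PM"-style times, "%I:%M:%S %p" is the usual format instead,
# but %p is locale-dependent and may not match "p.m." written with dots.
df$timestamp <- as.POSIXct(paste(df$date, df$hour),
                           format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Label each row with the start of its hour, e.g. "2013-12-28 13:00"
df$hour_start <- format(df$timestamp, "%Y-%m-%d %H:00")

# Mean temperature per hour; the labels sort chronologically as text
hourly <- aggregate(temperature ~ hour_start, data = df, FUN = mean)
print(hourly)
```

Because the grouping label includes the date, readings from 13:00 on different days are averaged separately; dropping the `%Y-%m-%d` part would instead pool the same hour across all days.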
Recommended answer
It is always easier when sample data and the expected output are given in the question.
Solution with the data.table package
library(data.table)
data <- fread('temp.csv', sep = ',')  # assuming your data is in temp.csv
# if you did not read the data with fread, convert your existing data frame to a data.table instead
data <- as.data.table(data)
> str(data)
Classes ‘data.table’ and 'data.frame': 12 obs. of 3 variables:
$ date : chr "28/12/2013" "28/12/2013" "28/12/2013" "28/12/2013" ...
$ hour : chr "13:03:01" "13:08:01" "13:13:01" "13:18:01" ...
$ temperature: num 41.6 46.1 48.5 49.5 49.5 ...
> data
date hour temperature avg
1: 27/12/2013 13:00:00 42.99 35.78455
2: 27/12/2013 14:00:00 65.97 35.78455
3: 27/12/2013 15:00:00 63.57 35.78455
data[, list(avg = mean(temperature)), by = hour]  # one group per distinct value of hour; gives hourly means only if hour already holds whole hours such as 13:00:00
hour avg
1: 13:00:00 42.99
2: 14:00:00 65.97
3: 15:00:00 63.57
data[, list(avg = mean(temperature)), by = "date,hour"]  # grouped by date, then by hour
date hour avg
1: 27/12/2013 13:00:00 42.99
2: 27/12/2013 14:00:00 65.97
3: 27/12/2013 15:00:00 63.57
data[, list(avg = mean(temperature)), by = list(date, hour(as.POSIXct(data$hour, format = "%H:%M:%S")))]  # group by date and hour of day, ignoring minutes and seconds
date hour avg
1: 27/12/2013 1 29.530
2: 27/12/2013 4 65.970
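The `by = hour` call above groups on the exact text in the `hour` column, so with the question's `13:03:01`-style times every reading would form its own group. A minimal data.table sketch that truncates each time to its hour first, assuming the `hour` column holds 24-hour `hh:mm:ss` strings; `as.ITime()` and `hour()` are data.table functions, and the sample rows are copied from the question.

```r
library(data.table)

# Sample rows in the question's format: hour holds the full "hh:mm:ss" time
dt <- data.table(
  date = "28/12/2013",
  hour = c("13:03:01", "13:08:01", "14:03:01", "14:08:01"),
  temperature = c(41.572, 46.059, 50.542, 51.04)
)

# by = hour would give one group per distinct time string (one row each here).
# Parsing to ITime and taking the hour of day puts 13:03:01 and 13:08:01
# in the same group. The namespace prefixes avoid confusion with the
# column that is also called hour.
res <- dt[, .(avg = mean(temperature)),
          by = .(date, hh = data.table::hour(data.table::as.ITime(hour)))]
print(res)
```

Dropping `date` from the `by` list would pool the same hour of day across all dates, which is what a plot of typical daily variation usually needs.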