How to calculate average values for large datasets
Problem description
I am working with a dataset that has temperature readings once an hour, 24 hours a day, for 100+ years. I want to get an average temperature for each day to reduce the size of my dataset. The headings look like this:
YR MO DA HR MN TEMP
1943 6 19 10 0 73
1943 6 19 11 0 72
1943 6 19 12 0 76
1943 6 19 13 0 78
1943 6 19 14 0 81
1943 6 19 15 0 85
1943 6 19 16 0 85
1943 6 19 17 0 86
1943 6 19 18 0 86
1943 6 19 19 0 87
etc. for 600,000+ data points.
How can I run a nested function to calculate the daily average temperature so that I preserve YR, MO, DA and TEMP? Once I have this, I want to be able to look at long-term averages and calculate, say, the average temperature for the month of January across 30 years. How do I do this?
Recommended answer
In one step you could do this:
meanTbl <- with(datfrm, tapply(TEMP, ISOdate(YR, MO, DA), mean) )
This gives you a date-time formatted index as well as the values. If you want just the date as a character label, without the trailing time:
meanTbl <- with(datfrm, tapply(TEMP, as.Date(ISOdate(YR, MO, DA)), mean) )
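Since the question asks to keep YR, MO and DA as columns, one alternative to tapply is base R's aggregate with a formula, which returns a data frame rather than a named array. The following is a minimal sketch, assuming a data frame called datfrm with the columns shown in the question; the sample rows and the name dailyMeans are made up for illustration.

```r
# Hypothetical sample data in the same layout as the question
# (two days, three hourly readings each)
datfrm <- data.frame(
  YR   = c(1943, 1943, 1943, 1943, 1943, 1943),
  MO   = c(6, 6, 6, 6, 6, 6),
  DA   = c(19, 19, 19, 20, 20, 20),
  HR   = c(10, 11, 12, 10, 11, 12),
  MN   = 0,
  TEMP = c(73, 72, 76, 70, 71, 75)
)

# One row per day, keeping YR, MO and DA as ordinary columns
dailyMeans <- aggregate(TEMP ~ YR + MO + DA, data = datfrm, FUN = mean)

# Put the rows in chronological order
dailyMeans <- dailyMeans[order(dailyMeans$YR, dailyMeans$MO, dailyMeans$DA), ]
print(dailyMeans)
```

Note that the formula interface of aggregate drops rows with missing values by default, so days with some NA readings are averaged over the readings that remain.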
The monthly averages could be done with:
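# meanTbl is a named array, not a data frame, so group its values by the month taken from its date names
monMeans <- tapply(meanTbl, format(as.Date(names(meanTbl)), "%m"), mean)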
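For the long-term part of the question (say, the January average over a 30-year window), one option is to subset the rows first and then average. This is a sketch under the assumption that datfrm has the columns from the question; the synthetic data, the 1951 to 1980 window, and the names jan, janMean and janByYear are illustrative choices, not part of the original answer.

```r
# Synthetic stand-in for the real data: one reading per month for 1943-1990
datfrm <- expand.grid(MO = 1:12, YR = 1943:1990)
datfrm$DA <- 1
datfrm$TEMP <- 50 + 20 * sin((datfrm$MO - 4) * pi / 6)  # made-up seasonal curve

# Rows for January within the chosen 30-year window
jan <- subset(datfrm, MO == 1 & YR >= 1951 & YR <= 1980)

# Single long-term January mean
janMean <- mean(jan$TEMP, na.rm = TRUE)

# January mean for each year in the window, if the year-to-year series is wanted
janByYear <- with(jan, tapply(TEMP, YR, mean, na.rm = TRUE))
```

Averaging the hourly readings directly gives the same result as averaging daily means only when every day contributes the same number of readings; if some days have gaps, it may be preferable to compute the daily means first and then average those.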