How to calculate average values for large datasets

Problem description

I am working with a dataset that has temperature readings once an hour, 24 hrs a day for 100+ years. I want to get an average temperature for each day to reduce the size of my dataset. The headings look like this:

     YR MO DA HR MN TEMP
  1943  6 19 10  0   73
  1943  6 19 11  0   72
  1943  6 19 12  0   76
  1943  6 19 13  0   78
  1943  6 19 14  0   81
  1943  6 19 15  0   85
  1943  6 19 16  0   85
  1943  6 19 17  0   86
  1943  6 19 18  0   86
  1943  6 19 19  0   87

etc for 600,000+ data points.

How can I run a nested function to calculate the daily average temperature so that I preserve YR, MO, DA, and TEMP? Once I have this, I want to be able to look at long-term averages and calculate, say, the average temperature for the month of January across 30 years. How do I do this?

Recommended answer

In one step you could do this:

 meanTbl <- with(datfrm, tapply(TEMP, ISOdate(YR, MO, DA), mean) )

This gives you a date-time formatted index as well as the values. If you wanted just the Date as character without the trailing time:

 meanTbl <- with(datfrm, tapply(TEMP, as.Date(ISOdate(YR, MO, DA)), mean) )
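
The question also asks to keep YR, MO, DA, and TEMP as columns, whereas `tapply` returns a named array. One possible alternative, shown here only as a sketch, is base R's `aggregate` with a formula. The sample data and the name `dailyMeans` are made up for illustration and are not part of the original answer:

```r
# Hypothetical sample in the same layout as the question's data
datfrm <- data.frame(
  YR   = 1943,
  MO   = 6,
  DA   = rep(c(19, 20), each = 3),
  HR   = rep(c(10, 11, 12), times = 2),
  MN   = 0,
  TEMP = c(73, 72, 76, 78, 81, 85)
)

# One row per day; the grouping columns YR, MO, DA are kept
# next to the mean of TEMP for that day
dailyMeans <- aggregate(TEMP ~ YR + MO + DA, data = datfrm, FUN = mean)
print(dailyMeans)
```

The result is an ordinary data frame, so it can be filtered or aggregated again by YR or MO in later steps.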

The monthly averages could be done with:

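 # meanTbl is a named array, not a data frame, so group it by the month taken from its date names
 monMeans <- tapply(meanTbl, format(as.Date(names(meanTbl)), "%m"), mean)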
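
For the January-across-30-years part of the question, a minimal sketch follows. It assumes the daily means are held in a data frame with columns YR, MO, DA, and TEMP; the name `dailyMeans`, the temperatures, and the 1961-1990 window are all hypothetical choices for illustration:

```r
# Hypothetical daily means: 28 days of January and February for 1950-1990
dailyMeans <- expand.grid(DA = 1:28, MO = 1:2, YR = 1950:1990)
dailyMeans$TEMP <- ifelse(dailyMeans$MO == 1, 30, 40)  # made-up temperatures

# Keep only January rows inside the assumed 30-year window
jan <- subset(dailyMeans, MO == 1 & YR >= 1961 & YR <= 1990)

# Single long-term January average over the window
janMean <- mean(jan$TEMP)

# January average for each year, useful for looking at the trend
janByYear <- tapply(jan$TEMP, jan$YR, mean)
```

Averaging daily means weights every day equally. If some days have fewer hourly readings than others, this can differ slightly from averaging the hourly readings directly.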
