Calculate average daily value from large data set with R standard format date/times?

Question

I have a data frame of approximately 10 million rows spanning about 570 days. After using strptime to convert the dates and times, the data looks like this:

                 date     X1
1 2004-01-01 07:43:00 1.2587
2 2004-01-01 07:47:52 1.2585
3 2004-01-01 17:46:14 1.2586
4 2004-01-01 17:56:08 1.2585
5 2004-01-01 17:56:15 1.2585
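For reference, here is a minimal sketch of how a data frame like the one above might be built. The column name raw_date, the format string, and the UTC time zone are assumptions, not details from the question. strptime returns POSIXlt, which stores each timestamp as a list of components; wrapping it in as.POSIXct gives a more compact type that is usually easier to work with in a data frame of this size.

```r
# Toy data mirroring the sample rows; raw_date is a hypothetical name
# for the unparsed timestamp column.
df <- data.frame(
  raw_date = c("2004-01-01 07:43:00", "2004-01-01 07:47:52",
               "2004-01-01 17:46:14", "2004-01-01 17:56:08",
               "2004-01-01 17:56:15"),
  X1 = c(1.2587, 1.2585, 1.2586, 1.2585, 1.2585),
  stringsAsFactors = FALSE
)

# strptime() returns POSIXlt; as.POSIXct() converts it to POSIXct.
# The format string and time zone are assumptions.
df$date <- as.POSIXct(
  strptime(df$raw_date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
)
df$raw_date <- NULL

str(df)
```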

I would like to compute the average value on each day (as in days of the year, not days of the week) and then plot them. For example: get all rows which have day "2004-01-01", compute the average price, then do the same for "2004-01-02" and so on.

Similarly I would be interested in finding the average monthly value, or hourly price, but I imagine I can work these out once I know how to get average daily price.

My biggest difficulty here is extracting the day of the year from the date variable automatically. How can I cycle through all 365 days and compute the average value for each day, storing it in a list?

I was able to find the average value for day of the week using the weekdays() function, but I couldn't find anything similar for this.

Recommended answer

Here's a solution using dplyr and lubridate. First, round each timestamp down to the start of its day with floor_date, passing unit = "day" explicitly because the default unit is seconds (see below comment by thelatemail). Then group_by date and calculate the mean value using summarize:

library(dplyr)
library(lubridate)

# If date is POSIXlt (the direct result of strptime), converting it with
# as.POSIXct() first is likely to be needed.
df %>%
  mutate(date = floor_date(date, unit = "day")) %>%  # drop the time of day
  group_by(date) %>%
  summarize(mean_X1 = mean(X1))                      # one row per day
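Since the question also asks for a plot, here is a small worked sketch on made-up data covering two days. The values, the object name daily, and the choice of base graphics are illustrative assumptions rather than part of the original answer.

```r
library(dplyr)
library(lubridate)

# Toy data spanning two days (made-up values, for illustration only)
df <- data.frame(
  date = as.POSIXct(c("2004-01-01 07:43:00", "2004-01-01 17:56:15",
                      "2004-01-02 09:00:00", "2004-01-02 15:30:00"),
                    tz = "UTC"),
  X1 = c(1.2587, 1.2585, 1.2600, 1.2610)
)

# Floor each timestamp to its day, then average within each day
daily <- df %>%
  mutate(day = floor_date(date, unit = "day")) %>%
  group_by(day) %>%
  summarize(mean_X1 = mean(X1))

print(daily)

# Line plot of the daily means using base graphics
plot(daily$day, daily$mean_X1, type = "l",
     xlab = "Day", ylab = "Mean X1")
```

Only the summarized table (one row per day, so about 570 rows for the data in the question) is passed to plot, not the 10 million raw rows.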

Using the lubridate package, you can use a similar method to get the average by month, week, or hour. For example, to calculate the average by month:

# month() returns the month number (1-12), so the same month in
# different years falls into one group.
df %>%
  mutate(date = month(date)) %>%
  group_by(date) %>%
  summarize(mean_X1 = mean(X1))
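Because the data in the question spans about 570 days, grouping by month number pools, for example, January 2004 with January 2005. If one mean per calendar month is wanted instead, flooring to the month is one option. The sketch below uses made-up values to contrast the two; the object names are hypothetical.

```r
library(dplyr)
library(lubridate)

# Made-up values: two Januaries in different years plus one February
df <- data.frame(
  date = as.POSIXct(c("2004-01-10 12:00:00", "2004-02-10 12:00:00",
                      "2005-01-10 12:00:00"), tz = "UTC"),
  X1 = c(1, 2, 3)
)

# month() pools January 2004 with January 2005 (two groups)
by_month_number <- df %>%
  group_by(month = month(date)) %>%
  summarize(mean_X1 = mean(X1))

# floor_date(unit = "month") keeps each calendar month separate (three groups)
by_calendar_month <- df %>%
  group_by(month = floor_date(date, unit = "month")) %>%
  summarize(mean_X1 = mean(X1))

print(by_month_number)
print(by_calendar_month)
```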

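By hour:

# hour() returns the hour of the day (0-23), pooled across all days;
# floor_date(date, unit = "hour") would give one group per calendar hour.
df %>%
  mutate(date = hour(date)) %>%
  group_by(date) %>%
  summarize(mean_X1 = mean(X1))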
