Calculate average monthly total by groups from data.table in R


Problem description



I have a data.table with a row for each day over a 30 year period with a number of different variable columns. The reason for using data.table is that the .csv file I'm using is huge (approx 1.2 million rows) as there are 30 years' worth of data for a number of groups characterised by a column called 'key'.

An example dataset is shown below:

Key   Date          Runoff
A     1980-01-01    2
A     1980-01-02    1
A     1981-01-01    0.1
A     1981-01-02    3
A     1982-01-01    2
A     1982-01-02    5
B     1980-01-01    1.5
B     1980-01-02    0.5
B     1981-01-01    0.3
B     1981-01-02    2
B     1982-01-01    1.5
B     1982-01-02    4

The above is a sample of two 'keys', with some data for January over three years to show what I mean. The actual dataset has hundreds of 'keys' and 30 years' worth of data for each 'key'.

What I want to do is produce an output that has the total average for each month for each key as is shown below:

Key   January  February  March.... etc
A     4.36     ...       ...
B     3.26     ...       ...

i.e. the total average for January for Key A = ((2 + 1) + (0.1 + 3) + (2 + 5)) / 3 = 13.1 / 3

When I have done this analysis on one thirty year dataset (i.e. just one key) I have used the following code successfully to achieve this:

runoff_tot_average <- rowsum(DF$Runoff, format(DF$Date, '%m')) / 30

Where DF is the dataframe for one 30 year dataset.

So could I please have suggestions on how to modify my code above to work with the larger dataset with many 'keys' or offer a completely new solution!

Thank you,

J

EDIT

The below code produces the above data example:

Key <- c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B")
Date <- as.Date(c("1980-01-01", "1980-01-02", "1981-01-01", "1981-01-02", "1982-01-01", "1982-01-02", "1980-01-01", "1980-01-02", "1981-01-01", "1981-01-02", "1982-01-01", "1982-01-02"))
Runoff <- c(2, 1, 0.1, 3, 2, 5, 1.5, 0.5, 0.3, 2, 1.5, 4)
DT <- data.table(Key, Date, Runoff)

Solution

The only way I could think of doing it was in two steps. Probably not the best way, but here goes:

DT[, c("YM", "Month") := list(substr(Date, 1, 7), substr(Date, 6, 7))]
DT[, Runoff2 := sum(Runoff), by = c("Key", "YM")]
DT[, mean(Runoff2), by = c("Key", "Month")]

##   Key Month       V1
## 1:   A    01 4.366667
## 2:   B    01 3.266667
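
One caveat on the two-step approach above: because Runoff2 is written back onto every daily row, mean(Runoff2) weights each year's monthly total by the number of rows in that month. That makes no difference when a month has the same number of days every year, but a February in a leap year would count slightly more. A minimal sketch that first collapses to one row per key, year and month and only then averages is below; the names monthly, total and avg_total are illustrative and not from the original answer.

```r
library(data.table)

# Sample data from the question
DT <- data.table(
  Key    = rep(c("A", "B"), each = 6),
  Date   = as.Date(rep(c("1980-01-01", "1980-01-02", "1981-01-01",
                         "1981-01-02", "1982-01-01", "1982-01-02"), 2)),
  Runoff = c(2, 1, 0.1, 3, 2, 5, 1.5, 0.5, 0.3, 2, 1.5, 4)
)

# Step 1: one row per Key / year / month holding that month's total
monthly <- DT[, .(total = sum(Runoff)),
              by = .(Key, year = year(Date), month = month(Date))]

# Step 2: average the monthly totals over the years, per Key and month
avg <- monthly[, .(avg_total = mean(total)), by = .(Key, month)]
print(avg)
```

On the sample data this gives the same January averages as the output above, since every January has the same number of rows.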


Just to show another (very similar) way:

DT[, c("year", "month") := list(year(Date), month(Date))]
DT[, Runoff2 := sum(Runoff), by=list(Key, year, month)]
DT[, mean(Runoff2), by=list(Key, month)]

Note that you don't have to create new columns, as by supports expressions as well. That is, you can directly use them in by as follows:

DT[, Runoff2 := sum(Runoff), by=list(Key, year = year(Date), month = month(Date))]

But since you need to aggregate more than once, it's better (for speed) to store them as additional columns, as @David has shown here.
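
The question asks for one row per key and one column per month. Neither answer shows that reshaping step, so the following is only a sketch of one way to get there, assuming data.table's dcast and base R's month.name vector; the names avg, avg_total, month_name and wide are illustrative.

```r
library(data.table)

# Sample data from the question
DT <- data.table(
  Key    = rep(c("A", "B"), each = 6),
  Date   = as.Date(rep(c("1980-01-01", "1980-01-02", "1981-01-01",
                         "1981-01-02", "1982-01-01", "1982-01-02"), 2)),
  Runoff = c(2, 1, 0.1, 3, 2, 5, 1.5, 0.5, 0.3, 2, 1.5, 4)
)

# Sum within Key / year / month, then average per Key / month, in one chain.
# The grouping columns are computed as expressions inside by.
avg <- DT[, .(total = sum(Runoff)),
          by = .(Key, year = year(Date), month = month(Date))
         ][, .(avg_total = mean(total)), by = .(Key, month)]

# Label months by name; a factor keeps the columns in calendar order
avg[, month_name := factor(month.name[month], levels = month.name)]

# Reshape to one row per Key and one column per month present in the data
wide <- dcast(avg, Key ~ month_name, value.var = "avg_total")
print(wide)
```

With the sample data only a January column appears; on the full 30-year dataset there would be one column for each month that occurs.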
