How to calculate growth rate in a long-format data frame?
Problem description
With data structured as follows...
df <- data.frame(Category=c(rep("A",6),rep("B",6)), Year=rep(2010:2015,2),Value=1:12)
I'm having a tough time creating a growth rate column (by year) within category. Can anyone help with code to create something like this...
Category Year Value Growth
A        2010     1
A        2011     2  1.000
A        2012     3  0.500
A        2013     4  0.333
A        2014     5  0.250
A        2015     6  0.200
B        2010     7
B        2011     8  0.143
B        2012     9  0.125
B        2013    10  0.111
B        2014    11  0.100
B        2015    12  0.091
Solution
For these sorts of questions ("how do I compute XXX by category YYY?") there are always solutions based on by(), the data.table package, and plyr. I generally prefer plyr, which is often slower, but (to me) more transparent/elegant.
df <- data.frame(Category=c(rep("A",6),rep("B",6)),
                 Year=rep(2010:2015,2),Value=1:12)
library(plyr)
ddply(df,"Category",transform,
      Growth=c(NA,exp(diff(log(Value)))-1))
The main difference between this answer and @krlmr's is that I am using a geometric-mean trick (taking differences of logs and then exponentiating) while @krlmr computes an explicit ratio.
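For comparison, an explicit-ratio version can be sketched in base R. This is not @krlmr's actual code (which is not reproduced here), just an illustration; the helper name growth_ratio is made up for this example, and ave() is swapped in for plyr. It assumes the rows are already sorted by Year within each Category.

```r
df <- data.frame(Category = c(rep("A", 6), rep("B", 6)),
                 Year = rep(2010:2015, 2),
                 Value = 1:12)

# Hypothetical helper: fractional change x[t+1] / x[t] - 1 within one group,
# with NA for the first year, which has no predecessor.
growth_ratio <- function(v) c(NA, v[-1] / v[-length(v)] - 1)

# ave() applies the helper separately within each Category and
# returns the results in the original row order.
df$Growth <- ave(as.numeric(df$Value), df$Category, FUN = growth_ratio)

print(df)
```

Because ave() keeps the original row order, the result can be assigned straight back as a new column without any merging step.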
Mathematically, diff(log(Value)) takes the differences of the logs, i.e. log(x[t+1])-log(x[t]) for all t. When we exponentiate that we get the ratio x[t+1]/x[t] (because exp(log(x[t+1])-log(x[t])) = exp(log(x[t+1]))/exp(log(x[t])) = x[t+1]/x[t]). The OP wanted the fractional change rather than the multiplicative growth rate (i.e. x[t+1]==x[t] corresponds to a fractional change of zero rather than a multiplicative growth rate of 1.0), so we subtract 1.

I am also using transform() for a little bit of extra "syntactic sugar", to avoid creating a new anonymous function.
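The log-difference identity described in the answer can be checked numerically on the Category B values from the question. This is only a small sanity check in base R; the variable names log_trick and explicit are chosen for this example.

```r
x <- c(7, 8, 9, 10, 11, 12)  # Category B values from the example data

# Log-difference version: exp(log(x[t+1]) - log(x[t])) - 1
log_trick <- exp(diff(log(x))) - 1

# Explicit-ratio version: x[t+1] / x[t] - 1
explicit <- x[-1] / x[-length(x)] - 1

# The two agree up to floating-point tolerance.
same <- isTRUE(all.equal(log_trick, explicit))
print(same)

# Rounded to three decimals, these match the Growth column for B in the question.
print(round(log_trick, 3))
```

One practical difference worth keeping in mind: log() requires strictly positive values, so the explicit ratio is the safer choice if Value can be zero or negative.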