Sum over rows (rollapply) with time decay
Problem description
This is a follow-on question to one I posted earlier (see "Sum over rows with multiple changing conditions R data.table" for more details). I want to calculate how many times the 3 subjects have experienced an event in the last 5 years, so I have been summing over a rolling window using rollapply from the zoo package.
This assumes that the experience 5 years ago is as important as the experience 1 year ago (same weighting), so now I want to include a time decay for the experience that enters the sum. This basically means that the experience 5 years ago does not enter the sum with the same weight as the experience 1 year ago.
In my case I want to include an age-dependent decay (even though for other applications faster or slower decays, such as square roots or squares, could be possible).
For example, let's assume I have the following data (I build on the previous data for clarity):
    mydf <- data.frame(Year = c(2000, 2001, 2002, 2004, 2005, 2007, 2000, 2001,
                                2002, 2003, 2003, 2004, 2005, 2006, 2006, 2007),
                       Name = c("Tom", "Tom", "Tom", "Fred", "Gill", "Fred", "Gill", "Gill",
                                "Tom", "Tom", "Fred", "Fred", "Gill", "Fred", "Gill", "Gill"))

    # Create an indicator for the experience
    mydf$Ind <- 1

    # Load required packages
    library(data.table)
    library(zoo)

    # Set data.table
    setDT(mydf)
    setkey(mydf, Name, Year)

    # Perform cartesian join to calculate experience. I2 is the new experience indicator
    m <- mydf[CJ(unique(Name), seq(min(Year)-5, max(Year))), allow.cartesian=TRUE][,
              list(Ind = unique(Ind), I2 = sum(Ind, na.rm=TRUE)),
              keyby=list(Name, Year)]

    # This is the approach I have been taking so far. Note that this is a simple rolling sum of I2
    m[, Exp := rollapply(I2, 5, function(x) sum(head(x, -1)), align = 'right', fill=0), by=Name]
So the question now is: how can I include an age-dependent decay in this calculation? To model this, I need to divide each experience by its age before it enters the sum.
I have been trying to get it to work using something along these lines:
m[,Exp_age := rollapply(I2, 5, function(x) sum(head(x,-1)/(tail((Year))-head(Year,-1))), align = 'right', fill=0),by=Name]
But it does not work. I think my main problem is that I cannot get the age of the experience right so I can divide by the age in the sum. The result should look like the Exp_age column in the myres data.frame below:
    myres <- data.frame(Name = c("Fred", "Fred", "Fred", "Fred", "Fred",
                                 "Gill", "Gill", "Gill", "Gill", "Gill", "Gill",
                                 "Tom", "Tom", "Tom", "Tom", "Tom"),
                        Year = c(2003, 2004, 2004, 2006, 2007,
                                 2000, 2001, 2005, 2005, 2006, 2007,
                                 2000, 2001, 2002, 2002, 2003),
                        Ind = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
                        Exp = c(0, 1, 1, 3, 4, 0, 1, 1, 1, 2, 3, 0, 1, 2, 2, 4),
                        Exp_age = c(0, 1, 1, 1.333333333, 1.916666667,
                                    0, 1, 0.45, 0.45, 2.2, 2,
                                    0, 1, 1.5, 1.5, 2.833333333))
Any pointers would be greatly appreciated!
Solution
If I understand you correctly, you are trying to do a rollapply with width=5 and, rather than a simple sum, you want a weighted sum. The weights depend on the age of the experience relative to the 5-year window. I would do this: first set the key in your data.table so that rows are in increasing order within each Name. Then you know that the last item in your x variable is the youngest and the first item is the oldest (you do this in your code already). I can't quite tell which way you want the weights to go (youngest to have the greatest weight, or oldest), but you get the point:
    setkey(m, Name, Year)
    my_fun = function(x) { w = 1:length(x); sum(x*w) }
    m[, Exp_age := rollapply(I2, width=5, by=1, fill=NA, FUN=my_fun, by.column=FALSE, align="right"), by=Name]
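The expected Exp_age values in the question are consistent with dividing each past year's count by its age in years (for example, Fred in 2006: 2/2 + 1/3 ≈ 1.333). Below is a minimal sketch of that weighting, assuming the window is ordered oldest to newest with the current year last. The names decayed_sum and decay are hypothetical, not part of zoo or of the answer above. The width is 6 rather than 5 so that the window holds the current year plus the five preceding ones, which Gill's 0.45 in 2005 (1/4 + 1/5) seems to require.

```r
library(zoo)

# Weighted sum over one window. `x` is ordered oldest to newest and its
# last element is the current year, which is excluded from the sum.
# `decay` (hypothetical argument) maps an age in years to a divisor.
decayed_sum <- function(x, decay = function(age) age) {
  past <- head(x, -1)            # experiences before the current year
  age  <- rev(seq_along(past))   # oldest element gets the largest age
  sum(past / decay(age))
}

# Fred's yearly event counts (I2) for 1995..2007, as the join in the
# question would produce them: events in 2003, 2004 (twice), 2006, 2007.
fred_i2 <- c(rep(0, 8), 1, 2, 0, 1, 1)

# Width 6 = current year plus the 5 preceding years; the first 5 padded
# positions have no full window and are filled with 0.
fred_exp_age <- rollapply(fred_i2, width = 6, FUN = decayed_sum,
                          align = "right", fill = 0)
```

For Fred this gives 1, 1.333 and 1.917 for 2004, 2006 and 2007, matching myres. Applied to the table from the question it would presumably read m[, Exp_age := rollapply(I2, 6, decayed_sum, align = "right", fill = 0), by = Name]. A slower or faster decay could be tried with FUN = function(x) decayed_sum(x, sqrt) or FUN = function(x) decayed_sum(x, function(a) a^2).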