Sum over rows (rollapply) with time decay
Question
This is a follow-up to a question I posted earlier (see "Sum over rows with multiple changing conditions R data.table" for more details). I want to calculate how many times the 3 subjects have experienced an event in the last 5 years, so I have been summing over a rolling window using rollapply from the zoo package. This assumes that experience from 5 years ago is as important as experience from 1 year ago (same weighting), so now I want to include a time decay for the experience that enters the sum. In other words, experience from 5 years ago should not enter the sum with the same weight as experience from 1 year ago.
In my case I want to include an age-dependent decay (although for other applications faster or slower decays, such as squares or square roots, would be possible).
For example, let's assume I have the following data (I build on the previous data for clarity):
mydf <- data.frame(Year = c(2000, 2001, 2002, 2004, 2005,
                            2007, 2000, 2001, 2002, 2003,
                            2003, 2004, 2005, 2006, 2006, 2007),
                   Name = c("Tom", "Tom", "Tom", "Fred", "Gill",
                            "Fred", "Gill", "Gill", "Tom", "Tom",
                            "Fred", "Fred", "Gill", "Fred", "Gill", "Gill"))
# Create an indicator for the experience
mydf$Ind <- 1
# Load required packages
library(data.table)
library(zoo)
# Set data.table
setDT(mydf)
setkey(mydf, Name, Year)
# Perform cartesian join to calculate experience. I2 is the new experience indicator
m <- mydf[CJ(unique(Name), seq(min(Year) - 5, max(Year))), allow.cartesian = TRUE][,
          list(Ind = unique(Ind), I2 = sum(Ind, na.rm = TRUE)),
          keyby = list(Name, Year)]
# This is the approach I have been taking so far. Note that this is a simple rolling sum of I2
m[, Exp := rollapply(I2, 5, function(x) sum(head(x, -1)),
                     align = 'right', fill = 0), by = Name]
So the question now is: how can I include an age-dependent decay in this calculation? To model this, I need to divide the experience by the age of the experience before it enters the sum.
I have been trying to get it to work using something along these lines:
m[,Exp_age := rollapply(I2, 5, function(x) sum(head(x,-1)/(tail((Year))-head(Year,-1))),
align = 'right', fill=0),by=Name]
But it does not work. I think my main problem is that I cannot get the age of the experience right, so I cannot divide by it inside the sum. The result should look like the Exp_age column in the myres data.frame below:
myres <- data.frame(Name = c("Fred", "Fred", "Fred", "Fred", "Fred",
                             "Gill", "Gill", "Gill", "Gill", "Gill", "Gill",
                             "Tom", "Tom", "Tom", "Tom", "Tom"),
                    Year = c(2003, 2004, 2004, 2006, 2007, 2000, 2001, 2005,
                             2005, 2006, 2007, 2000, 2001, 2002, 2002, 2003),
                    Ind = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
                    Exp = c(0, 1, 1, 3, 4, 0, 1, 1, 1, 2, 3, 0, 1, 2, 2, 4),
                    Exp_age = c(0, 1, 1, 1.333333333, 1.916666667, 0, 1, 0.45,
                                0.45, 2.2, 2, 0, 1, 1.5, 1.5, 2.833333333))
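Checking two of the expected values by hand suggests what "age" means here: each past year's event count appears to be divided by the number of years elapsed since that year. For Tom in 2003 (one event each in 2000 and 2001, two in 2002) and Gill in 2005 (one event each in 2000 and 2001), the myres values above can be reproduced as:

```latex
\begin{aligned}
\text{Exp\_age}_{\text{Tom},\,2003} &= \tfrac{1}{3} + \tfrac{1}{2} + \tfrac{2}{1} \approx 2.8333 \\
\text{Exp\_age}_{\text{Gill},\,2005} &= \tfrac{1}{5} + \tfrac{1}{4} = 0.45
\end{aligned}
```

Note that the Gill value includes an event of age 5, whereas the rolling sum used for Exp (width 5 with the current year dropped) only reaches back 4 years.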
Any pointers would be greatly appreciated!
Recommended answer
If I understand you correctly, you are trying to do a rollapply with width=5 and, rather than a simple sum, you want a weighted sum. The weights are the age of the experience relative to the 5-year window. I would do this: first set the key on your data.table so that it is sorted in increasing order by Name and then Year. Then you know that the last item in your x variable is the youngest and the first item is the oldest (you already do this in your code). I can't quite tell which way you want the weights to go (greatest weight for the youngest or for the oldest), but you get the point:
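setkey(m, Name, Year)
# Weight each value by its position in the window (1 = oldest, 5 = most recent)
my_fun <- function(x) { w <- seq_along(x); sum(x * w) }
m[, Exp_age := rollapply(I2, width = 5, by = 1, fill = NA, FUN = my_fun, by.column = FALSE, align = "right"), by = Name]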
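If the weights are meant to reproduce the Exp_age column in the question, one possible reading is to divide each past year's count by its age in years and leave the current year out. The sketch below is not part of the original answer: decayed_sum is a hypothetical helper, and the window of 6 (five past years plus the current one) is an assumption inferred from the expected values for Gill. To stay self-contained it uses Gill's yearly counts for 1995 to 2007 as a plain vector.

```r
library(zoo)

# Hypothetical helper (an assumption, not from the original answer):
# sum of the previous n years, each count divided by its age in years.
decayed_sum <- function(x, n = 5) {
  past <- head(x, -1)   # drop the current year; remaining values are oldest first
  ages <- n:1           # oldest value has age n, most recent has age 1
  sum(past / ages)
}

# Gill's yearly event counts for 1995..2007, taken from the question's data
gill <- c(0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 2, 1, 1)

# Assumed window of n + 1 = 6: five past years plus the current year
exp_age <- rollapply(gill, width = 6, FUN = decayed_sum,
                     align = "right", fill = 0)

# Values for 2005, 2006 and 2007
print(exp_age[11:13])
```

Under the same assumptions, the per-group version would presumably be m[, Exp_age := rollapply(I2, 6, decayed_sum, align = "right", fill = 0), by = Name]. Replacing past / ages with past / sqrt(ages) or past / ages^2 would give the slower or faster decays mentioned in the question.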