Define variable iteratively in data table in R

Problem description

I am trying to find a faster way to define a variable iteratively, i.e., where each row of the variable depends on the previous row. For example, suppose I have the following data.table:

library(data.table)

tmp <- data.table(type = c("A", "A", "A", "B", "B", "B"), 
                  year = c(2011, 2012, 2013, 2011, 2012, 2013), 
                  alpha = c(1,1,1,2,2,2), 
                  beta = c(3,3,3,4,4,4), 
                  pred = c(1,NA,NA,2,NA, NA))

For each type (A and then B), I want to solve for pred going forward, where pred for type A for the year 2012 is:

pred_2012_A = alpha + beta * pred_2011_A

and the pred for 2013 for type A continues:

pred_2013_A = alpha + beta * pred_2012_A

I have a solution that uses a for loop over type, stores the previous value in a variable, and uses the "by" argument of data.table to step through the years:

for(i in c("A", "B")){
  tmp.val <- tmp[type == i & year == 2011]$pred # initial value for type i
  tmp[year > 2011 & type == i, pred := {
    tmp.val <- alpha + beta * tmp.val
  }, by = year]
}

Ultimately, the original data table looks like:

   type year alpha beta pred
1:    A 2011     1    3    1
2:    A 2012     1    3   NA
3:    A 2013     1    3   NA
4:    B 2011     2    4    2
5:    B 2012     2    4   NA
6:    B 2013     2    4   NA

And the updated table looks like:

   type year alpha beta pred
1:    A 2011     1    3    1
2:    A 2012     1    3    4
3:    A 2013     1    3   13
4:    B 2011     2    4    2
5:    B 2012     2    4   10
6:    B 2013     2    4   42

My question is whether there is a faster way to implement this without the for loop. Can this routine be written as a single data.table statement that runs faster than the loop? My real usage has many more types and many more years to compute, so a faster implementation would be greatly appreciated.

Thank you.

Solution

You can just do the math:

tmp[, pred := pred[1]*beta^(1:.N-1) + alpha*cumsum(c(0, beta[1]^(0:(.N-2)))), by=type]

#    type year alpha beta pred
# 1:    A 2011     1    3    1
# 2:    A 2012     1    3    4
# 3:    A 2013     1    3   13
# 4:    B 2011     2    4    2
# 5:    B 2012     2    4   10
# 6:    B 2013     2    4   42
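
For readers wondering where the one-liner comes from, here is a short derivation sketch. The symbols p_k (the pred value k years after the first year), α and β are notation introduced here for illustration, and the derivation assumes alpha and beta are constant within each type, as they are in the example data.

```latex
\begin{aligned}
p_k &= \alpha + \beta\, p_{k-1} \\
    &= \alpha + \beta\,(\alpha + \beta\, p_{k-2})
     = \alpha\,(1 + \beta) + \beta^{2} p_{k-2} \\
    &\;\;\vdots \\
    &= \beta^{k} p_0 + \alpha \sum_{j=0}^{k-1} \beta^{j},
    \qquad k = 0, 1, \dots, n-1 .
\end{aligned}
```

The first term corresponds to pred[1]*beta^(1:.N-1), and the running sum corresponds to cumsum(c(0, beta[1]^(0:(.N-2)))), whose leading 0 is the empty sum for k = 0. As a check for type A (α = 1, β = 3, p_0 = 1): p_1 = 3·1 + 1 = 4 and p_2 = 9·1 + (1 + 3) = 13, matching the table above.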


Comment. In my opinion, the data structure in the OP is flawed. Alpha and beta are clearly attributes of the type, not something that is varying from row to row. It should start with:

typeDT = data.table(
  type=c("A","B"), 
  year.start = 2011L, 
  year.end=2013, 
  a = 1:2, 
  b = 3:4,
  pred0 = 1:2
)

#    type year.start year.end a b pred0
# 1:    A       2011     2013 1 3     1
# 2:    B       2011     2013 2 4     2

With this structure, you could expand to your data set naturally:

typeDT[, {
  year = year.start:year.end
  n    = length(year)
  p    = pred0*b^(0:(n-1)) + a*cumsum(c(0, b^(0:(n-2))))
  .(year = year, pred = p)
}, by=type]

#    type year pred
# 1:    A 2011    1
# 2:    A 2012    4
# 3:    A 2013   13
# 4:    B 2011    2
# 5:    B 2012   10
# 6:    B 2013   42
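
If alpha or beta did vary from row to row, the closed form above would no longer apply. As a hedged alternative (not part of the original answer), the recurrence can still be applied per group without an explicit for loop by using base R's Reduce with accumulate = TRUE. The helper name roll_forward is hypothetical and introduced only for this sketch; whether it beats the original loop on large data is untested here.

```r
library(data.table)

# Same example data as in the question.
tmp <- data.table(type  = c("A", "A", "A", "B", "B", "B"),
                  year  = c(2011, 2012, 2013, 2011, 2012, 2013),
                  alpha = c(1, 1, 1, 2, 2, 2),
                  beta  = c(3, 3, 3, 4, 4, 4),
                  pred  = c(1, NA, NA, 2, NA, NA))

# Hypothetical helper (not from the answer): starting from p0, apply
# pred[i] = alpha[i] + beta[i] * pred[i - 1] for i = 2, ..., n and
# return all intermediate values, including p0 itself.
roll_forward <- function(p0, alpha, beta) {
  Reduce(function(prev, i) alpha[i] + beta[i] * prev,
         seq_along(alpha)[-1],   # indices 2..n; empty for a one-row group
         accumulate = TRUE, init = p0)
}

# One data.table statement; assumes rows are already sorted by year within type.
tmp[, pred := roll_forward(pred[1], alpha, beta), by = type]
```

For the example data this gives pred = 1, 4, 13 for type A and 2, 10, 42 for type B, the same values as the closed-form answer. The closed form is fully vectorised and is likely the faster choice whenever alpha and beta are constant within a type.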
