Assignment via `:=` in a `for` loop (R data.table)

Problem description

I'm trying to assign some new variables within a for loop (I'm trying to create some variables with common structure, but which are subsample-dependent).

I've tried for the life of me to reproduce this error on sample data and I can't. Here's code that works and gets the gist of what I want to do:

library(data.table)
dt<-data.table(id=rep(1:100,each=20),period=rep(-9:10,100),
               grp=rep(sample(4,size=100,replace=TRUE),each=20),
               y=runif(2000,min=0,max=5),key=c("id","period"))[,x:=cumsum(y),by=id]
dt2<-dt[id %in% seq(1,100,by=2),]
dt3<-dt[id %in% seq(1,100,by=3),]

for (dd in list(dt,dt2,dt3)){
  setkey(setkey(dd,grp)[dd[period==0,sum(x),by=grp],x_at_0_by_grp:=V1],id,period)
}

This works fine. However, when I do this in my own code, it generates the `Invalid .internal.selfref` warning (and doesn't create the variable I want):

In [.data.table(setkey(dt, treatment), dt[posting_rel == 0, sum(current_balance), : Invalid .internal.selfref detected and fixed by taking a copy of the whole table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or been created manually using structure() or similar). Avoid key<-, names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. Also, in R<=v3.0.2, list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to copy named objects); please upgrade to R>v3.0.2 if that is biting. If this message doesn't help, please report to datatable-help so the root cause can be fixed.

In fact, when I subset my data to only the columns needed in the merge, it also works fine on my data (though the result isn't saved to the original data sets).

This suggests to me it's a problem with keying, but I'm explicitly setting the keys every step of the way. I'm completely lost on how to debug this from here because I can't get the error to repeat except on my full data set.
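One way to start comparing the full data set with the column subset that works, offered as a sketch rather than a diagnosis: `:=` adds a column by reference into spare column slots that data.table over-allocates, and the warning says the table had been copied at an earlier point, so a fresh copy was taken before adding the column. data.table exports `address()` and `truelength()`, so the snippet below reports, for each table, its memory address and its used versus allocated column slots. It runs on the sample data above (the seed is arbitrary), and `inspect_dt` is a hypothetical helper name.

```r
library(data.table)

# Sample data as in the question; the seed is an arbitrary choice.
set.seed(42)
dt <- data.table(id = rep(1:100, each = 20), period = rep(-9:10, 100),
                 grp = rep(sample(4, size = 100, replace = TRUE), each = 20),
                 y = runif(2000, min = 0, max = 5),
                 key = c("id", "period"))[, x := cumsum(y), by = id]
dt2 <- dt[id %in% seq(1, 100, by = 2)]
dt3 <- dt[id %in% seq(1, 100, by = 3)]

# Hypothetical helper: memory address, columns in use, and column slots
# allocated (the spare slots are what := fills when adding a column by reference).
inspect_dt <- function(d) {
  data.table(address = address(d), used = length(d), allocated = truelength(d))
}

info <- rbindlist(lapply(list(dt = dt, dt2 = dt2, dt3 = dt3), inspect_dt),
                  idcol = "table")
print(info)
```

Running the same function on the real tables, before and after each subsetting or keying step, may show whether a name ends up pointing at a different object than expected.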

If I break out the operation into steps, the error arises at the merge step:

for (dd in list(dt,dt2,dt3)){
  dummy<-dd[period==0,sum(x),by=grp]
  setkey(dd,grp)
  dd[dummy,x_at_0_by_grp:=V1] #***ERROR HERE***
  setkey(dd,id,period)
}
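The merge step can also be written without changing any keys. The sketch below assumes an update join with `on =` fits the real data, and wraps the two steps in a hypothetical helper `add_x_at_0()` that returns the modified table, so the caller keeps the result by assignment instead of relying only on modification by reference. It does not explain the warning itself.

```r
library(data.table)

# Sample data as in the question; the seed is an arbitrary choice.
set.seed(42)
dt <- data.table(id = rep(1:100, each = 20), period = rep(-9:10, 100),
                 grp = rep(sample(4, size = 100, replace = TRUE), each = 20),
                 y = runif(2000, min = 0, max = 5),
                 key = c("id", "period"))[, x := cumsum(y), by = id]
dt2 <- dt[id %in% seq(1, 100, by = 2)]

# Hypothetical helper: attach each group's sum of x at period 0 to every row
# of that group, joining on "grp" instead of re-keying the table.
add_x_at_0 <- function(d) {
  dummy <- d[period == 0, .(V1 = sum(x)), by = grp]
  d[dummy, x_at_0_by_grp := i.V1, on = "grp"]
  d[]  # return the modified table visibly
}

key_before <- key(dt2)
dt2 <- add_x_at_0(dt2)
identical(key(dt2), key_before)  # TRUE: the helper never calls setkey()
```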


Quick update: this also produces the error if I write it with lapply instead of a for loop.

Any ideas what on earth is going on here?


UPDATE: I've come up with a workaround by doing:

nnames<-c("dt","dt2","dt3")

dt_list<-list(dt,dt2,dt3)

for (ii in 1:3){
  dummy<-copy(dt_list[[ii]])
  dummy[,x_at_0_by_grp:=sum(x[period==0]),by=grp]
  assign(nnames[ii],dummy)
}

Would still like to understand what's going on, and perhaps a better way of assigning variables iteratively in situations like this.
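For the second question, one common pattern (a sketch, not the only option) is to keep the subsamples in a single named list and let `lapply()` return the modified tables, which removes the need for `assign()`. It keeps the `copy()` from the workaround above, so the original `dt`, `dt2` and `dt3` stay unchanged; the list names and the seed are assumptions for illustration.

```r
library(data.table)

# Sample data as in the question; the seed is an arbitrary choice.
set.seed(42)
dt <- data.table(id = rep(1:100, each = 20), period = rep(-9:10, 100),
                 grp = rep(sample(4, size = 100, replace = TRUE), each = 20),
                 y = runif(2000, min = 0, max = 5),
                 key = c("id", "period"))[, x := cumsum(y), by = id]
dt2 <- dt[id %in% seq(1, 100, by = 2)]
dt3 <- dt[id %in% seq(1, 100, by = 3)]

# One named list instead of separately named variables, so 3 or 30
# subsamples are handled the same way.
dt_list <- list(dt = dt, dt2 = dt2, dt3 = dt3)

# Each element is copied, modified, and returned; lapply() keeps the names.
dt_list <- lapply(dt_list, function(d) {
  d <- copy(d)  # as in the workaround: work on a fresh table
  d[, x_at_0_by_grp := sum(x[period == 0]), by = grp]
  d[]
})

head(dt_list$dt2)  # access by name instead of assign()/get()
```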

Solution

With 20-30 criteria, keeping them outside of a list (with manual names like dt2, etc.) is too clunky, so I'll just assume you have them all in dt_list.

I suggest making tables with just the stat you're computing, and then rbinding them:

xxt <- rbindlist(lapply(seq_along(dt_list),function(i) 
         dt_list[[i]][,list(cond=i,xx=sum(x[period==0])),by=grp]))

which creates

    grp cond       xx
 1:   1    1 623.3448
 2:   2    1 784.8438
 3:   4    1 699.2362
 4:   3    1 367.7196
 5:   1    2 323.6268
 6:   4    2 307.0374
 7:   2    2 447.0753
 8:   3    2 185.7377
 9:   1    3 275.4897
10:   4    3 243.0214
11:   2    3 149.6041
12:   3    3 166.3626

You can easily merge back if you really want those vars. For example, for dt2:

myi = 2
setkey(dt_list[[myi]],grp)[xxt[cond==myi,list(grp,xx)]]

This doesn't resolve the bug you're running into, but I think it is a better approach.
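The answer's approach can be sketched end to end with a named list, assuming a data.table version whose `rbindlist()` has the `idcol` argument: the id column then holds the list names instead of integer positions, and `merge()` returns a new table rather than re-keying the stored one. The names `dt_list`, `xxt` and `dt2_merged` and the seed are illustrative.

```r
library(data.table)

# Sample data as in the question; the seed is an arbitrary choice.
set.seed(42)
dt <- data.table(id = rep(1:100, each = 20), period = rep(-9:10, 100),
                 grp = rep(sample(4, size = 100, replace = TRUE), each = 20),
                 y = runif(2000, min = 0, max = 5),
                 key = c("id", "period"))[, x := cumsum(y), by = id]
dt2 <- dt[id %in% seq(1, 100, by = 2)]
dt3 <- dt[id %in% seq(1, 100, by = 3)]
dt_list <- list(dt = dt, dt2 = dt2, dt3 = dt3)

# One summary table for all subsamples; idcol = "cond" records which
# list element each row came from (here the list names).
xxt <- rbindlist(
  lapply(dt_list, function(d) d[, .(xx = sum(x[period == 0])), by = grp]),
  idcol = "cond"
)

# Merge back for one subsample; the stored table itself is not modified.
dt2_merged <- merge(dt_list[["dt2"]], xxt[cond == "dt2", .(grp, xx)], by = "grp")
head(dt2_merged)
```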
