Assignment via `:=` in a for loop (R data.table)

Problem description

I'm trying to assign some new variables within a for loop (I'm trying to create some variables with common structure, but which are subsample-dependent).

I've tried for the life of me to reproduce this error on sample data and I can't. Here's code that works and gets the gist of what I want to do:

library(data.table)

DT <- data.table(
  id = rep(1:100, each = 20L),
  period = rep(-9:10, 100L),
  grp = rep(sample(4L, size = 100L, replace = TRUE), each = 20L),
  y = runif(2000, min=0, max=5), key = c("id", "period")
)
DT[ , x := cumsum(y), by = id]
DT2 <- DT[id %in% seq(1, 100, by=2)]
DT3 <- DT[id %in% seq(1, 100, by=3)]

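for (dd in list(DT, DT2, DT3)){
  # key by grp, join the per-group sums of x at period 0 (column V1),
  # add them by reference as x_at_0_by_grp, then restore the (id, period) key
  setkey(setkey(dd, grp)[dd[period==0, sum(x), by = grp], x_at_0_by_grp := V1], id, period)
}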

This works fine; however, when I do this with my own code, it generates the "Invalid .internal.selfref" warning (and doesn't create the variable I want):

In [.data.table(setkey(dt, treatment), dt[posting_rel == 0, sum(current_balance), : Invalid .internal.selfref detected and fixed by taking a copy of the whole table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or been created manually using structure() or similar). Avoid key<-, names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. Also, in R<=v3.0.2, list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to copy named objects); please upgrade to R>v3.0.2 if that is biting. If this message doesn't help, please report to datatable-help so the root cause can be fixed.
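
The warning's advice is to prefer the `set*` functions, which modify a data.table in place, over `names<-`, `attr<-` and `key<-`, which (per the warning) may make R copy the whole table. A small sketch of those in-place calls, using data.table's `address()` to confirm that the object's memory location does not change. The toy table, its values, and the `note` attribute are made up for illustration and are not from the question.

```r
library(data.table)

# Hypothetical toy table, not from the question
DT <- data.table(treatment = c(2L, 1L, 2L), balance = c(10, 20, 30))
addr_before <- address(DT)

# In-place alternatives to names<-, key<- and attr<-
setnames(DT, "balance", "current_balance")         # instead of names(DT)[2] <- ...
setkey(DT, treatment)                              # instead of key(DT) <- "treatment"
setattr(DT, "note", "modified by reference")       # instead of attr(DT, "note") <- ...
set(DT, i = 1L, j = "current_balance", value = 0)  # in-place update of one cell

# Same memory address as before: none of the calls above copied the table
identical(address(DT), addr_before)
```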

In fact, when I subset my data to only the columns needed in the merge, it also works fine on my data (though the result isn't saved back to the original data sets).

This suggests to me it's a problem with keying, but I'm explicitly setting the keys every step of the way. I'm completely lost on how to debug this from here because I can't get the error to repeat except on my full data set.

If I break out the operation into steps, the error arises at the merge step:

for (dd in list(DT, DT2, DT3)){
  dummy <- dd[period==0, sum(x), by = grp]
  setkey(dd, grp)
  dd[dummy, x_at_0_by_grp := V1] #***ERROR HERE***
  setkey(dd, id, period)
}
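
The merge step that fails can also be written without any re-keying. A minimal sketch, assuming a data.table version that supports the `on=` argument (it was added after this question was asked): it joins `DT` to the per-group sums directly on `grp` and adds the column by reference, leaving the `(id, period)` key in place. It does not diagnose the selfref warning; it only removes the `setkey()` round trip. The `set.seed()` call is an addition for reproducibility.

```r
library(data.table)

set.seed(1)  # added for reproducibility; the question has no seed
DT <- data.table(
  id = rep(1:100, each = 20L),
  period = rep(-9:10, 100L),
  grp = rep(sample(4L, size = 100L, replace = TRUE), each = 20L),
  y = runif(2000, min = 0, max = 5),
  key = c("id", "period")
)
DT[, x := cumsum(y), by = id]

# Step 1 (as in the question): per-group sum of x at period 0, column V1
dummy <- DT[period == 0, sum(x), by = grp]

# Step 2: update join on grp; i.V1 refers to dummy's V1 column.
# No setkey() calls, so the (id, period) key is never dropped or restored.
DT[dummy, x_at_0_by_grp := i.V1, on = "grp"]

key(DT)
```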

Quick update: the same error also occurs if I use `lapply` instead of a `for` loop.

Any ideas what on earth is going on here?

UPDATE: I've come up with a workaround by doing:

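nnames <- c("dt", "dt2", "dt3")   # names of the variables to create

dt_list <- list(DT, DT2, DT3)

for (ii in 1:3){
  dummy <- copy(dt_list[[ii]])    # deep copy; the original table is not modified
  dummy[ , x_at_0_by_grp := sum(x[period == 0]), by=grp]   # grouped :=, no join needed
  assign(nnames[ii], dummy)       # bind the result to dt, dt2 or dt3
}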

Would still like to understand what's going on, and perhaps a better way of assigning variables iteratively in situations like this.
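
One way to avoid `assign()` and the separate `dt`, `dt2`, `dt3` variables is to keep the tables in a named list and return modified copies. A minimal sketch, reusing the question's sample data (plus a seed) and the workaround's grouped `:=`; the names `dt_list` and `results` are illustrative. Because each element is `copy()`-ed first, the original tables are left untouched.

```r
library(data.table)

set.seed(1)  # for reproducibility only
DT <- data.table(
  id = rep(1:100, each = 20L),
  period = rep(-9:10, 100L),
  grp = rep(sample(4L, size = 100L, replace = TRUE), each = 20L),
  y = runif(2000, min = 0, max = 5),
  key = c("id", "period")
)
DT[, x := cumsum(y), by = id]
DT2 <- DT[id %in% seq(1, 100, by = 2)]
DT3 <- DT[id %in% seq(1, 100, by = 3)]

dt_list <- list(dt = DT, dt2 = DT2, dt3 = DT3)

results <- lapply(dt_list, function(d) {
  d <- copy(d)  # private copy, so := never touches DT, DT2 or DT3
  # grouped := writes the per-group sum of x at period 0 into every row
  d[, x_at_0_by_grp := sum(x[period == 0]), by = grp]
  d
})

names(results)
```

The results are then addressed as `results$dt2` or `results[[i]]`, which scales to many subsamples without creating one variable per subsample.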

Recommended answer

With 20-30 criteria, keeping them outside of a list (with manual names like dt2, etc.) is too clunky, so I'll just assume you have them all in dt_list.

I suggest making tables with just the stat you're computing, and then rbinding them:

xxt <- rbindlist(lapply(seq_along(dt_list), function(i)
         dt_list[[i]][, list(cond = i, xx = sum(x[period == 0])), by = grp]))

This creates:

    grp cond       xx
 1:   1    1 623.3448
 2:   2    1 784.8438
 3:   4    1 699.2362
 4:   3    1 367.7196
 5:   1    2 323.6268
 6:   4    2 307.0374
 7:   2    2 447.0753
 8:   3    2 185.7377
 9:   1    3 275.4897
10:   4    3 243.0214
11:   2    3 149.6041
12:   3    3 166.3626

You can easily merge back if you really want those vars. For example, for dt2:

myi = 2
# setkey() re-keys dt_list[[myi]] in place; the join itself returns a new table
setkey(dt_list[[myi]], grp)[xxt[cond == myi, list(grp, xx)]]

This doesn't resolve the bug you're running into, but I think it is a better approach.
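
The answer's two steps can be combined into one script. A sketch, assuming the tables live in an unnamed list `dt_list` as in the answer: it builds `xxt` with `rbindlist()`, then merges the statistic back into every table with `merge()` (swapped in for the keyed `X[Y]` join above), so nothing is modified in place. The index is named `k` rather than `i` only to keep it visually distinct from data.table's `i` argument; the seed is added for reproducibility.

```r
library(data.table)

set.seed(1)  # for reproducibility only
DT <- data.table(
  id = rep(1:100, each = 20L),
  period = rep(-9:10, 100L),
  grp = rep(sample(4L, size = 100L, replace = TRUE), each = 20L),
  y = runif(2000, min = 0, max = 5),
  key = c("id", "period")
)
DT[, x := cumsum(y), by = id]
dt_list <- list(DT, DT[id %in% seq(1, 100, by = 2)], DT[id %in% seq(1, 100, by = 3)])

# Step 1: one small table of per-group stats, tagged with the list index
xxt <- rbindlist(lapply(seq_along(dt_list), function(k)
  dt_list[[k]][, list(cond = k, xx = sum(x[period == 0])), by = grp]))

# Step 2: merge each table's stats back in; merge() returns new tables
merged <- lapply(seq_along(dt_list), function(k)
  merge(dt_list[[k]], xxt[cond == k, list(grp, xx)], by = "grp"))

sapply(merged, nrow)
```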
