How to generate a linear combination of variables and update table using data.table in a loop call?
Question
set.seed(123)
df <- data.frame(what_ever = rnorm(5, 50, 1),
this_is = rnorm(5, 30, 1),
wtf_nnn = rnorm(5, 20, 1),
hat_ever = rnorm(5, 50, 1),
who_is = rnorm(5, 30, 1),
mmm_nnn = rnorm(5, 20, 1)
)
library(data.table)
DT <- data.table(df)
str(DT)
Classes ‘data.table’ and 'data.frame': 5 obs. of 6 variables:
How can I use a loop to generate new variables in the data.table that are the result of the following?
New_Var_1 = what_ever/hat_ever
New_Var_2 = this_is/who_is
New_Var_3 = wtf_nnn/mmm_nnn
Here I sort the column names into two groups:
nm <- names(df)
nm1 <- nm[1:3]
nm2 <- nm[4:6]
I would like to update DT in this way, looping over i:
i <- 1
New_Var_names <- paste("New_Var_", i, sep = "")
New_Var <- sprintf("%s/%s", nm1[i], nm2[i])
None of these three attempts worked:
DT[,New_Var_names := New_Var]
DT[,cat(New_Var_names) := cat(New_Var)]
DT[,eval(New_Var_names) := eval(New_Var)]
Recommended answer
I'd recommend using set with a for-loop to do this, but in the current stable (CRAN) version, 1.8.10, set doesn't add new columns. So I'd create the columns with := first and then fill them, something like:
require(data.table)
out_names <- paste("newvar", 1:3, sep="_")
DT[, c(out_names) := 0]
invar1 <- names(DT)[1:3]
invar2 <- names(DT)[4:6]
for (i in seq_along(invar1)) {
set(DT, i=NULL, j=out_names[i], value=DT[[invar1[i]]]/DT[[invar2[i]]])
}
In the current devel version (1.8.11), set can add new columns, so you don't need the initial assignment using :=. That is:
require(data.table)
out_names <- paste("newvar", 1:3, sep="_")
invar1 <- names(DT)[1:3]
invar2 <- names(DT)[4:6]
for (i in seq_along(invar1)) {
set(DT, i=NULL, j=out_names[i], value=DT[[invar1[i]]]/DT[[invar2[i]]])
}
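As a quick end-to-end check, the loop above can be run on the sample data from the question. This is only a minimal sketch, assuming a data.table version in which set can add new columns (1.8.11 or later, per the note above); the final stopifnot comparison is added here for illustration and is not part of the original answer.

```r
library(data.table)

# Sample data from the question
set.seed(123)
DT <- data.table(what_ever = rnorm(5, 50, 1),
                 this_is   = rnorm(5, 30, 1),
                 wtf_nnn   = rnorm(5, 20, 1),
                 hat_ever  = rnorm(5, 50, 1),
                 who_is    = rnorm(5, 30, 1),
                 mmm_nnn   = rnorm(5, 20, 1))

out_names <- paste("newvar", 1:3, sep = "_")
invar1 <- names(DT)[1:3]   # numerator columns
invar2 <- names(DT)[4:6]   # denominator columns

# set() adds each ratio column to DT by reference
for (i in seq_along(invar1)) {
  set(DT, i = NULL, j = out_names[i], value = DT[[invar1[i]]] / DT[[invar2[i]]])
}

# Illustrative check: the first new column equals the plain ratio
stopifnot(isTRUE(all.equal(DT$newvar_1, DT$what_ever / DT$hat_ever)))
names(DT)
```

Because set modifies DT by reference, there is no need to assign the result back to DT inside the loop.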
For completeness, another way is:
EVAL = function(...) eval(parse(text=paste0(...))) # helper function
# build all three names and expressions up front (nm1 and nm2 as defined in the question)
New_Var_names <- paste("New_Var_", 1:3, sep = "")
New_Var <- sprintf("%s/%s", nm1, nm2)
for (i in 1:3)
    EVAL("DT[,", New_Var_names[i], ":=", New_Var[i], "]")
This is more general in that you can also vary the operator / in the sprintf, vary the by= clause, and so on. It's similar to constructing a dynamic SQL statement, if that helps. If you want to log the dynamic query being executed, you could add a cat to your definition of EVAL.
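One way the logging and operator variations might look is sketched below. The toy table, its column names (x1, y1, x2, y2), and the ops vector are hypothetical and not from the question. Passing envir = parent.frame() is an assumption added here so that the statement is evaluated where EVAL is called.

```r
library(data.table)

# Toy data for illustration only (not from the question)
DT <- data.table(x1 = c(2, 4, 6), y1 = c(1, 2, 3),
                 x2 = c(10, 20, 30), y2 = c(5, 5, 5))

# EVAL variant that prints each generated statement before running it
EVAL <- function(...) {
  query <- paste0(...)
  cat(query, "\n")                                   # log the dynamic query
  eval(parse(text = query), envir = parent.frame())  # run it in the caller's environment
}

nm1 <- c("x1", "x2")
nm2 <- c("y1", "y2")
ops <- c("/", "+")   # the operator can differ per new column

for (i in seq_along(nm1)) {
  EVAL("DT[, New_Var_", i, " := ", nm1[i], ops[i], nm2[i], "]")
}
```

Each generated statement is printed before it runs, which makes it easier to spot a malformed string. As with dynamic SQL, this approach is best reserved for column names and operators you control.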