Using eval in data.table
Question
I'm trying to understand the behaviour of eval in a data.table as a "frame".
With the following data.table:
library(data.table)
set.seed(1)
foo = data.table(var1=sample(1:3, 1000, replace=TRUE), var2=rnorm(1000), var3=sample(letters[1:5], 1000, replace=TRUE))
I'm trying to replicate this expression
foo[var1==1 , sum(var2) , by=var3]
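For reference, here is a base R sketch of what the target expression computes: filter the rows with var1 == 1, then sum var2 within each value of var3. It does not need data.table. The names df, rows and res are illustrative. Note that tapply orders the groups alphabetically, while data.table's by keeps them in order of first appearance. The exact numbers also depend on the sample() algorithm of your R version.

```r
# Base R sketch (no data.table): grouped sum equivalent to
# foo[var1 == 1, sum(var2), by = var3]
set.seed(1)
df <- data.frame(
  var1 = sample(1:3, 1000, replace = TRUE),
  var2 = rnorm(1000),
  var3 = sample(letters[1:5], 1000, replace = TRUE)
)
rows <- df[df$var1 == 1, ]                # the "i" step: keep rows with var1 == 1
res <- tapply(rows$var2, rows$var3, sum)  # the "j" and "by" steps: sum var2 per var3
res
```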
using a wrapper function around eval:
eval1 = function(s) eval( parse(text=s) ,envir=sys.parent() )
As you can see, tests 1 and 3 work, but I don't understand which envir is the "correct" one to pass to eval for test 2:
var_i="var1"
var_j="var2"
var_by="var3"
# test 1 works
foo[eval1(var_i)==1 , sum(var2) , by=var3 ]
# test 2 doesn't work
foo[var1==1 , sum(eval1(var_j)) , by=var3]
# test 3 works
foo[var1==1 , sum(var2) , by=eval1(var_by)]
Answer
The j-expression checks for its variables in the environment of .SD, which stands for Subset of Data. .SD is itself a data.table that holds the columns for the current group.
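To make the idea of evaluating j "in the environment of .SD" concrete, here is a rough base R analogue. It splits the data by the grouping column, then evaluates a quoted expression once per group, with that group's columns visible as variables. eval_by_group is a hypothetical helper written for illustration. It is not part of data.table, and data.table's real implementation is far more optimised.

```r
# Hypothetical helper (base R only): evaluate quoted expression `j` once per
# group, with the group's non-grouping columns visible as variables, roughly
# the role .SD plays in data.table.
eval_by_group <- function(df, j, by) {
  caller <- parent.frame()                 # non-column names are looked up here
  sd_cols <- setdiff(names(df), by)        # like .SD: every column except `by`
  groups <- split(df[sd_cols], df[[by]])   # one data.frame per group
  vapply(groups,
         function(sd) eval(j, envir = sd, enclos = caller),
         numeric(1))
}

df <- data.frame(var2 = c(1, 2, 3, 4), var3 = c("a", "b", "a", "b"))
eval_by_group(df, quote(sum(var2)), by = "var3")  # returns c(a = 4, b = 6)
```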
When you write the following directly:
foo[var1 == 1, sum(eval(parse(text=var_j))), by=var3]
the j-expression gets internally optimised, i.e. replaced with sum(var2). But sum(eval1(var_j)) doesn't get optimised and stays as it is.
Then, when it is evaluated for each group, it has to find var2, which exists not in the parent.frame() from which the function is called but in .SD. As an example, let's do this:
eval1 <- function(s) eval(parse(text=s), envir=parent.frame())
foo[var1 == 1, { var2 = 1L; eval1(var_j) }, by=var3]
# var3 V1
# 1: e 1
# 2: c 1
# 3: a 1
# 4: b 1
# 5: d 1
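The same lookup rule can be seen without data.table. With envir = parent.frame(), eval1 resolves names in the frame of whichever function called it, so two different callers see two different var2 values. The functions f and h are hypothetical callers used only for this illustration.

```r
# Base R sketch: with envir = parent.frame(), eval1 resolves names in the
# frame of whichever function called it.
eval1 <- function(s) eval(parse(text = s), envir = parent.frame())

f <- function() { var2 <- 1L;  eval1("var2") }   # this caller's var2 is 1L
h <- function() { var2 <- 99L; eval1("var2") }   # another caller, another var2

f()  # returns 1L
h()  # returns 99L
```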
In the call above, eval1 finds var2 (the 1L assigned inside j) in its parent frame. That is, we have to point eval to the right environment to evaluate in, via an additional argument whose value is .SD:
eval1 <- function(s, env) eval(parse(text=s), envir = env, enclos = parent.frame())
foo[var1 == 1, sum(eval1(var_j, .SD)), by=var3]
# var3 V1
# 1: e 11.178035
# 2: c -12.236446
# 3: a -8.984715
# 4: b -2.739386
# 5: d -1.159506
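Why the two-argument version works can be sketched in base R. When envir is a list or data.frame (.SD is a data.table, which is also a list), eval looks up column names there first. Any other name falls back to enclos, which here is the frame of the function that called eval1. The function g and the variable k are hypothetical and used only for illustration.

```r
# Base R sketch of the fixed eval1: columns come from `env` (a list or
# data.frame, as .SD is); any other name falls back to `enclos`, which is
# the frame of the function that called eval1.
eval1 <- function(s, env) eval(parse(text = s), envir = env, enclos = parent.frame())

g <- function(sd) {
  k <- 10                      # not a column: found via enclos (g's frame)
  eval1("sum(var2) * k", sd)   # var2 is a column of sd
}

g(data.frame(var2 = c(1.5, 2.5)))  # returns 40
```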