In R data.table, how do I pass variable parameters to an expression?
Question
I am stuck with a small R issue with data.table. Your help is much appreciated. How do I do this:
library(data.table)

getResult <- function(dt, expr, gby) {
  e <- substitute(expr)
  b <- substitute(gby)
  return(dt[, eval(e), by = b])
}
v1 <- "Sepal.Length"
v2 <- "Species"
dt <- data.table(iris)
rDT <- getResult(dt, sum(v1, na.rm = TRUE), v2)
I get the following error:
Error in sum(v1, na.rm = TRUE) : invalid 'type' (character) of argument
Now, both v1 and v2 get passed from another program as character variables, so I can't do v1 <- quote(Sepal.Length), which seems to work.
Recommended answer
An alternative to flodel's answer in the comments could be
e <- parse(text = paste0("sum(", v1, ", na.rm = TRUE)"))
b <- parse(text = v2)
rDT2 <- dt[, eval(e), by = eval(b)]
# b V1
# [1,] setosa 250.3
# [2,] versicolor 296.8
# [3,] virginica 329.4
and putting it into a function,
getResult <- function(dt, expr, gby){
return(dt[, eval(expr), by = eval(gby)])
}
(dtR <- getResult(dt = dt, expr = e, gby = b))
# gives the same result as above
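Since v1 and v2 arrive as character strings, the parsing step can also be moved inside the function. The following is only a sketch under that assumption: getResultByName is a hypothetical name, not part of the original answer, and it passes the grouping column to by as a character string instead of a parsed expression.

```r
library(data.table)

# Hypothetical helper (not from the original answer): takes column names as
# character strings and builds the j expression inside the function.
getResultByName <- function(dt, col, gby) {
  # Build the expression sum(<col>, na.rm = TRUE) from the column name
  e <- parse(text = paste0("sum(", col, ", na.rm = TRUE)"))
  # by accepts a character vector of column names held in a variable
  dt[, eval(e), by = gby]
}

dt <- data.table(iris)
res <- getResultByName(dt, "Sepal.Length", "Species")
print(res)
```

The per-species sums should match the values shown above (250.3, 296.8, 329.4); only the name of the grouping column in the output differs.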
EDIT from Matthew:
There's a subtle reason why the paste0 and eval/quote methods can be faster than get in some cases, too. One of the reasons grouping can be fast is that data.table inspects j to see which columns it uses, then subsets only those columns (FAQ 1.12 and 3.1). It uses base::all.vars(j) to do that. When get() is used in j, the column being used is hidden from all.vars, and data.table falls back to subsetting all the columns just in case the j expression needs them (much like when the .SD symbol is used in j, which .SDcols was added to solve). If all the columns are used anyway then it makes no difference, but if DT is, say, 1e7 rows by 100 columns, then a grouped j=sum(V1) should be much faster than a grouped j=sum(get("V1")) for that reason. At least, that's what's supposed to happen, and if it doesn't then it may be a bug. If, on the other hand, many queries are being constructed dynamically and repeated, then the time to paste0 and parse might come into it. It all depends, really. Setting verbose=TRUE should print a message about which columns have been detected as used by j, so that can be checked.
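As a rough way to check this, the two forms can be run side by side with verbose = TRUE. This is a minimal sketch: the table DT, its size, and its column names are made up for illustration, and the exact wording of the verbose messages depends on the data.table version.

```r
library(data.table)

set.seed(1)
# Small stand-in for the wide table described above (sizes are illustrative)
DT <- data.table(g  = sample(letters[1:3], 1e4, replace = TRUE),
                 V1 = runif(1e4), V2 = runif(1e4), V3 = runif(1e4))

v1 <- "V1"
e <- parse(text = paste0("sum(", v1, ")"))

# verbose = TRUE prints diagnostics about how each query is processed,
# which can be compared between the two forms
r_parse <- DT[, eval(e), by = g, verbose = TRUE]
r_get   <- DT[, sum(get(v1)), by = g, verbose = TRUE]

# Both forms should agree on the values; only the work done internally differs
stopifnot(isTRUE(all.equal(r_parse$V1, r_get$V1)))
```

On a table this small any timing difference is negligible; the point is only to compare the verbose output of the two calls.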