In R data.table, how do I pass variable parameters to an expression?


Question

I am stuck with a small R issue with data.table. Your help is much appreciated. How do I do this:

getResult <- function(dt, expr, gby) {
  e <- substitute(expr)
  b <- substitute(gby)
  return(dt[,eval(e),by=b])
}

v1 <- "Sepal.Length"
v2 <- "Species"

library(data.table)
dt <- data.table(iris)
rDT <- getResult(dt, sum(v1, na.rm=TRUE), v2)

I get the following error:

Error in sum(v1, na.rm = TRUE) : invalid 'type' (character) of argument

Both v1 and v2 are passed in from another program as character variables, so I can't write v1 <- quote(Sepal.Length), which does seem to work.

Recommended answer

An alternative to flodel's answer in the comments could be

e <- parse(text = paste0("sum(", v1, ", na.rm = TRUE)"))

b <- parse(text = v2)

rDT2 <- dt[, eval(e), by = eval(b)]

#               b    V1
# [1,]     setosa 250.3
# [2,] versicolor 296.8
# [3,]  virginica 329.4

And to put it into a function:

getResult <- function(dt, expr, gby){
  return(dt[, eval(expr), by = eval(gby)])
}

(dtR <- getResult(dt = dt, expr = e, gby = b))
# gives the same result as above
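
Since v1 and v2 arrive as character strings, the parse step can also live inside the function, so the caller passes only strings. The following is a minimal sketch, not part of the original answer: the helper name getResultStr and its fun/col arguments are hypothetical, and it assumes the strings are trusted, because parse(text = ...) will turn arbitrary text into code.

```r
library(data.table)

# Hypothetical helper (an assumption, not from the original answer):
# builds the j expression from strings, then evaluates it per group.
getResultStr <- function(dt, fun, col, gby) {
  # e.g. fun = "sum", col = "Sepal.Length" -> sum(Sepal.Length, na.rm = TRUE)
  e <- parse(text = paste0(fun, "(", col, ", na.rm = TRUE)"))[[1]]
  # by accepts a character vector of column names held in a variable
  dt[, eval(e), by = gby]
}

dt <- data.table(iris)
v1 <- "Sepal.Length"
v2 <- "Species"

res <- getResultStr(dt, "sum", v1, v2)
print(res)
```

One caveat with this sketch: if dt happened to contain a column literally named gby, data.table would group by that column rather than by the variable's value.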


EDIT from Matthew: There's also a subtle reason why the paste0 and eval quote methods can be faster than get in some cases. One of the reasons grouping can be fast is that data.table inspects j to see which columns it uses, then subsets only those columns (FAQ 1.12 and 3.1). It uses base::all.vars(j) to do that. When get() is used in j, the column being used is hidden from all.vars, and data.table falls back to subsetting all the columns just in case the j expression needs them (much like when the .SD symbol is used in j, which is the problem .SDcols was added to solve). If all the columns are used anyway then it makes no difference, but if DT is, say, 1e7 x 100, then a grouped j=sum(V1) should be much faster than a grouped j=sum(get("V1")) for that reason. At least, that's what's supposed to happen, and if it doesn't then it may be a bug. If, on the other hand, many queries are being constructed dynamically and repeated, then the time spent in paste0 and parse might come into it. It all depends. Setting verbose=TRUE should print a message about which columns have been detected as used by j, so that can be checked.
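
To see what all.vars reports in each case, and to check the column detection with verbose=TRUE, something like the following sketch can be run. It reuses the iris example from the question; the variable names r_get and r_eval are only illustrative, and the exact wording of the verbose messages depends on the installed data.table version, so none is shown here.

```r
library(data.table)

dt <- data.table(iris)
v1 <- "Sepal.Length"
v2 <- "Species"

# With get(), all.vars() sees only the variable holding the name,
# not the column itself.
vars_get <- all.vars(quote(sum(get(v1), na.rm = TRUE)))
print(vars_get)   # "v1"

# With a parsed expression, the column name is visible.
e <- parse(text = paste0("sum(", v1, ", na.rm = TRUE)"))[[1]]
vars_eval <- all.vars(e)
print(vars_eval)  # "Sepal.Length"

# Both forms give the same grouped result; verbose = TRUE prints
# diagnostic messages that can be compared between the two calls.
r_get  <- dt[, sum(get(v1), na.rm = TRUE), by = v2, verbose = TRUE]
r_eval <- dt[, eval(e), by = v2, verbose = TRUE]
```

On a table as small as iris any timing difference is negligible; the difference described above would only matter for wide tables with many rows.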
