R object not found if defined within a function when using data.table dplyr
Problem description
Note: the described behaviour has been fixed in the development version of dplyr. You can install it using devtools::install_github("hadley/dplyr").
Please see this minimal example; I am using dplyr v0.3.0.2 and data.table v1.9.4
    library(dplyr)
    library(data.table)

    f <- function(x, y, bad) {
      z <- data.table(x, y, key = "x")
      z2 <- z %>% group_by(x) %>% summarise(sum.bad = sum(y == bad))
      z2
    }

    f(rnorm(100), rnorm(100) < 0, bad = FALSE)
When I run the above I get
    Error in `[.data.table`(dt, , list(sum.bad = sum(y == bad)), by = vars) :
      object 'bad' not found
However bad is clearly defined and in scope.
If I just run this outside of a function it works
    x <- rnorm(100)
    y <- rnorm(100) < 0
    bad <- FALSE
    z <- data.table(x, y, key = "x")
    z2 <- z %>% group_by(x) %>% summarise(sum.bad = sum(y == bad))
    z2
What is the issue here? Is it a bug with either data.table or dplyr?
Solution
Seems like this is a problem with how dplyr is setting up the environment for the data.table call. The problem appears in the dplyr:::summarise_.grouped_dt function. It currently looks like

    function (.data, ..., .dots)
    {
      dots <- lazyeval::all_dots(.dots, ..., all_named = TRUE)
      for (i in seq_along(dots)) {
        if (identical(dots[[i]]$expr, quote(n()))) {
          dots[[i]]$expr <- quote(.N)
        }
      }
      list_call <- lazyeval::make_call(quote(list), dots)
      call <- substitute(dt[, list_call, by = vars], list(list_call = list_call$expr))
      env <- dt_env(.data, parent.frame())
      out <- eval(call, env)
      grouped_dt(out, drop_last(groups(.data)), copy = FALSE)
    }
    <environment: namespace:dplyr>
and if we debug that function and look at the trace when it's called, we see
    where 1: summarise_.grouped_dt(.data, .dots = lazyeval::lazy_dots(...))
    where 2: summarise_(.data, .dots = lazyeval::lazy_dots(...))
    where 3: summarise(., sum.bad = sum(y == bad))
    where 4: function_list[[k]](value)
    where 5: withVisible(function_list[[k]](value))
    where 6: freduce(value, `_function_list`)
    where 7: `_fseq`(`_lhs`)
    where 8: eval(expr, envir, enclos)
    where 9: eval(quote(`_fseq`(`_lhs`)), env, env)
    where 10: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
    where 11 at #3: z %>% group_by(x) %>% summarise(sum.bad = sum(y == bad))
    where 12: f(rnorm(100), rnorm(100) < 0, bad = FALSE)
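For anyone who wants to inspect a call stack like this without stepping through the debugger, base R's sys.calls() returns the calls of all active frames, outermost first. The following is only a minimal sketch using made-up function names (outermost, via_middle, innermost); it does not involve dplyr or data.table.

```r
# Hypothetical functions, used only to build a three-level call stack.
innermost <- function() {
  # sys.calls() lists the calls of every active frame, outermost first
  calls <- as.list(sys.calls())
  # keep just the name of the function being called in each frame
  vapply(calls, function(cl) deparse(cl[[1]])[1], character(1))
}
via_middle <- function() innermost()
outermost <- function() via_middle()

stack <- outermost()
# The last three entries are the frames created by the functions above;
# anything before them comes from however this script itself was run.
print(tail(stack, 3))
```

Reading such a list from the bottom up shows how many frames sit between the user's function and the function that finally evaluates the expression.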
So the important line is

    env <- dt_env(.data, parent.frame())

Here it's setting up the environment path, which specifies where to look up all variables in the call. It's just using parent.frame(), which looks to where the function was called from, but since you actually jump through a few hoops to get to this function from your summarize call inside f(), this doesn't seem to be the right parent frame. If you instead run

    env <- dt_env(.data, parent.frame(2))

in debug mode, that seems to actually get at the correct parent frame. So I think the problem is the jump from summarize() to summarize_(), because this

    ff <- function(x, y, bad) {
      z <- data.table(x, y, key = "x")
      z2 <- z %>% group_by(x) %>% summarise_(.dots = list(sum.bad = quote(sum(y == bad))))
      z2
    }

    ff(rnorm(100), rnorm(100) < 0, bad = FALSE)

seems to work. So it's really dplyr that needs to set up the correct environment. The tricky part is that the right frame appears to be different depending on whether you call summarize or summarize_ directly. Perhaps summarise() could change the environment when it calls summarise_ to have the same parent.frame via eval(). But I'd probably file this as a bug report and let Hadley decide how to fix it. Something like

    summarise <- function(.data, ...) {
      call <- match.call()
      call <- as.call(c(as.list(call)[1:2], list(.dots = as.list(call)[-(1:2)])))
      call[[1]] <- quote(summarise_)
      eval(call, envir = parent.frame())
    }

would be a "traditional" way to do it. Not sure if the lazyeval package has nicer ways to do this or not.
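The frame mismatch described above can be reproduced in base R alone. The sketch below is a simplified analogue, not dplyr's actual code: evaluate_(), wrapper_naive() and wrapper_forward() are hypothetical stand-ins for a summarise_()-style function and two ways of wrapping it. It assumes no variable named bad exists outside the two user functions.

```r
# Hypothetical stand-in for a summarise_()-style function: it evaluates an
# unevaluated expression in the frame of whoever called it.
evaluate_ <- function(expr) {
  eval(expr, parent.frame())
}

# Naive wrapper: evaluate_() is called from inside this function, so
# parent.frame() inside evaluate_() is this wrapper's frame, not the user's.
wrapper_naive <- function(expr) {
  evaluate_(substitute(expr))
}

# Forwarding wrapper: rebuild the call with match.call() and evaluate it in
# the caller's frame, so evaluate_() sees the user's frame as its parent.
wrapper_forward <- function(expr) {
  call <- match.call()
  call[[1]] <- quote(evaluate_)
  call[[2]] <- call("quote", call[[2]])  # pass the expression unevaluated
  eval(call, envir = parent.frame())
}

user_naive <- function() {
  bad <- FALSE  # local variable, invisible from wrapper_naive's frame
  tryCatch(wrapper_naive(sum(c(TRUE, FALSE) == bad)),
           error = function(e) conditionMessage(e))
}

user_forward <- function() {
  bad <- FALSE
  wrapper_forward(sum(c(TRUE, FALSE) == bad))
}

naive_result <- user_naive()      # an error message: 'bad' cannot be found
forward_result <- user_forward()  # the sum, computed with the local 'bad'
print(naive_result)
print(forward_result)
```

The naive wrapper fails for the same reason as the original example: the expression is evaluated one frame too far from where bad is defined. Forwarding the call through eval(call, envir = parent.frame()) removes the extra frame, which is the idea behind the "traditional" rewrite suggested above.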
Tested with data.table_1.9.2 and dplyr_0.3.0.2.