R object not found if defined within a function when using data.table dplyr


Problem description

Note: The described behaviour has been fixed in the development version of dplyr. You can install it with devtools::install_github("hadley/dplyr").

Please see this minimal example; I am using dplyr v0.3.0.2 and data.table v1.9.4

library(dplyr)
library(data.table)
f <- function(x, y, bad) { 
  z <- data.table(x,y, key = "x")    
  z2 <- z %>% group_by(x) %>% summarise(sum.bad = sum(y == bad))
  z2
}

f(rnorm(100), rnorm(100) < 0, bad = FALSE) 

When I run the above I get

Error in `[.data.table`(dt, , list(sum.bad = sum(y == bad)), by = vars) : 
  object 'bad' not found

However, bad is clearly defined and in scope.

If I just run this outside of a function, it works:

  x <- rnorm(100)
  y <- rnorm(100) <0
  bad <- FALSE
  z <- data.table(x,y, key = "x")

  z2 <- z %>% group_by(x) %>% summarise(sum.bad = sum(y == bad))
  z2

What is the issue here? Is it a bug with either data.table or dplyr?

Solution

This looks like a problem with how dplyr sets up the environment for the data.table call. The problem appears in the dplyr:::summarise_.grouped_dt function, which currently looks like this:

function (.data, ..., .dots) 
{
    dots <- lazyeval::all_dots(.dots, ..., all_named = TRUE)
    for (i in seq_along(dots)) {
        if (identical(dots[[i]]$expr, quote(n()))) {
            dots[[i]]$expr <- quote(.N)
        }
    }
    list_call <- lazyeval::make_call(quote(list), dots)
    call <- substitute(dt[, list_call, by = vars], list(list_call = list_call$expr))
    env <- dt_env(.data, parent.frame())
    out <- eval(call, env)
    grouped_dt(out, drop_last(groups(.data)), copy = FALSE)
}
<environment: namespace:dplyr>

and if we debug that function and look at the trace when it's called, we see

where 1: summarise_.grouped_dt(.data, .dots = lazyeval::lazy_dots(...))
where 2: summarise_(.data, .dots = lazyeval::lazy_dots(...))
where 3: summarise(., sum.bad = sum(y == bad))
where 4: function_list[[k]](value)
where 5: withVisible(function_list[[k]](value))
where 6: freduce(value, `_function_list`)
where 7: `_fseq`(`_lhs`)
where 8: eval(expr, envir, enclos)
where 9: eval(quote(`_fseq`(`_lhs`)), env, env)
where 10: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
where 11 at #3: z %>% group_by(x) %>% summarise(sum.bad = sum(y == bad))
where 12: f(rnorm(100), rnorm(100) < 0, bad = FALSE)

So the important line is

env <- dt_env(.data, parent.frame())

This sets up the environment path that specifies where to look up all variables in the call. It simply uses parent.frame(), which looks to where the function was called from. But since you actually jump through a few hoops to get from your summarise call inside f() to this function, that does not seem to be the right parent frame. If you instead run

env <- dt_env(.data, parent.frame(2))

in debug mode, that seems to get at the correct parent frame. So I think the problem is the jump from summarise() to summarise_(), because this

ff <- function(x, y, bad) { 
  z <- data.table(x,y, key = "x")    
  z2 <- z %>% group_by(x) %>% summarise_(.dots=list(sum.bad = quote(sum(y == bad))))
  z2
}

ff(rnorm(100), rnorm(100) < 0, bad = FALSE) 

seems to work. So it's really dplyr that needs to set up the correct environment. The tricky part is that the parent frame appears to be different depending on whether you call summarise or summarise_ directly. Perhaps summarise() could change the environment when it calls summarise_ so that it has the same parent.frame, via eval(). But I'd probably file this as a bug report and let Hadley decide how to fix it. Something like

summarise <- function(.data, ...) {
  call <- match.call()
  call <- as.call(c(as.list(call)[1:2], list(.dots=as.list(call)[-(1:2)])))
  call[[1]] <- quote(summarise_)
  eval(call, envir=parent.frame())
}

would be a "traditional" way to do it. Not sure if the lazyeval package has nicer ways to do this or not.

Tested with data.table_1.9.2 and dplyr_0.3.0.2
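
The parent-frame effect described above can be reproduced without dplyr or data.table. The following is a minimal base R sketch, not dplyr's actual code: eval_in_caller, wrapper_plain, wrapper_forward, f_plain and f_forward are hypothetical names invented for illustration. eval_in_caller plays the role of summarise_ (it evaluates in parent.frame()), wrapper_plain plays the role of a summarise that calls it directly, and wrapper_forward uses the match.call() plus eval(..., parent.frame()) pattern suggested above.

```r
# Hypothetical stand-ins for summarise_() and summarise(); base R only.

# Evaluates an expression in the frame of whoever called this function,
# mirroring the parent.frame() lookup in summarise_.grouped_dt.
eval_in_caller <- function(expr) {
  eval(expr, parent.frame())
}

# Plain wrapper: inside eval_in_caller(), parent.frame() is now this
# wrapper's frame, so variables local to the user's function are not seen.
wrapper_plain <- function(expr) {
  eval_in_caller(expr)
}

# Forwarding wrapper: rebuild the call and evaluate it in our caller's frame,
# so eval_in_caller() behaves as if the user had called it directly.
wrapper_forward <- function(expr) {
  call <- match.call()
  call[[1]] <- quote(eval_in_caller)
  eval(call, envir = parent.frame())
}

# Both user functions define y and bad locally, as f() does in the question.
f_plain <- function() {
  y <- c(TRUE, FALSE, FALSE)
  bad <- FALSE
  wrapper_plain(quote(sum(y == bad)))
}

f_forward <- function() {
  y <- c(TRUE, FALSE, FALSE)
  bad <- FALSE
  wrapper_forward(quote(sum(y == bad)))
}

# In a fresh session (no global y or bad), the plain wrapper fails with an
# "object not found" error; capture the message instead of stopping.
res_plain <- tryCatch(f_plain(), error = function(e) conditionMessage(e))

# The forwarding wrapper finds the local variables.
res_forward <- f_forward()
```

As in the question, the plain variant would only succeed if y and bad happened to exist in the global environment, which is why the original code works at top level but not inside f().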
