Function on data.table environment errors


Question

Can anybody explain to me why bar doesn't work? Is this a bug in data.table?

library(data.table)

Circles<-data.table(radius=1:10)

foo<-function(Circ){
  Circ[,diameter:=2*radius]
}
dput(x = foo,file = 'func.R')
bar<-dget(file = 'func.R')

foo(Circles)

bar(Circles)

It has something to do with the fact that the dget function sets the environment of the object it returns to something other than .GlobalEnv. There's an easy enough workaround, but it'll drive a rookie like me nuts trying to figure out why it broke in the first place.

MyDGet<-function(file){
  temp<-dget(file=file)
  environment(temp)<-.GlobalEnv
  return(temp)
}

bar<-MyDGet(file = 'func.R')
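A quick base-R check of the behaviour described above (no data.table needed). It writes a throwaway function to a temporary file; `f`, `g` and `path` are illustrative names, not from the question. The sketch shows that the function returned by dget() is not enclosed by the global environment, and that reassigning its environment, as MyDGet() does, changes that:

```r
# Write a small function to a temporary file, then read it back with dget().
f <- function(x) x * 2
path <- tempfile(fileext = ".R")
dput(f, file = path)
g <- dget(file = path)

# The restored function is NOT enclosed by the global environment.
print(identical(environment(g), globalenv()))   # FALSE

# Same fix as MyDGet(): point the function's environment at .GlobalEnv.
environment(g) <- globalenv()
print(identical(environment(g), globalenv()))   # TRUE
print(g(21))                                    # 42
```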

Solution

From ?dput:

If x is a function the associated environment is stripped. Hence scoping information can be lost.
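To see what "scoping information can be lost" means in practice, here is a small base-R sketch (the names `make_scaler`, `triple` and `restored` are made up for illustration). A closure that relies on a variable from its defining environment stops working after a dput()/dget() round trip, because only the function's code is written to the file:

```r
# A closure: 'k' lives in the environment created by make_scaler(), not in the body.
make_scaler <- function(k) function(x) x * k
triple <- make_scaler(3)
print(triple(2))   # 6

path <- tempfile(fileext = ".R")
dput(triple, file = path)   # writes only the function's code, not the value of k
restored <- dget(file = path)

# 'k' was never written to the file, so the lookup fails
# (assuming no variable named 'k' is visible from the global environment).
result <- tryCatch(restored(2), error = function(e) conditionMessage(e))
print(result)
```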

parent.env(environment(bar))
# <environment: namespace:base>
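Why namespace:base? A base-R sketch (the function `f` and the temporary file are illustrative) that inspects the environment of a dget()-ed function. That environment appears to be the evaluation frame of the dget() call itself, holding dget()'s own arguments, and the parent of that frame is where dget() is defined, i.e. the base namespace:

```r
f <- function(x) x + 1
path <- tempfile(fileext = ".R")
dput(f, file = path)
g <- dget(file = path)

e <- environment(g)
print(ls(e))   # the arguments of the dget() call that created g

# dget() is defined in base, so its call frame's parent is the base namespace.
print(identical(environment(dget), .BaseNamespaceEnv))   # TRUE
print(identical(parent.env(e), .BaseNamespaceEnv))       # TRUE
```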

Both foo(Circles) and bar(Circles) result in [.data.table being dispatched, but in the case of bar(), traceback() shows:

traceback()
# 6: stop("Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(\":=\").")
# 5: `:=`(diameter, 2 * radius)
# 4: `[.data.frame`(x, i, j)
# 3: `[.data.table`(Circ, , `:=`(diameter, 2 * radius)) at func.R#3
# 2: Circ[, `:=`(diameter, 2 * radius)] at func.R#3
# 1: bar(Circles)
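Frames #5 and #6 can be reproduced in isolation (a sketch, assuming data.table is installed). Once `[.data.frame` evaluates j as an ordinary argument, the `:=`(diameter, 2 * radius) call runs as a plain function, and data.table's `:=` simply raises this error when it is called that way:

```r
library(data.table)

# Outside data.table's own handling of j, := is an ordinary function call,
# and data.table's := signals an error when invoked like this.
msg <- tryCatch(`:=`(diameter, 2 * radius),
                error = function(e) conditionMessage(e))
print(msg)
```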

As you can see, [.data.table dispatches to [.data.frame. This happens because of this part of [.data.table:

if (!cedta()) {
  # Fix for #5070 (to do)
  Nargs = nargs() - (!missing(drop))
  ans = if (Nargs<3L) `[.data.frame`(x,i) # drop ignored anyway by DF[i]
  else if (missing(drop)) `[.data.frame`(x,i,j)
  else `[.data.frame`(x,i,j,drop)
  # added is.data.table(ans) check to fix bug #5069
  if (!missing(i) & is.data.table(ans)) setkey(ans,NULL) # See test 304
  return(ans)
}

Here !cedta() is TRUE in the case of bar(). We can confirm that this is a cedta issue by setting options(datatable.verbose=TRUE) and rerunning. We then get:

# cedta decided 'base' wasn't data.table aware
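A self-contained version of that check (a sketch, assuming data.table is installed; it recreates `Circles`, `foo` and `bar` from the question and uses a temporary file instead of func.R). The exact wording of the verbose output depends on the data.table version, so only the error itself is captured here:

```r
library(data.table)

Circles <- data.table(radius = 1:10)
foo <- function(Circ) {
  Circ[, diameter := 2 * radius]
}
path <- tempfile(fileext = ".R")
dput(foo, file = path)
bar <- dget(file = path)

foo(Circles)   # works: adds a 'diameter' column by reference

# Turn on verbose output only for the failing call, then restore the option.
old <- options(datatable.verbose = TRUE)
err <- tryCatch(bar(Circles), error = function(e) conditionMessage(e))
options(old)
print(err)
```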

So what does cedta() do?

Suppose you're using data.table objects, and also using a package that isn't aware of the data.table data structure. Let's say the package has a function called funA, and you call it as follows:

funA(DT)

Now since the package isn't data.table aware, it could be using code as follows:

funA <- function(...) {
    ....
    tmp <- DT[, cols]
    ....
}

Here DT[, cols] would not work properly on a data.table because data.table's defaults differ slightly: by default with = TRUE, so j is evaluated as an expression rather than taken as column names. For a data.table, we'd need DT[, cols, with=FALSE].
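A small sketch of that difference (assuming data.table is installed; `DT`, `DF` and `cols` are illustrative names). On a data.frame, a character vector in the j position selects columns; on a data.table, with = FALSE is needed to get the same kind of column selection:

```r
library(data.table)

DT <- data.table(a = 1:3, b = 4:6)
DF <- as.data.frame(DT)
cols <- "b"

# data.frame semantics: 'cols' selects a column (a single column drops to a vector).
print(DF[, cols])

# data.table semantics: with = FALSE makes j be treated as column names,
# returning a one-column data.table.
print(DT[, cols, with = FALSE])
```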

For your code to work well, we have to detect when a data.table object is being used inside a function from a package that doesn't know how to subset columns of a data.table (in other words, a package that isn't data.table aware).

We do that by looking at the parent environment of the function, which gives the namespace of the package you're using (if you're using a package). We then check whether this package imports or depends on data.table, or whether it's one of the packages we've whitelisted.
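The idea can be sketched in base R. `dt_aware_sketch()` below is a hypothetical, heavily simplified stand-in, not data.table's actual cedta() (which inspects the calling frame and also consults Depends, a whitelist and other special cases). It finds the top-level environment enclosing a function and treats only data.table itself, or a namespace that imports it, as aware:

```r
# Hypothetical, simplified version of the "is the caller data.table aware?" test.
dt_aware_sketch <- function(fun) {
  ns <- topenv(environment(fun))        # namespace (or global env) enclosing fun
  if (!isNamespace(ns)) return(TRUE)    # code written at the prompt counts as aware
  name <- getNamespaceName(ns)
  name == "data.table" || "data.table" %in% names(getNamespaceImports(ns))
}

f <- function(x) x
path <- tempfile(fileext = ".R")
dput(f, file = path)
g <- dget(file = path)

print(dt_aware_sketch(f))   # TRUE: top-level code
print(dt_aware_sketch(g))   # FALSE: g is enclosed (via dget's frame) by namespace:base
```

Looking at environment(fun) rather than the live call frame is a simplification; for a closure, both lead to the same top-level environment.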

This case is special (or strange) because the function you defined has base as its parent environment, and the base namespace isn't data.table aware.

Therefore this is not actually a bug.

?dget describes this as not a good way to transfer objects between R sessions (in its Note section). saveRDS works fine, and you can use it as an alternative (better) workaround:

saveRDS(foo, "func.RDS")
bar <- readRDS("func.RDS")
bar(Circles)  # works
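Why saveRDS() behaves better, sketched in base R (names are illustrative). Serialization stores the global environment and namespaces as references and saves other enclosing environments along with the function. So a function enclosed by .GlobalEnv comes back enclosed by .GlobalEnv, and a closure keeps its captured variables, unlike the dget() round trip:

```r
path <- tempfile(fileext = ".rds")

# 1. A function enclosed by the global environment.
f <- function(x) x * 2
environment(f) <- globalenv()   # make the enclosure explicit (same as defining f at the prompt)
saveRDS(f, path)
g <- readRDS(path)
print(identical(environment(g), globalenv()))   # TRUE

# 2. A closure: its enclosing environment (holding k) is saved with it.
make_scaler <- function(k) function(x) x * k
triple <- make_scaler(3)
saveRDS(triple, path)
restored <- readRDS(path)
print(restored(2))   # 6
```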
