Function on data.table environment errors

Problem description
Can anybody explain to me why `bar` doesn't work? Is this a bug in data.table?

```r
library(data.table)

Circles <- data.table(radius = 1:10)
foo <- function(Circ) {
  Circ[, diameter := 2 * radius]
}
dput(x = foo, file = 'func.R')
bar <- dget(file = 'func.R')
foo(Circles)
bar(Circles)
```
It has something to do with the fact that the `dget` function sets the environment of the object it returns to something other than `.GlobalEnv`. There's an easy enough workaround, but it'll drive a rookie like me nuts trying to figure out why it broke in the first place.

```r
MyDGet <- function(file) {
  temp <- dget(file = file)
  environment(temp) <- .GlobalEnv
  return(temp)
}
bar <- MyDGet(file = 'func.R')
```
Solution

From `?dput`:

> If `x` is a function the associated environment is stripped. Hence scoping information can be lost.
```r
parent.env(environment(bar))
# <environment: namespace:base>
```
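To see the environment difference without data.table, here is a minimal base-R sketch. It assumes a temporary file from `tempfile()` in place of `func.R`, and it only inspects the functions rather than calling them, so data.table does not need to be loaded. The function defined at top level is enclosed by `.GlobalEnv`, while the copy read back with `dget()` is enclosed by the frame in which `dget()` evaluated the parsed code, whose parent is the `base` namespace. Reassigning the environment, as `MyDGet()` in the question does, puts `.GlobalEnv` back.

```r
# Same definition as in the question; the body is only parsed, never run here.
foo <- function(Circ) {
  Circ[, diameter := 2 * radius]
}

# Round-trip through dput()/dget(), using a temporary file instead of 'func.R'.
path <- tempfile(fileext = ".R")
dput(foo, file = path)
bar <- dget(path)

# foo is enclosed by the global environment.
print(environment(foo))
# bar is enclosed by a frame whose parent is namespace:base.
print(parent.env(environment(bar)))

foo_is_global <- identical(environment(foo), globalenv())
bar_parent_is_base <- identical(parent.env(environment(bar)), asNamespace("base"))

# The workaround from the question: point the function back at .GlobalEnv.
environment(bar) <- globalenv()
bar_fixed_is_global <- identical(environment(bar), globalenv())
```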
Both `foo(Circles)` and `bar(Circles)` result in `[.data.table` getting dispatched, but in the case of `bar()`, looking at `traceback()`:

```r
traceback()
# 6: stop("Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(\":=\").")
# 5: `:=`(diameter, 2 * radius)
# 4: `[.data.frame`(x, i, j)
# 3: `[.data.table`(Circ, , `:=`(diameter, 2 * radius)) at func.R#3
# 2: Circ[, `:=`(diameter, 2 * radius)] at func.R#3
# 1: bar(Circles)
```
As you can see, `[.data.table` dispatches to `[.data.frame`. This happens because of this part within `[.data.table`:

```r
if (!cedta()) {
  # Fix for #5070 (to do)
  Nargs = nargs() - (!missing(drop))
  ans = if (Nargs<3L) `[.data.frame`(x,i)  # drop ignored anyway by DF[i]
        else if (missing(drop)) `[.data.frame`(x,i,j)
        else `[.data.frame`(x,i,j,drop)
  # added is.data.table(ans) check to fix bug #5069
  if (!missing(i) & is.data.table(ans)) setkey(ans,NULL)  # See test 304
  return(ans)
}
```
Here `!cedta()` is `TRUE` in the case of `bar()`. We can confirm this is a `cedta` issue by setting `options(datatable.verbose = TRUE)` and rerunning. We then get:

```
# cedta decided 'base' wasn't data.table aware
```
So what does `cedta()` do?

Suppose you're using `data.table` objects, and also using a package that's not aware of the `data.table` data structure. Let's say the package has a function called `funA`, and you're calling it as follows:

```r
funA(DT)
```
Now, since the package isn't data.table aware, it could be using code like this:

```r
funA <- function(...) {
  ....
  tmp <- DT[, cols]
  ....
}
```
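To make the package author's assumption concrete, here is a base-R sketch with a hypothetical `funA(x, cols)` that selects columns by a character vector, which is what `[.data.frame` supports. The argument names and the `id`/`radius`/`diameter` columns are illustrative only. Passed a data.table instead, the same `x[, cols]` call would reach `[.data.table`, where `j` is evaluated as an expression by default (`with = TRUE`), which is why the answer says `with = FALSE` would be needed.

```r
# Hypothetical package function written with data.frame semantics in mind:
# x[, cols] selects the columns whose names are stored in 'cols'.
funA <- function(x, cols) {
  tmp <- x[, cols]
  tmp
}

df <- data.frame(id = 1:3, radius = c(1, 2, 3), diameter = c(2, 4, 6))
cols <- c("radius", "diameter")

# On a plain data.frame this returns a two-column data.frame.
res <- funA(df, cols)
print(res)
```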
Here, `DT[, cols]` would not work properly on a data.table because of some minor differences in data.table's defaults (by default, `with = TRUE`). For a data.table, we'd need `DT[, cols, with = FALSE]`.

For your code to work well, we have to identify that you're using a data.table object with a function from a package that doesn't know how to subset columns from a data.table (in other words, a package that is not data.table aware).
We do that by looking at the parent environment of the function, which gives the namespace of the package you're using (if you're using a package). We then check whether this package imports or depends on data.table, or whether it's one of the packages we've whitelisted.
This case is special (or strange) because the function you defined has `base` as its parent environment, and the `base` namespace isn't data.table aware. Therefore, this is not actually a bug.
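The check described above can be sketched in base R. This is a simplified, hypothetical illustration of the idea, not data.table's actual `cedta()` code: `is_dt_aware()` and its `whitelist` argument are invented names, the sketch inspects a function's own environment rather than the calling frame, and for brevity it only looks at namespace imports (not `Depends`). Applied to the question's `foo` and `bar`, it reproduces the outcome: `foo` counts as aware, while `bar`, whose top-level environment is the `base` namespace, does not.

```r
# Hypothetical, simplified version of the awareness check; NOT data.table's code.
is_dt_aware <- function(fn, whitelist = character()) {
  # First top-level environment enclosing fn: .GlobalEnv or a namespace.
  top <- topenv(environment(fn))
  # Code defined outside any namespace (e.g. at top level) is treated as aware.
  if (!isNamespace(top)) return(TRUE)
  pkg <- getNamespaceName(top)
  if (pkg %in% c("data.table", whitelist)) return(TRUE)
  # Otherwise aware only if that package imports data.table (Depends not checked).
  "data.table" %in% names(getNamespaceImports(top))
}

# Recreate foo and bar from the question (the bodies are never evaluated).
foo <- function(Circ) {
  Circ[, diameter := 2 * radius]
}
path <- tempfile(fileext = ".R")
dput(foo, file = path)
bar <- dget(path)

foo_aware <- is_dt_aware(foo)  # top level: treated as aware
bar_aware <- is_dt_aware(bar)  # enclosed by namespace:base: not aware

# A package function: stats does not import data.table unless whitelisted.
sd_aware <- is_dt_aware(stats::sd)
sd_whitelisted <- is_dt_aware(stats::sd, whitelist = "stats")
```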
`?dget` describes this as not a good way to transfer objects between R sessions (under the Note section). `saveRDS` works fine, and you can use it as an alternative (better) workaround:

```r
saveRDS(foo, "func.RDS")
bar <- readRDS("func.RDS")
bar(Circles)  # works
```
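A minimal base-R check of why `saveRDS()` avoids the problem, assuming a temporary `.rds` path in place of `func.RDS`: serialization records `.GlobalEnv` as a reference rather than copying it, so the function read back with `readRDS()` is enclosed by the global environment again, just like the original `foo`. As before, the function is only inspected, not called.

```r
# Same definition as in the question; the body is only parsed, never run here.
foo <- function(Circ) {
  Circ[, diameter := 2 * radius]
}

# Round-trip through saveRDS()/readRDS() using a temporary file.
path <- tempfile(fileext = ".rds")
saveRDS(foo, path)
bar <- readRDS(path)

# The global environment is restored by reference, unlike the dget() case.
bar_is_global <- identical(environment(bar), globalenv())
print(environment(bar))
```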