Assigning by reference into loaded package datasets
Problem description
I am in the process of creating a package that uses a data.table as a dataset and has a couple of functions which assign by reference using :=.
I have built a simple package to demonstrate my problem:
library(devtools)
install_github('foo','mnel')
It contains two functions:
foo <- function(x) {
  x[, a := 1]
}
fooCall <- function(x) {
  eval(substitute(x[, a := 1]), parent.frame(1))
}
and a dataset (not lazy loaded) DT, created using:
DT <- data.table(b = 1:5)
save(DT, file = 'data/DT.rda')
When I install this package, my understanding is that foo(DT) should assign by reference within DT.
library(foo)
data(DT)
foo(DT)
b a
1: 1 1
2: 2 1
3: 3 1
4: 4 1
5: 5 1
# However this has not assigned by reference within `DT`
DT
b
1: 1
2: 2
3: 3
4: 4
5: 5
If I use the more correct form:
tracemem(DT)
DT <- foo(DT)
# This works without copying
DT
b a
1: 1 1
2: 2 1
3: 3 1
4: 4 1
5: 5 1
untracemem(DT)
If I use eval and substitute within the function:
fooCall(DT)
b a
1: 1 1
2: 2 1
3: 3 1
4: 4 1
5: 5 1
# it does assign by reference
DT
b a
1: 1 1
2: 2 1
3: 3 1
4: 4 1
5: 5 1
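Should I stick with DT <- foo(DT) or the eval/substitute route, or is there something I'm not understanding about how data() loads datasets, even when not lazy?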
This has nothing to do with datasets or locking. You can reproduce it simply using:
DT <- unserialize(serialize(data.table(b = 1:5), NULL))
foo(DT)
DT
I suspect it has to do with the fact that data.table has to re-create the external pointer (extptr) inside the object on the first access to DT, but it does so on a copy, so there is no way it can share the modification with the original in the global environment.
[From Matthew] Exactly.
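One way to see the state being discussed is to compare the column count with the number of allocated column slots before and after a serialize round trip. This is only a sketch: the names fresh and roundtrip are made up for illustration, and the truelength() value reported for the round-tripped table may vary with the R and data.table versions in use, so no particular number is claimed for it here.

```r
library(data.table)

# A freshly created data.table is over-allocated: it has spare column slots
fresh <- data.table(b = 1:3)

# Same round trip as in the reproduction above
roundtrip <- unserialize(serialize(fresh, NULL))

# Columns in use versus column slots allocated
c(length = length(fresh), truelength = truelength(fresh))
c(length = length(roundtrip), truelength = truelength(roundtrip))
```

If the second truelength is not larger than the length, := has no spare slot to add a column in place and must work on a re-allocated shallow copy instead.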
DT <- unserialize(serialize(data.table(b = 1:3), NULL))
DT
b
1: 1
2: 2
3: 3
DT[, newcol := 42]
DT # OK: DT is rebound to a new shallow copy (when called directly)
b newcol
1: 1 42
2: 2 42
3: 3 42
DT <- unserialize(serialize(data.table(b = 1:3), NULL))
foo(DT)
b a
1: 1 1
2: 2 1
3: 3 1
DT # but not OK when modified via function foo()
b
1: 1
2: 2
3: 3
DT <- unserialize(serialize(data.table(b = 1:3), NULL))
alloc.col(DT) # alloc.col needed first
b
1: 1
2: 2
3: 3
foo(DT)
b a
1: 1 1
2: 2 1
3: 3 1
DT # now it's ok
b a
1: 1 1
2: 2 1
3: 3 1
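Applied to a table that really comes from disk, the same fix might look like the sketch below. It assumes a temporary .rds file as a stand-in for data(DT) or load(); the path variable is purely illustrative. alloc.col() is called in the frame that owns the name DT, as in the transcript above, rather than inside a helper function. Recent data.table releases also appear to export setalloccol() for the same purpose, but alloc.col() is kept here to match the answer.

```r
library(data.table)

foo <- function(x) {
  x[, a := 1]
}

# Round trip through a temporary file (stand-in for data(DT) or load())
path <- tempfile(fileext = ".rds")
saveRDS(data.table(b = 1:3), path)
DT <- readRDS(path)

# Restore over-allocation once, right after loading, where DT is defined
alloc.col(DT)

# foo() can now add the column in place, so the caller's DT sees it
foo(DT)
names(DT)
```

Doing this once at load time keeps the package functions themselves free of any special handling.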
Or, don't pass DT into the function, just refer to it directly. Use data.table like a database: a few fixed-name tables in .GlobalEnv.
DT <- unserialize(serialize(data.table(b = 1:5),NULL))
foo <- function() {
  DT[, newcol := 7]
}
foo()
b newcol
1: 1 7
2: 2 7
3: 3 7
4: 4 7
5: 5 7
DT # Unserialized data.table now over-allocated and updated ok.
b newcol
1: 1 7
2: 2 7
3: 3 7
4: 4 7
5: 5 7
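The eval/substitute variant from the question seems to work for the same reason as the direct call: substitute() swaps the argument name back in, so the expression that gets evaluated in the caller's frame refers to DT by name. A small sketch of that step, where captured_call is a hypothetical helper and not part of the package:

```r
# Hypothetical helper: returns the call that fooCall() would evaluate,
# without evaluating it
captured_call <- function(x) {
  substitute(x[, a := 1])
}

# The argument is never forced, so DT does not even need to exist here
cl <- captured_call(DT)
print(cl)

# The first argument of the captured call is the symbol DT, not a copy of the data
cl[[2]]
```

Because the captured call names DT itself, evaluating it in parent.frame(1) is equivalent to typing the same := expression at the prompt, which is the "direct" case shown earlier.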