Is R data.table documented to pass by reference as argument?
Question
Consider this toy code:
> x <- data.table(a = 1:2)
> foo <- function(z) { z[, b:=3:4] }
> y <- foo(x)
> x[]
   a b
1: 1 3
2: 2 4
It seems data.table is passed by reference. Is this intentional? Is this documented? I did read through the docs and couldn't find a mention of this behaviour.
I'm not asking about the documented reference semantics (in :=, the set* functions and some others). I'm asking whether a complete data.table object is supposed to be passed by reference as a function argument.
Edit: Following @Oliver's answer, here are some more curious examples.
> dt<- data.table(a=1:2)
> attr(dt, ".internal.selfref")
<pointer: 0x564776a93e88>
> address(dt)
[1] "0x5647bc0f6c50"
> ff<-function(x) { x[, b:=3:4]; print(address(x)); print(attr(x, ".internal.selfref")) }
> ff(dt)
[1] "0x5647bc0f6c50"
<pointer: 0x564776a93e88>
So not only is .internal.selfref identical to that of the caller's dt, so is the address. It really is the same object (I think).
This is not exactly the case for data.frames:
> df<- data.frame(a=1:2)
> address(df)
[1] "0x5647b39d21e8"
> ff<-function(x) { print(address(x)); x$b=3:4; print(address(x)) }
>
> ff(df)
[1] "0x5647b39d21e8"
[1] "0x5647ae24de78"
Maybe the root issue is that regular data.table operations somehow do not trigger R's copy-on-modify semantics?
Recommended answer
I think what surprises you is actually base R behavior, which is why it's not specifically documented in data.table (maybe it should be anyway, as the implications are more important for data.table).
You were surprised that the object passed to a function had the same address, but the same is true in base R:
x = 1:10
address(x)
# [1] "0x7fb7d4b6c820"
(function(y) {print(address(y))})(x)
# [1] "0x7fb7d4b6c820"
What's being copied into the function environment is the pointer to x. Moreover, in base R, the parent x is effectively immutable from within the function:
foo = function(y) {
print(address(y))
y[1L] = 2L
print(address(y))
}
foo(x)
# [1] "0x7fb7d4b6c820"
# [1] "0x7fb7d4e11d28"
That is, as soon as we try to edit y, a copy is made. This is related to reference counting -- you can see some work by Luke Tierney on this, e.g. this presentation
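A minimal base R sketch of what this means for the caller (bar is a made-up helper name, not from the question or answer): the edit inside the function lands on a copy, so the caller's x stays as it was.

```r
x <- 1:10

# bar is a hypothetical helper: it edits its argument and returns the result
bar <- function(y) {
  y[1L] <- 2L  # the edit triggers a copy; y now refers to a new vector
  y
}

z <- bar(x)
x[1L]  # still 1: the caller's vector was not modified
z[1L]  # 2: only the returned copy carries the change
```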
The difference with data.table is that it enables in-place edits of the parent object (via := and the set* functions) -- a double-edged sword, as I think you know.
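If the in-place edit is not wanted, one common way around it is to take an explicit copy with data.table's copy() before using :=. The sketch below assumes the toy x from the question; foo_copy is a hypothetical variant of the question's foo, not something from the original post.

```r
library(data.table)

x <- data.table(a = 1:2)

# foo_copy is a hypothetical variant of foo from the question
foo_copy <- function(z) {
  z <- copy(z)   # explicit deep copy; := below only touches the copy
  z[, b := 3:4]
  z
}

y <- foo_copy(x)
names(x)  # "a": the caller's table is unchanged
names(y)  # "a" "b"
```

The cost is one full copy of the table per call, which is exactly what := is designed to avoid, so this is a trade-off rather than a default.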