Is R data.table documented to pass by reference as argument?

Question

Consider this toy code:

> x <- data.table(a = 1:2) 
> foo <- function(z) { z[, b:=3:4]  }
> y <- foo(x)
> x[]
   a b
1: 1 3
2: 2 4

It seems data.table is passed by reference. Is this intentional? Is this documented? I did read through the docs and couldn't find a mention of this behaviour.

I'm not asking about data.table's documented reference semantics (in :=, set*** and some others). I'm asking whether a complete data.table object is supposed to be passed by reference as a function argument.

Edit: following @Oliver's answer, here are some more curious examples.

> dt<- data.table(a=1:2)
> attr(dt, ".internal.selfref")
<pointer: 0x564776a93e88>
> address(dt)
[1] "0x5647bc0f6c50"
> 
> ff<-function(x) { x[, b:=3:4]; print(address(x)); print(attr(x, ".internal.selfref")) }
> ff(dt)
[1] "0x5647bc0f6c50"
<pointer: 0x564776a93e88>

So not only is .internal.selfref identical to the caller's dt copy, so is the address. It really is the same object. (I think).

This is not exactly the case for data.frames:

> df<- data.frame(a=1:2)
> address(df)
[1] "0x5647b39d21e8"
> ff<-function(x) { print(address(x)); x$b=3:4; print(address(x)) }
> 
> ff(df)
[1] "0x5647b39d21e8"
[1] "0x5647ae24de78"

Maybe the root issue is that regular data.table operations somehow do not trigger R's copy-on-modify semantics?

Recommended answer

I think what you're being surprised about is actually R behavior, which is why it's not specifically documented in data.table (maybe it should be anyway, as the implications are more important for data.table).

You were surprised that the object passed to a function had the same address, but this is the same for base R as well:

library(data.table)  # provides address()
x = 1:10
address(x)
# [1] "0x7fb7d4b6c820"
(function(y) {print(address(y))})(x)
# [1] "0x7fb7d4b6c820"

What's being copied in the function environment is the pointer to x. Moreover, for base R, the parent x is immutable:

foo = function(y) {
  print(address(y))
  y[1L] = 2L
  print(address(y))
}
foo(x)
# [1] "0x7fb7d4b6c820"
# [1] "0x7fb7d4e11d28"

That is, as soon as we try to edit y, a copy is made. This is related to reference counting -- you can see some work by Luke Tierney on this, e.g. this presentation
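The same copy-on-modify rule can also be seen from the caller's side, without looking at addresses. A minimal base R sketch (`bump_first` is a name made up here for illustration):

```r
x <- 1:10

# Hypothetical helper: modifying y inside the function triggers a copy,
# so the caller's x is never touched.
bump_first <- function(y) {
  y[1L] <- 2L
  y
}

z <- bump_first(x)

x[1L]  # 1: the caller's vector is unchanged
z[1L]  # 2: only the returned copy was modified
```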

The difference for data.table is that data.table enables edit permissions for the parent object -- a double-edged sword as I think you know.
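If the side effect on the caller's table is not wanted, one option is to take an explicit deep copy with data.table's `copy()` before using `:=` inside the function. The sketch below assumes the data.table package is installed; the helper names `add_b_in_place` and `add_b_safely` are made up for illustration.

```r
library(data.table)

# Hypothetical helper: := updates by reference, so the caller's table changes.
add_b_in_place <- function(z) {
  z[, b := 3:4]
  invisible(z)
}

# Hypothetical helper: works on a deep copy, so the caller's table is left alone.
add_b_safely <- function(z) {
  z <- copy(z)
  z[, b := 3:4]
  z[]
}

x1 <- data.table(a = 1:2)
x2 <- data.table(a = 1:2)

add_b_in_place(x1)
y2 <- add_b_safely(x2)

names(x1)  # "a" "b": the caller's table gained a column
names(x2)  # "a": untouched
names(y2)  # "a" "b"
```

The cost of `copy()` is a full duplicate of the table, so it is a trade-off between safety and memory rather than a free fix.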
