Why does data.table update names(DT) by reference, even if I assign to another variable?
Problem description

I've stored the names of a data.table as a vector:

    library(data.table)
    set.seed(42)
    DT <- data.table(x = runif(100), y = runif(100))
    names1 <- names(DT)

As far as I can tell, it's a plain vanilla character vector:

    str(names1)
    # chr [1:2] "x" "y"
    class(names1)
    # [1] "character"
    dput(names1)
    # c("x", "y")

However, this is no ordinary character vector. It's a magic character vector! When I add a new column to my data.table, this vector gets updated!

    DT[, z := runif(100)]
    names1
    # [1] "x" "y" "z"

I know this has something to do with how := updates by assignment, but this still seems magic to me, as I expect <- to make a copy of the data.table's names.

I can fix this by wrapping the names in c():

    library(data.table)
    set.seed(42)
    DT <- data.table(x = runif(100), y = runif(100))
    names1 <- names(DT)
    names2 <- c(names(DT))
    all.equal(names1, names2)
    # [1] TRUE
    DT[, z := runif(100)]
    names1
    # [1] "x" "y" "z"
    names2
    # [1] "x" "y"

My question is 2-fold:
- Why doesn't names1 <- names(DT) create a copy of the data.table's names? In other instances, we are explicitly warned that <- creates copies, both of data.tables and data.frames.
- What's the difference between names1 <- names(DT) and names2 <- c(names(DT))?

Solution

Update: This is now covered in the documentation for ?copy as of version 1.9.3. From NEWS:

- Moved ?copy to its own help page, and documented that dt_names <- copy(names(DT)) is necessary for dt_names to be not modified by reference as a result of updating DT by reference (e.g. adding a new column by reference). Closes #512. Thanks to Zach for this SO question and user1971988 for this SO question.

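A minimal sketch of the pattern described in that NEWS entry, reusing the question's data. It only illustrates the documented use of copy() on the names vector; the variable name dt_names is taken from the NEWS text.

```r
library(data.table)

set.seed(42)
DT <- data.table(x = runif(100), y = runif(100))

# Take an explicit copy of the names, so that later by-reference
# updates to DT cannot reach this vector.
dt_names <- copy(names(DT))

DT[, z := runif(100)]   # add a column by reference

dt_names    # still "x" "y"
names(DT)   # "x" "y" "z"
```

The copy is taken before the table is modified, so dt_names keeps the column names as they were at that point.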
Part of your first question leaves me unsure what you really mean about the <- operator (at least in the context of data.table), especially this part: "In other instances, we are explicitly warned that <- creates copies, both of data.tables and data.frames."

So, before answering your actual question, I'll briefly touch on it here. For a data.table, a plain <- (assignment) is not sufficient to copy it. For example:

    DT <- data.table(x = 1:5, y = 6:10)
    # assign DT to DT2
    DT2 <- DT # assign by reference, no copy taken.
    DT2[, z := 11:15]
    # DT will also have the z column

If you want to create a copy, you have to ask for it explicitly with the copy() function.

    DT2 <- copy(DT) # copied content to DT2
    DT2[, z := 11:15] # only DT2 is affected

From CauchyDistributedRV, I understand that what you mean is the assignment names(dt) <- ., which results in the warning. I'll leave this as it is.
Now, to answer your first question: It seems that names1 <- names(DT) also behaves similarly. I hadn't thought/known about this until now. The .Internal(inspect(.)) command is very useful here:

    .Internal(inspect(names1))
    # @7fc86a851480 16 STRSXP g0c7 [MARK,NAM(2)] (len=2, tl=100)
    #   @7fc86a069f68 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "x"
    #   @7fc86a0f96d8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "y"

    .Internal(inspect(names(DT)))
    # @7fc86a851480 16 STRSXP g0c7 [MARK,NAM(2)] (len=2, tl=100)
    #   @7fc86a069f68 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "x"
    #   @7fc86a0f96d8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "y"

Here, you see that they are pointing to the same memory location, @7fc86a851480. Even the truelength of names1 is 100 (which is allocated by default in data.table; check ?alloc.col for this).

    truelength(names1)
    # [1] 100

So basically, the assignment names1 <- names(DT) seems to happen by reference. That is, names1 is pointing to the same location as DT's column names pointer.

To answer your second question: The command c(.) seems to create a copy because it does not check whether the contents resulting from the concatenation are different. That is, because the c(.) operation can change the contents of the vector, it immediately results in a "copy" being made, without checking whether the contents were modified or not.
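
The difference between the two assignments can also be checked with data.table's address() function instead of .Internal(inspect(.)). This is a sketch under the assumption that the behaviour described above still holds in your versions of R and data.table. The names by_ref and by_copy are hypothetical, and no result is shown for the by_ref comparison because it may depend on the version.

```r
library(data.table)

DT <- data.table(x = 1:3, y = 4:6)

by_ref  <- names(DT)      # plain assignment: may share memory with DT's names
by_copy <- c(names(DT))   # c() allocates a new vector

# Compare each snapshot's address with that of DT's names vector.
address(by_ref)  == address(names(DT))
address(by_copy) == address(names(DT))   # FALSE

DT[, z := 7:9]   # add a column by reference

by_copy          # still "x" "y"
```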