Why does data.table update names(DT) by reference, even if I assign to another variable?


Question


I've stored the names of a data.table as a vector:

library(data.table)
set.seed(42)
DT <- data.table(x = runif(100), y = runif(100))
names1 <- names(DT)

As far as I can tell, it's a plain vanilla character vector:

str(names1)
# chr [1:2] "x" "y"

class(names1)
# [1] "character"

dput(names1)
# c("x", "y")

However, this is no ordinary character vector. It's a magic character vector! When I add a new column to my data.table, this vector gets updated!

DT[ , z := runif(100)]
names1
# [1] "x" "y" "z"

I know this has something to do with how := updates by assignment, but this still seems magic to me, as I expect <- to make a copy of the data.table's names.

I can fix this by wrapping the names in c():

library(data.table)
set.seed(42)
DT <- data.table(x = runif(100), y = runif(100))

names1 <- names(DT)
names2 <- c(names(DT))
all.equal(names1, names2)
# [1] TRUE

DT[ , z := runif(100)]
names1
# [1] "x" "y" "z"

names2
# [1] "x" "y"

My question is 2-fold:

  1. Why doesn't names1 <- names(DT) create a copy of the data.table's names? In other instances, we are explicitly warned that <- creates copies, both of data.tables and data.frames.
  2. What's the difference between names1 <- names(DT) and names2 <- c(names(DT))?

Solution

Update: This is now added in the documentation for ?copy in version 1.9.3. From NEWS:

  1. Moved ?copy to its own help page, and documented that dt_names <- copy(names(DT)) is necessary for dt_names to be not modified by reference as a result of updating DT by reference (ex: adding a new column by reference). Closes #512. Thanks to Zach for this SO question and user1971988 for this SO question.


One part of your first question leaves me unsure what you mean about the <- operator (at least in the context of data.table), especially the statement: "In other instances, we are explicitly warned that <- creates copies, both of data.tables and data.frames."

So, before answering your actual question, I'll briefly touch on it here. For a data.table, a plain <- assignment is not sufficient to make a copy. For example:

DT <- data.table(x = 1:5, y = 6:10)
# assign DT to DT2
DT2 <- DT # assignment by reference, no copy is made
DT2[, z := 11:15]
# DT now also has the z column

If you want a copy, you have to ask for one explicitly with the copy() function:

DT2 <- copy(DT) # copied content to DT2
DT2[, z := 11:15] # only DT2 is affected
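
To check this for yourself, here is a minimal sketch that uses data.table's address() function to compare where each object lives in memory. The names DT_alias and DT_copy are made up for illustration and do not appear in the question.

```r
library(data.table)

DT <- data.table(x = 1:5, y = 6:10)

DT_alias <- DT        # plain assignment: both names refer to the same object
DT_copy  <- copy(DT)  # explicit copy: a separate object

# address() returns the memory address of an object as a string
same_object     <- address(DT) == address(DT_alias)
separate_object <- address(DT) != address(DT_copy)

# Adding a column through the alias also changes DT, but not the copy
DT_alias[, z := 11:15]
alias_changed_DT <- "z" %in% names(DT)
copy_untouched   <- !("z" %in% names(DT_copy))
```

Comparing addresses is a quick way to tell an alias from a real copy before you start modifying anything by reference.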

From CauchyDistributedRV, I understand that what you mean is the assignment names(dt) <- ., which results in the warning. I'll leave it at that.


Now, to answer your first question: It seems that names1 <- names(DT) also behaves similarly. I hadn't thought/known about this until now. The .Internal(inspect(.)) command is very useful here:

.Internal(inspect(names1))
# @7fc86a851480 16 STRSXP g0c7 [MARK,NAM(2)] (len=2, tl=100)
#   @7fc86a069f68 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "x"
#   @7fc86a0f96d8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "y"

.Internal(inspect(names(DT)))
# @7fc86a851480 16 STRSXP g0c7 [MARK,NAM(2)] (len=2, tl=100)
#   @7fc86a069f68 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "x"
#   @7fc86a0f96d8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "y"

Here you can see that both point to the same memory location, @7fc86a851480. Even the truelength of names1 is 100 (the over-allocation that data.table applies by default; see ?alloc.col).

truelength(names1)
# [1] 100

So basically, the assignment names1 <- names(DT) seems to happen by reference. That is, names1 points to the same location as DT's column-names pointer.
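
The same comparison can be sketched without .Internal(), using data.table's address() function. This is only an illustration, and the result of the first comparison may depend on your R and data.table versions. If the two vectors share memory, as the inspect() output suggests, shares_memory should come out TRUE.

```r
library(data.table)

DT <- data.table(x = runif(100), y = runif(100))

names1 <- names(DT)     # plain assignment of the names vector
names2 <- c(names(DT))  # c() builds a new vector

# address() reports where each vector lives in memory
addr_DT_names <- address(names(DT))
addr_names1   <- address(names1)
addr_names2   <- address(names2)

# Does names1 share memory with DT's names?
shares_memory <- addr_names1 == addr_DT_names
# names2 was built by c(), so it lives somewhere else
is_separate   <- addr_names2 != addr_DT_names
```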

To answer your second question: c(.) seems to create a copy because it does not check whether concatenation actually changed the contents. That is, because a c(.) operation can change the contents of the vector, it makes a "copy" immediately, without checking whether the contents were modified or not.
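
Besides c(), the NEWS item quoted at the top of this answer recommends copy(names(DT)). A minimal sketch of that approach follows. The variable name names_snapshot is made up for illustration.

```r
library(data.table)

DT <- data.table(x = runif(100), y = runif(100))

names_snapshot <- copy(names(DT))  # explicit copy, detached from DT

DT[, z := runif(100)]              # add a column by reference

names_snapshot  # still the two original column names
names(DT)       # now includes "z"
```

Using copy() states the intent more directly than c(), which only happens to copy as a side effect.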
