Secondary key ("index" attribute) in data.table is lost when table is "copied" by selecting columns

Views: 135  Published: 2017/3/12 12:28:07  Tags: r, data.table

Problem description

I have a data.table myDT, and I'm making "copies" of this table in three different ways:
library(data.table)
myDT <- data.table(colA = 1:3)
myDT[colA == 3]

copy1 <- copy(myDT)
copy2 <- myDT # yes I know that it's a reference, not real copy
copy3 <- myDT[,.(colA)] # I list all columns from the original table
Then I compare those copies with the original table:
identical(myDT, copy1) 
# TRUE
identical(myDT, copy2)
# TRUE
identical(myDT, copy3)
# FALSE
I tried to figure out what the difference between myDT and copy3 was:
identical(names(myDT), names(copy3))
# TRUE
all.equal(myDT, copy3, check.attributes=FALSE)
# TRUE
all.equal(myDT, copy3, check.attributes=FALSE, trim.levels=FALSE, check.names=TRUE)
# TRUE
attr.all.equal(myDT, copy3, check.attributes=FALSE, trim.levels=FALSE, check.names=TRUE)
# NULL
all.equal(myDT, copy3)
# [1] "Attributes: < Length mismatch: comparison on first 1 components >"
attr.all.equal(myDT, copy3)
# [1] "Attributes: < Names: 1 string mismatch >"                                         
# [2] "Attributes: < Length mismatch: comparison on first 3 components >"                
# [3] "Attributes: < Component 3: Attributes: < Modes: list, NULL > >"                   
# [4] "Attributes: < Component 3: Attributes: < names for target but not for current > >"
# [5] "Attributes: < Component 3: Attributes: < current is not list-like > >"            
# [6] "Attributes: < Component 3: Numeric: lengths (0, 3) differ >"
My original question was how to interpret the last output. Eventually I resorted to the attributes() function:
attr0 <- attributes(myDT)
attr3 <- attributes(copy3)
str(attr0)
str(attr3)
This showed that the original data.table had an index attribute, which was not copied when I created copy3.
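One way to pin down which attribute differs is to compare the attribute names directly instead of reading the all.equal() messages. The sketch below uses a helper called attr_name_diff, which is a hypothetical name of my own and not part of base R or data.table. Which attributes actually show up may depend on the data.table version and on the datatable.auto.index option.

```r
library(data.table)

# Hypothetical helper (not part of base R or data.table):
# names of attributes present on `target` but absent from `current`.
attr_name_diff <- function(target, current) {
  setdiff(names(attributes(target)), names(attributes(current)))
}

myDT <- data.table(colA = 1:3)
myDT[colA == 3L]            # filtering may add an automatic index to myDT
copy3 <- myDT[, .(colA)]    # column selection builds a new table

# Attributes that myDT carries but copy3 does not
attr_name_diff(myDT, copy3)
```

Calling the helper with the arguments swapped lists attributes that exist only on the copy.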
Solution
To make this question a bit clearer (and maybe useful for future readers): what really happened here is that either you set a secondary key explicitly by calling set2key (probably not), or data.table set one automatically while you were performing an ordinary operation such as filtering. The latter is a (not so) new feature added in v1.9.4:

  DT[column==value] and DT[column %in% values] are now optimized to use
  DT's key when key(DT)[1]=="column", otherwise a secondary key (a.k.a.
  index) is automatically added so the next DT[column==value] is much
  faster. No code changes are needed; existing code should automatically
  benefit. Secondary keys can be added manually using set2key() and
  existence checked using key2(). These optimizations and function
  names/arguments are experimental and may be turned off with
  options(datatable.auto.index=FALSE).
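The behaviour quoted above can be checked directly. The sketch below assumes a later data.table release in which, as far as I know, setindex() and indices() are the current names for what the quote calls set2key() and key2(). Treat that mapping as an assumption if you are running 1.9.4 itself.

```r
library(data.table)

# Switch automatic indexing off, remembering the previous setting
old <- options(datatable.auto.index = FALSE)

DT <- data.table(A = 1:3)
DT[A == 3L]     # plain filter; no index should be created now
indices(DT)     # NULL: nothing was added behind the scenes

# Add a secondary key (index) explicitly instead
setindex(DT, A)
indices(DT)     # "A"

options(old)    # restore the previous setting
```

Turning the option off only stops new indices from being created automatically; an index added with setindex() is still stored on the table.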




Let's reproduce this:
myDT <- data.table(A = 1:3)
options(datatable.verbose = TRUE)
myDT[A == 3]    
# Creating new index 'A' <~~~~ Here it is
# forder took 0 sec
# Coercing double column i.'V1' to integer to match type of x.'A'. Please avoid coercion for efficiency.
# Starting bmerge ...done in 0 secs
# A
# 1: 3

attr(myDT, "index") # or using `key2(myDT)`
# integer(0)
# attr(,"__A")
# integer(0)
So, contrary to what you assumed, selecting the columns actually did create a copy, and the secondary key was not transferred to it. Compare:
copy1 <- myDT
attr(copy1, "index")
# integer(0)
# attr(,"__A")
# integer(0)
copy2 <- myDT[,.(A)]
# Detected that j uses these columns: A <~~~ This is where the copy occurs
attr(copy2, "index")
# NULL

identical(myDT, copy1)
# [1] TRUE
identical(myDT, copy2)
# [1] FALSE
And for some further validation:
tracemem(myDT)
# [1] "<00000000159CBBB0>"
tracemem(copy1)
# [1] "<00000000159CBBB0>"
tracemem(copy2)
# [1] "<000000001A5A46D8>"




The most interesting conclusion here, one could argue, is that [.data.table does create a copy even when the contents remain unchanged.
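The difference between a reference and a copy can also be seen through behaviour rather than memory addresses alone. This is a minimal sketch, assuming the `:=` operator and address() from data.table; the names ref, sel and the column B are illustrative only.

```r
library(data.table)

myDT <- data.table(A = 1:3)
ref  <- myDT            # plain assignment: another name for the same object
sel  <- myDT[, .(A)]    # `[.data.table` returns a newly built table

address(ref) == address(myDT)   # TRUE: same object
address(sel) == address(myDT)   # FALSE: different object

# Adding a column by reference through `ref` is visible in myDT,
# but not in the table produced by column selection.
ref[, B := A * 2L]
names(myDT)   # "A" "B"
names(sel)    # "A"
```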
