data.table join then add columns to existing data.frame without re-copy
Problem description
I have two data.tables, X (3m rows by ~500 columns), and Y (100 rows by two columns).
library(data.table)
set.seed(1)
X <- data.table( a=letters, b=letters, c=letters, g=sample(c(1:5,7),length(letters),replace=TRUE), key="g" )
Y <- data.table( z=runif(6), g=1:6, key="g" )
I want to do a left outer join on X, which I can do by Y[X], thanks to: Why does X[Y] join of data.tables not allow a full outer join, or a left join?
But I want to add the new column to X without copying X (since it's huge). Obviously, something like
X <- Y[X]
works, but unless data.table is far cleverer than I give it credit for (and I give it credit for quite a lot of deviousness!), I believe this copies the whole of X.
X[ , z:= Y[X,z]$z ]
works, but is kludgy and doesn't scale well to more than one column.
How do I store the results of a merge back into the retained data.table in an efficient (both in terms of copies and in terms of programmer time) way?
Solution
This is easy to do:
X[Y, z := i.z]
It works because the only difference between Y[X] and X[Y] here is when some elements are not in Y, in which case presumably you'd want z to be NA, which is exactly what the above assignment does.
It would also work just as well for many variables:
X[Y, `:=`(z1 = i.z1, z2 = i.z2, ...)]
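As a minimal sketch of the by-reference behaviour, the snippet below rebuilds small versions of X and Y and uses data.table's address() to compare the memory address of X before and after the update. The deterministic g column and the second lookup table Y2 (with columns z1 and z2) are assumptions made for illustration; they are not part of the question.

```r
library(data.table)

# Toy stand-ins for the question's tables. g is built deterministically here
# (an assumption, instead of sample()) so that group 7 is guaranteed to appear.
X <- data.table(a = letters, b = letters, c = letters,
                g = rep(c(1:5, 7L), length.out = length(letters)), key = "g")
Y <- data.table(z = runif(6), g = 1:6, key = "g")

addr_before <- address(X)   # memory address of X before any update

# Update join: for each row of Y, assign Y's z to the matching rows of X.
X[Y, z := i.z]

# Hypothetical second lookup table carrying two value columns.
Y2 <- data.table(g = 1:6, z1 = runif(6), z2 = runif(6), key = "g")
X[Y2, `:=`(z1 = i.z1, z2 = i.z2)]

addr_after <- address(X)

# If the columns were added in place, both addresses should be the same.
same_object <- identical(addr_before, addr_after)
print(same_object)
print(names(X))
```

If X <- Y[X] were used instead, X would be rebound to a newly allocated table, which is the copy the question wants to avoid.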
Since you require the operation Y[X], you can add the argument nomatch=0 (as @mnel points out) so as not to get NAs for those rows where X doesn't contain the key values from Y. That is:
X[Y, z := i.z, nomatch=0]
From the NEWS for data.table
CHANGES IN DATA.TABLE VERSION 1.7.10
NEW FEATURES
o The prefix i. can now be used in j to refer to join inherited columns of i that are otherwise masked by columns in x with the same name.
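The masking that this NEWS item describes only shows up when both tables have a column of the same name. A small sketch, using two hypothetical tables that each contain a column v:

```r
library(data.table)

# Hypothetical tables: both have a column named v, which is what causes masking.
X <- data.table(g = 1:3, v = c(10, 20, 30), key = "g")
Y <- data.table(g = 1:2, v = c(100, 200), key = "g")

# Inside j of X[Y, ...], a bare v resolves to X's own column ...
X[Y, v_from_x := v]

# ... while i.v refers to the column of the same name in Y (the i table).
X[Y, v_from_y := i.v]

# Row g = 3 has no match in Y, so neither new column is assigned there.
print(X)
```

Without the i. prefix there would be no direct way, inside j, to reach Y's v while X has a column of the same name.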