data.table join then add columns to existing data.frame without re-copy

Problem description

I have two data.tables, X (3m rows by ~500 columns), and Y (100 rows by two columns).

library(data.table)
set.seed(1)
X <- data.table(a = letters, b = letters, c = letters, g = sample(c(1:5, 7), length(letters), replace = TRUE), key = "g")
Y <- data.table(z = runif(6), g = 1:6, key = "g")

I want to do a left outer join on X, which I can do with Y[X], thanks to the question "Why does X[Y] join of data.tables not allow a full outer join, or a left join?"

But I want to add the new column to X without copying X (since it's huge).

Obviously, something like X <- Y[X] works, but unless data.table is far cleverer than I give it credit for (and I give it credit for quite a lot of deviousness!), I believe this copies the whole of X.
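
To make the copying concern concrete, here is a small sketch of the Y[X] approach on toy data. The four-row X and three-row Y are illustrative stand-ins, not the tables from the question, and on = "g" is spelled out only for clarity (an assumption: it needs data.table 1.9.6 or later, whereas the question relies on the keys alone).

```r
library(data.table)

# Toy stand-ins for the question's X and Y (illustrative values only).
X <- data.table(a = c("p", "q", "r", "s"), g = c(1L, 2L, 2L, 7L), key = "g")
Y <- data.table(z = c(0.1, 0.2, 0.3), g = 1:3, key = "g")

# Left outer join on X: one result row per row of X,
# with z set to NA where g has no match in Y.
res <- Y[X, on = "g"]

nrow(res) == nrow(X)          # every row of X is kept
res$z                         # z for the g = 7 row is NA
address(res) != address(X)    # the join result is a separate object
"z" %in% names(X)             # X itself has not gained a column
```

Assigning res back to X would rebind the name to this new table, which is the full copy the question wants to avoid.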

X[ , z:= Y[X,z]$z ] works, but is kludgy and doesn't scale well to more than one column.

How do I store the results of a merge back into the retained data.table in an efficient (both in terms of copies and in terms of programmer time) way?

Solution

This is easy to do:

X[Y, z := i.z]

It works because, here, Y[X] and X[Y] differ only when some elements are not in Y. For those rows you would presumably want z to be NA, which is exactly what the assignment above does.
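
The behaviour described above can be checked on a small example. This is a sketch on toy data, not the tables from the question; on = "g" is added for explicitness (an assumption requiring data.table 1.9.6 or later), and address() is used only to compare the table's location before and after the assignment.

```r
library(data.table)

# Toy stand-ins for the question's X and Y (illustrative values only).
X <- data.table(a = c("p", "q", "r", "s"), g = c(1L, 2L, 2L, 7L), key = "g")
Y <- data.table(z = c(0.1, 0.2, 0.3), g = 1:3, key = "g")

addr_before <- address(X)

# Update join: rows of X whose g matches a row of Y receive that row's z.
# X is modified by reference; no new table is assigned.
X[Y, on = "g", z := i.z]

X$z                                            # the g = 7 row has no match, so z is NA
same_object <- identical(addr_before, address(X))  # compare address before and after
key(X)                                         # z is not a key column, so the key is kept
```

The row of Y with g = 3 has no partner in X, so it contributes nothing to the assignment.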

It would also work just as well for many variables:

X[Y, `:=`(z1 = i.z1, z2 = i.z2, ...)]
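
As a concrete instance of the multi-column form, the sketch below adds two columns in one update join. The tables and the column names z1 and z2 are made up for illustration, and on = "g" is again an explicit-join assumption rather than part of the original answer.

```r
library(data.table)

# Illustrative tables: Y carries two value columns to bring into X.
X <- data.table(a = c("p", "q", "r"), g = c(1L, 2L, 7L), key = "g")
Y <- data.table(g = 1:2, z1 = c(10, 20), z2 = c("lo", "hi"), key = "g")

# Functional form of := assigns several columns in a single pass,
# each taken from the matching row of Y via the i. prefix.
X[Y, on = "g", `:=`(z1 = i.z1, z2 = i.z2)]

X$z1    # numeric, NA for the unmatched g = 7 row
X$z2    # character, NA for the unmatched g = 7 row
```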


Since you require the operation Y[X], you can add the argument nomatch=0 (as @mnel points out) so as to not get NAs for those where X doesn't contain the key values from Y. That is:

X[Y, z := i.z, nomatch=0]


From the NEWS for data.table

    **********************************************
    **                                          **
    **   CHANGES IN DATA.TABLE VERSION 1.7.10   **
    **                                          **
    **********************************************

NEW FEATURES

o   The prefix i. can now be used in j to refer to join inherited
    columns of i that are otherwise masked by columns in x with
    the same name.
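
The masking that this NEWS item refers to shows up when both tables have a column with the same name. The sketch below uses made-up tables that share a column v; on = "g" is an assumption for explicitness and is not part of the NEWS entry.

```r
library(data.table)

# Both tables have a column named v, so X's v masks Y's v inside j.
X <- data.table(g = 1:3, v = c(10, 20, 30))
Y <- data.table(g = c(1L, 3L), v = c(100, 300))

# Plain v resolves to X's column; i.v reaches the same-named column of Y.
both <- X[Y, on = "g", .(x_v = v, i_v = i.v)]

# The same prefix lets an update join overwrite X's v with Y's values
# for the matching rows only.
X[Y, on = "g", v := i.v]

both
X$v
```

Without the prefix, v := v in the last call would simply reassign X's own values.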
