data.table join then add columns to existing data.frame without re-copy

Views: 110 | Published: 2017/3/12 9:54:31 | Tags: r, data.table


Problem description

I have two data.tables, X (3 million rows by ~500 columns) and Y (100 rows by two columns).
library(data.table)
set.seed(1)
X <- data.table(a = letters, b = letters, c = letters, g = sample(c(1:5, 7), length(letters), replace = TRUE), key = "g")
Y <- data.table(z = runif(6), g = 1:6, key = "g")
I want to do a left outer join on X, which I can do by Y[X] thanks to:

Why does X[Y] join of data.tables not allow a full outer join, or a left join?

But I want to add the new column to X without copying X (since it's huge).

Obviously, something like X <- Y[X] works, but unless data.table is far cleverer than I give it credit for (and I give it credit for quite a lot of deviousness!), I believe this copies the whole of X.

X[ , z:= Y[X,z]$z ] works, but is kludgy and doesn't scale well to more than one column.

How do I store the results of a merge back into the retained data.table in an efficient (both in terms of copies and in terms of programmer time) way?
Solution
This is easy to do:
X[Y, z := i.z]
This works because, here, Y[X] and X[Y] differ only for rows of X whose key has no match in Y. For those rows you would presumably want z to be NA, which is exactly what the assignment above produces.
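A minimal sketch of the update join on the question's example data, checking both claims: that X is modified in place and that unmatched rows end up with NA. It assumes a data.table version that supports the on= argument (1.9.6 or later), which is used here only so that the join column is stated explicitly rather than inferred from the keys. address() is data.table's helper that reports where an object lives in memory.

```r
library(data.table)

set.seed(1)
X <- data.table(a = letters, b = letters, c = letters,
                g = sample(c(1:5, 7), length(letters), replace = TRUE),
                key = "g")
Y <- data.table(z = runif(6), g = 1:6, key = "g")

# Remember where X lives before the join.
addr_before <- address(X)

# Update join: for each row of Y, find the matching rows of X on g
# and write Y's z into a new column z of X, by reference.
X[Y, on = "g", z := i.z]

# TRUE if X is still the same object, i.e. it was not copied.
same_object <- identical(address(X), addr_before)
print(same_object)

# Rows of X with g = 7 have no partner in Y, so their z is NA.
print(X[, .(n = .N, n_missing = sum(is.na(z))), by = g])
```

The row of Y with g = 6 matches nothing in X and is simply ignored, since an assignment by reference cannot add rows to X.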

It would also work just as well for many variables:
X[Y, `:=`(z1 = i.z1, z2 = i.z2, ...)]
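As an illustration of the multi-column form, here is a small sketch on toy tables. The tables and the column names z1, z2 and id are made up for the example, and on= is again assumed to be available. The second variant, which builds the column list programmatically with mget(), is a commonly used idiom rather than something stated in the answer.

```r
library(data.table)

# Toy data: X has one key value (7) that is absent from Y.
X <- data.table(id = 1:4, g = c(1L, 2L, 2L, 7L))
Y <- data.table(g = 1:3, z1 = c(10, 20, 30), z2 = c("a", "b", "c"))

# Keep an untouched copy for the second variant.
X2 <- copy(X)

# Variant 1: name each new column explicitly.
X[Y, on = "g", `:=`(z1 = i.z1, z2 = i.z2)]

# Variant 2: take the column names from a character vector.
cols <- c("z1", "z2")
X2[Y, on = "g", (cols) := mget(paste0("i.", cols))]

print(X)
print(X2)
```

Variant 2 avoids typing every name, which matters when Y carries many columns to bring across.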




Since you require the operation Y[X], you can add the argument nomatch=0 (as @mnel points out) so as to not get NAs for those where X doesn't contain the key values from Y. That is:
X[Y, z := i.z, nomatch=0]
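To make the effect of nomatch concrete, the sketch below uses plain joins without := on toy tables (the data and the column a are invented for the example, and on= is assumed to be available). Newer data.table releases also accept nomatch = NULL with the same meaning as nomatch = 0.

```r
library(data.table)

X <- data.table(g = c(1L, 2L, 7L), a = c("p", "q", "r"))
Y <- data.table(g = 1:3, z = c(0.1, 0.2, 0.3))

# Default: one result row per row of Y; a is NA where X has no g = 3.
all_rows <- X[Y, on = "g"]

# nomatch = 0: rows of Y without a partner in X are dropped.
matched <- X[Y, on = "g", nomatch = 0]

# The left join on X from the question: one row per row of X,
# with z NA where Y has no g = 7.
left_on_x <- Y[X, on = "g"]

print(all_rows)
print(matched)
print(left_on_x)
```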




From the NEWS for data.table

    **********************************************
    **                                          **
    **   CHANGES IN DATA.TABLE VERSION 1.7.10   **
    **                                          **
    **********************************************
NEW FEATURES
o   The prefix i. can now be used in j to refer to join inherited
    columns of i that are otherwise masked by columns in x with
    the same name.
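The masking that the NEWS entry describes shows up when both tables have a column with the same name. In this sketch (toy data; the names v, v_from_y and delta are hypothetical, and on= is assumed to be available), a bare v inside j refers to X's column, while i.v reaches the column of the same name in Y.

```r
library(data.table)

# Both tables carry a column called v.
X <- data.table(g = 1:3, v = c(1, 2, 3))
Y <- data.table(g = 2:3, v = c(20, 30))

# i.v is Y's v; without the prefix, v would resolve to X's own column.
X[Y, on = "g", v_from_y := i.v]

# Both columns can be used side by side in one expression.
X[Y, on = "g", delta := i.v - v]

print(X)
```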



                        