How to do a data.table merge operation

Question

Note: this question and the following answers refer to data.table versions < 1.5.3; v1.5.3 was released in Feb 2011 to resolve this issue. For a more recent treatment (03-2012), see: Translating SQL joins on foreign keys to R data.table syntax

I've been digging through the documentation for the data.table package (a replacement for data.frame that's much more efficient for certain operations), including Josh Reich's presentation on SQL and data.table at the NYC R Meetup (pdf), but I can't figure out this totally trivial operation.

> x <- DT(a=1:3, b=2:4, key='a')
> x
     a b
[1,] 1 2
[2,] 2 3
[3,] 3 4
> y <- DT(a=1:3, c=c('a','b','c'), key='a')
> y
     a c
[1,] 1 a
[2,] 2 b
[3,] 3 c
> x[y]
     a b
[1,] 1 2
[2,] 2 3
[3,] 3 4
> merge(x,y)
  a b c
1 1 2 a
2 2 3 b
3 3 4 c

The docs say "When [the first argument] is itself a data.table, a join is invoked similar to base::merge but uses binary search on the sorted key." Clearly this is not the case here: x[y] returns only columns a and b, while merge(x,y) also returns c. Can I get the other columns from y into the result of x[y] with data.tables? It seems like it's just taking the rows of x where the key matches the key of y, but ignoring the rest of y entirely...

Recommended answer

You are quoting the wrong part of the documentation. If you look at the documentation of [.data.table you will read:

When i is a data.table, x must have a key, meaning join i to x and return the rows in x that match. An equi-join is performed between each column in i to each column in x's key in order. This is similar to base R functionality of subsetting a matrix by a 2-column matrix, and in higher dimensions subsetting an n-dimensional array by an n-column matrix.
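The base R analogy in that passage may be easier to see in code. Here is a minimal sketch using only base R; the objects m, idx and arr are made up for illustration and do not come from the data.table documentation. Each row of the index matrix picks out one element, much as each row of i picks out the matching rows of x in a join.

```r
# A 3 x 4 matrix, filled column by column with 1..12
m <- matrix(1:12, nrow = 3, ncol = 4)

# A 2-column index matrix: each row is a (row, column) pair to look up
idx <- cbind(c(1, 3), c(2, 4))

# Picks m[1, 2] and m[3, 4], one element per row of idx
picked <- m[idx]
print(picked)  # 4 12

# The same idea in three dimensions: a 3-column matrix indexes a 3-d array
arr <- array(1:24, dim = c(2, 3, 4))
picked3 <- arr[cbind(c(1, 2), c(2, 3), c(3, 4))]
print(picked3)  # 15 24
```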

I admit the description of the package (the part you quoted) is somewhat confusing, because it seems to say that the "["-operation can be used instead of merge. But I think what it says is: if x and y are both data.tables, we use a join on an index (which is invoked like merge) instead of binary search.

One more thing:

The data.table library I installed via install.packages was missing the merge.data.table method, so calling merge would dispatch to merge.data.frame. After I installed the package from R-Forge, R used the faster merge.data.table method.

You can check whether you have the merge.data.table method by inspecting the output of:

methods(generic.function="merge")
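If you would rather get a TRUE/FALSE answer than scan that list by eye, something like the following may work. It is only a sketch: has_method is a hypothetical helper name, and it relies on utils::getS3method with optional = TRUE, which returns NULL instead of raising an error when no method is registered.

```r
# Hypothetical helper: TRUE if an S3 method is registered for generic/class
has_method <- function(generic, class) {
  !is.null(utils::getS3method(generic, class, optional = TRUE))
}

# merge.data.frame ships with base R, so this should be TRUE
print(has_method("merge", "data.frame"))

# Only meaningful if data.table is installed; loading the namespace
# registers the package's S3 methods
if (requireNamespace("data.table", quietly = TRUE)) {
  print(has_method("merge", "data.table"))
}
```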


EDIT [Answer no longer valid]: This answer refers to data.table version 1.3. In version 1.5.3 the behaviour of data.table changed, and x[y] now returns the expected result. Thank you to Matthew Dowle, author of data.table, for pointing this out in the comments.
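For reference, here is a sketch of the same example on a current data.table, assuming version 1.5.3 or later is installed. It uses the data.table() constructor rather than the DT() call from the question; the variable names joined and merged are just for illustration.

```r
library(data.table)

# Same tables as in the question, both keyed on column a
x <- data.table(a = 1:3, b = 2:4, key = "a")
y <- data.table(a = 1:3, c = c("a", "b", "c"), key = "a")

# Join y to x on the key; the result also carries y's column c
joined <- x[y]
print(joined)

# merge() dispatches to merge.data.table and gives the same columns
merged <- merge(x, y)
print(names(merged))
```

With all three keys matching, both results have three rows and the columns a, b and c.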
