How to do a data.table merge operation


Question

Note: this question and the following answers refer to data.table versions < 1.5.3; v1.5.3 was released in Feb 2011 to resolve this issue. See a more recent treatment (03-2012): Translating SQL joins on foreign keys to R data.table syntax

I've been digging through the documentation for the data.table package (a replacement for data.frame that's much more efficient for certain operations), including Josh Reich's presentation on SQL and data.table at the NYC R Meetup (pdf), but can't figure this totally trivial operation out.

> x <- DT(a=1:3, b=2:4, key='a')
> x
     a b
[1,] 1 2
[2,] 2 3
[3,] 3 4
> y <- DT(a=1:3, c=c('a','b','c'), key='a')
> y
     a c
[1,] 1 a
[2,] 2 b
[3,] 3 c
> x[y]
     a b
[1,] 1 2
[2,] 2 3
[3,] 3 4
> merge(x,y)
  a b c
1 1 2 a
2 2 3 b
3 3 4 c

The docs say "When [the first argument] is itself a data.table, a join is invoked similar to base::merge but uses binary search on the sorted key." Clearly this is not the case. Can I get the other columns from y into the result of x[y] with data.tables? It seems like it's just taking the rows of x where the key matches the key of y, but ignoring the rest of y entirely...

Recommended answer

You are quoting the wrong part of the documentation. If you look at the documentation for [.data.table, you will read:


When i is a data.table, x must have a key, meaning join i to x and return the rows in x that match. An equi-join is performed between each column in i to each column in x's key in order. This is similar to base R functionality of subsetting a matrix by a 2-column matrix, and in higher dimensions subsetting an n-dimensional array by an n-column matrix.
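The base R behaviour that the quoted documentation uses as an analogy can be sketched as follows. This is only an illustration of matrix and array indexing by an index matrix; the object names (`m`, `idx`, `arr`) are made up for the example and do not come from the data.table documentation.

```r
# Base R only: index a matrix with a 2-column matrix of (row, column) pairs.
m <- matrix(1:12, nrow = 3, ncol = 4)   # filled column by column
idx <- cbind(c(1, 3), c(2, 4))          # selects m[1, 2] and m[3, 4]
picked <- m[idx]
print(picked)

# The same idea one dimension up: a 3-column matrix indexes a 3-d array.
arr <- array(1:24, dim = c(2, 3, 4))
picked3 <- arr[cbind(2, 3, 4)]          # selects arr[2, 3, 4]
print(picked3)
```

Each row of the index matrix names one cell, in the same way that each row of i names one key value to look up in x.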

I admit the description of the package (the part you quoted) is somewhat confusing, because it seems to say that the "["-operation can be used instead of merge. But I think what it says is: if x and y are both data.tables we use a join on an index (which is invoked like merge) instead of binary search.

Another thing:

The data.table package I installed via install.packages was missing the merge.data.table method, so calling merge dispatched to merge.data.frame. After I installed the package from R-Forge, R used the faster merge.data.table method.

You can check if you have the merge.data.table method by checking the output of:

methods(generic.function="merge")
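If you would rather get a TRUE/FALSE answer than scan the printed list, something along these lines should work. It is a sketch that assumes data.table is installed and attached; `has_method` is just an illustrative variable name.

```r
library(data.table)

# getS3method() returns NULL (instead of raising an error) when
# optional = TRUE and no merge method is registered for class "data.table".
has_method <- !is.null(getS3method("merge", "data.table", optional = TRUE))
print(has_method)
```

If this prints FALSE, merge(x, y) will fall back to merge.data.frame as described above.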






EDIT [Answer no longer valid]: This answer refers to data.table version 1.3. In version 1.5.3 the behaviour of data.table changed and x[y] returns the expected results. Thank you Matthew Dowle, author of data.table, for pointing this out in the comments.
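For reference, here is a minimal sketch of the example from the question as it might be written against a current release. It assumes data.table 1.5.3 or later is installed and uses the data.table() constructor rather than the DT() call shown in the question; `joined` and `merged` are illustrative names.

```r
library(data.table)

x <- data.table(a = 1:3, b = 2:4, key = "a")
y <- data.table(a = 1:3, c = c("a", "b", "c"), key = "a")

joined <- x[y]          # join y to x on the shared key column a
merged <- merge(x, y)   # S3 dispatch picks merge.data.table if it is registered

print(joined)
print(merged)
```

With these versions both results should carry the columns a, b and c, which is the behaviour the question was originally looking for.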
