How to do a basic left outer join with data.table in R?
Question
I have a data.table with columns a and b that I've partitioned into below (rows with b < .5) and above (rows with b > .5):
library(data.table)
DT = data.table(a=as.integer(c(1,1,2,2,3,3)), b=c(0,0,0,1,1,1))
above = DT[DT$b > .5]
below = DT[DT$b < .5, list(a=a)]
I'd like to do a left outer join between above and below: for each a in above, count the number of rows in below. This is equivalent to the following in SQL:
with dt as (select 1 as a, 0 as b union select 1, 0 union select 2, 0 union select 2, 1 union select 3, 1 union select 3, 1),
above as (select a, b from dt where b > .5),
below as (select a, b from dt where b < .5)
select above.a, count(below.a) from above left outer join below on (above.a = below.a) group by above.a;
a | count
---+-------
3 | 0
2 | 1
(2 rows)
How do I accomplish the same thing with data.tables? This is what I have tried so far:
> key(below) = 'a'
> below[above, list(count=length(b))]
a count
[1,] 2 1
[2,] 3 1
[3,] 3 1
> below[above, list(count=length(b)), by=a]
Error in eval(expr, envir, enclos) : object 'b' not found
> below[, list(count=length(a)), by=a][above]
a count b
[1,] 2 1 1
[2,] 3 NA 1
[3,] 3 NA 1
I should also be more specific: I already tried merge, but that blows through the memory on my system (and the dataset takes only about 20% of my memory).
Recommended answer
See if this gives you something useful. Your example is too sparse for me to tell exactly what you want, but it appears it might be a tabulation of the values of above$a that are also in below$a:
table(above$a[above$a %in% below$a])
If you also want the converse, that is, the values of above$a that are not in below$a, then this would do it:
table(above$a[!above$a %in% below$a])
And you can concatenate them:
> c(table(above$a[above$a %in% below$a]),table(above$a[!above$a %in% below$a]) )
2 3
1 2
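Note that the concatenated result counts occurrences in above, so a = 3 shows 2 rather than the 0 in the SQL output. If the SQL-style result is what is wanted (rows of below counted per distinct a in above, with zeros kept), one possible base R variant is sketched here. The vector names above_a, below_a and ids are illustrative stand-ins for above$a, below$a and the distinct ids; they are not part of the original answer.

```r
# Stand-ins for above$a and below$a from the question (assumed values)
above_a <- c(2L, 3L, 3L)
below_a <- c(1L, 1L, 2L)

# Distinct ids that appear in 'above'
ids <- unique(above_a)

# Restricting the factor levels to 'ids' drops values of below_a that are
# not in 'above' and keeps a zero count for ids with no match in 'below'
counts <- table(factor(below_a, levels = ids))
print(counts)
```

Setting the levels explicitly is what keeps the unmatched id in the result; a plain table(below_a) would omit it.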
Generally, table and %in% run with a reasonably small memory footprint and are quick.
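For comparison, the count from the question can also be sketched with a data.table join. This is only one possible approach, not the answer's method: it aggregates below first, joins the distinct a values of above against those counts, and then replaces the NA produced for unmatched rows with 0. It assumes a data.table version that supports the on= argument (1.9.6 or later); the names counts and res are illustrative.

```r
library(data.table)

# Data from the question
DT <- data.table(a = as.integer(c(1, 1, 2, 2, 3, 3)), b = c(0, 0, 0, 1, 1, 1))
above <- DT[b > .5]
below <- DT[b < .5, list(a = a)]

# Number of rows in 'below' for each value of a
counts <- below[, .N, by = a]

# x[i] keeps every row of i, so this acts as a left join from the
# distinct a values of 'above' onto the counts; unmatched rows get NA
res <- counts[unique(above[, .(a)]), on = "a"]

# Turn "no match" into a count of zero, as count(below.a) does in SQL
res[is.na(N), N := 0L]
print(res)
```

Aggregating before joining keeps the intermediate tables small, which may help when a full merge exhausts memory.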