How to do a basic left outer join with data.table in R?

Question

I have a data.table of a and b that I've partitioned into below with b < .5 and above with b > .5:

library(data.table)
DT = data.table(a=as.integer(c(1,1,2,2,3,3)), b=c(0,0,0,1,1,1))
above = DT[b > .5]               # rows with b > .5, columns a and b
below = DT[b < .5, list(a=a)]    # rows with b < .5, column a only

I'd like to do a left outer join between above and below: for each a in above, count the number of rows in below. This is equivalent to the following in SQL:

with dt as (select 1 as a, 0 as b union select 1, 0 union select 2, 0 union select 2, 1 union select 3, 1 union select 3, 1),
  above as (select a, b from dt where b > .5),
  below as (select a, b from dt where b < .5)
select above.a, count(below.a) from above left outer join below on (above.a = below.a) group by above.a;
 a | count 
---+-------
 3 |     0
 2 |     1
(2 rows)

How do I accomplish the same thing with data.tables? This is what I tried so far:

> key(below) = 'a'
> below[above, list(count=length(b))]
     a count
[1,] 2     1
[2,] 3     1
[3,] 3     1
> below[above, list(count=length(b)), by=a]
Error in eval(expr, envir, enclos) : object 'b' not found
> below[, list(count=length(a)), by=a][above]
     a count b
[1,] 2     1 1
[2,] 3    NA 1
[3,] 3    NA 1



I should add that I already tried merge, but it exhausts the memory on my system (even though the dataset itself takes only about 20% of my memory).

Recommended answer

See if this gives you something useful. Your example is too sparse for me to tell exactly what you want, but it looks like a tabulation of the values of above$a that are also in below$a:

table(above$a[above$a %in% below$a])

If you also want the converse (values of above$a that are not in below$a), this would do it:

table(above$a[!above$a %in% below$a])

And you can concatenate them:

> c(table(above$a[above$a %in% below$a]),table(above$a[!above$a %in% below$a]) )
2 3 
1 2

Generally table and %in% run in reasonably small footprints and are quick.
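
If what you need is the per-key count from the SQL query, a data.table-native left join might look like the following. This is only a minimal sketch and not part of the original answer: it assumes data.table 1.9.6 or later (for the on= argument), and the names counts and result are illustrative. It aggregates below first, joins the distinct keys of above onto those counts, and then turns the NA produced for unmatched keys into 0, which mirrors count(below.a) in the SQL.

```r
library(data.table)

# Same data as in the question
DT <- data.table(a = as.integer(c(1, 1, 2, 2, 3, 3)), b = c(0, 0, 0, 1, 1, 1))
above <- DT[b > .5]
below <- DT[b < .5, list(a = a)]

# Count the rows of below for each value of a
counts <- below[, list(count = .N), by = a]

# Left join: every distinct a in above is kept;
# keys with no match in counts get count = NA
result <- counts[unique(above[, list(a)]), on = "a"]

# Replace NA with 0 so unmatched keys report a zero count
result[is.na(count), count := 0L]
print(result)
```

Because below is aggregated before the join, the intermediate tables have at most one row per key, which should keep memory use well below that of a full merge on the raw rows.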
