R Data.Table 条件连接 [英] R Data.Table Join on Conditionals

查看:22
本文介绍了R Data.Table 条件连接的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个表,我想以与以下 SQL 相同的方式将它们连接在一起,其中我在多个条件下连接,而不仅仅是相等.

I have two tables that I would like to join together in a way equivalent to the following SQL, where I join on multiple conditions, not just equality.

require(sqldf)
require(data.table)

dt <- data.table(num=c(1, 2, 3, 4, 5, 6), 
char=c('A', 'A', 'A', 'B', 'B', 'B'), 
bool=c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE))

dt_two <- data.table(
num =c(6, 1, 5, 2, 4, 3), 
char=c('A', 'A', 'A', 'B', 'B', 'B'), 
bool=c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE))


dt_out_sql <- sqldf('
    select dtone.num, dtone.char, dtone.bool, SUM(dttwo.num) as SUM,  
   MIN(dttwo.num) as MIN
    from dt as dtone INNER join dt_two as dttwo on 
    (dtone.char = dttwo.char) and 
    (dtone.num >= dttwo.num OR dtone.bool)
GROUP BY dtone.num, dtone.char, dtone.bool')

出于性能和灵活性的原因,我想避免使用 SQL 解决方案.进行交叉连接,然后进行过滤/聚合也是如此——它会创建一个中间表,其中包含许多不必要的记录供我过滤掉.

I would like to avoid the SQL solution, for both performance and flexibility reasons. The same goes for doing a cross join, and then filtering/aggregating -- it would create an intermediate table with lots of unnecessary records for me to filter out.

非常感谢!

更新——我最初的例子是匆忙完成的.在我的实际问题中,我没有进行自我加入.

Update -- my initial example was done in haste. In my actual problem, I'm not doing a self join.

推荐答案

有点难看但是很管用:

library(data.table)
library(sqldf)

dt <- data.table(num=c(1, 2, 3, 4, 5, 6), 
                 char=c('A', 'A', 'A', 'B', 'B', 'B'), 
                 bool=c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE))

dt_two <- data.table(
  num =c(6, 1, 5, 2, 4, 3), 
  char=c('A', 'A', 'A', 'B', 'B', 'B'), 
  bool=c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE))


dt_out_sql <- sqldf('
    select dtone.num,
            dtone.char,
            dtone.bool,
            SUM(dttwo.num) as SUM,  
            MIN(dttwo.num) as MIN
    from    dt as dtone
    INNER join dt_two as dttwo on 
          (dtone.char = dttwo.char) and 
          (dtone.num >= dttwo.num OR dtone.bool)
    GROUP BY dtone.num, dtone.char, dtone.bool
  ')

setDT(dt_out_sql)

setkey(dt, char)
setkey(dt_two, char)

dt_out_r <- dt[dt_two,
               list(dtone.num = num,
                    dttwo.num = i.num,
                    char,
                    bool) ,
               nomatch = 0, allow.cartesian = T
               ][
                 dtone.num >= dttwo.num | bool,
                 list(SUM = sum(dttwo.num),
                      MIN = min(dttwo.num)),
                 by = list(num = dtone.num,
                           char,
                           bool)
                 ]

setkey(dt_out_r, num, char, bool)


all.equal(dt_out_sql, dt_out_r, check.attributes = FALSE)

这篇关于R Data.Table 条件连接的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆