R data.table join:SQL“select *”类似的语法在连接表? [英] R data.table join: SQL "select *" alike syntax in joined tables?

查看:108
本文介绍了R data.table join:SQL“select *”类似的语法在连接表?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个data.tables有许多字段。



我想连接两个表,添加一些计算字段,并追加所有其他字段从第一,第二或者两个表(类似于SQL select a + b AS sum,DT1。*,DT2。* FROM ...



简化的示例数据:

  library(data.table)
DT1 = data.table(x = c(c,a,b a,b),a = 1:5)
DT2 = data.table(x = c(d,c,b),b = 6:8)

现在我想加入表格并添加一个计算字段:

  DT1 [DT2,。(sum = a + b,<<如何说DT1。*,DT2。* here?> ),on =x] 


受用户jangorecki的启发我找到了一个功能请求:



应该能够在加入期间引用我的.SD。



它使用非常强大的R功能,称为计算(或元编程),在官方
R语言定义手册。这是R语言的一个特殊功能,不应忘记IMO。



  library(data.table)
DT1 = data.table(x = c(c,a,b,a,b),a = 1:5 )
DT2 = data.table(x = c(d,c,b),b = 6:8)

jj = as.call
list(as.name(。)),
list(sum = quote(a + b)),
lapply ))),as.name)
))
print(jj)
#(sum = a + b,x,a,b)
DT1 (jj),on =x]
#sum xab
#1:NA d NA 6
#2:8 c 1 7
#3:11 b 3 8
#4:13 b 5 8


I have two data.tables with many fields.

I want to join the two tables, add some calculated fields and append all other fields from the first, second or both tables (similar to SQL's select a+b AS sum, DT1.*, DT2.* FROM...) without typing all the field names.

How can I do this (regarding easiest syntax and best performance)?

Simplified example data:

library(data.table)
DT1 = data.table(x=c("c", "a", "b", "a", "b"), a=1:5)
DT2 = data.table(x=c("d", "c", "b"), b=6:8)

Now I want to join the tables and add a calculated field:

DT1[DT2, .(sum=a + b, <<< how to say DT1.*, DT2.* here? >>> ), on="x"]

Update May 4, 2016: Inspired by user jangorecki I have found a feature request for this:

Should be able to refer to i's .SD during a join

解决方案

This should precisely answer your need.
It uses very powerful R feature called computing on the language (or meta programming) well described in official R Language Definition manual. This is an exceptional feature of R language and should not be forgotten IMO.

library(data.table)
DT1 = data.table(x=c("c", "a", "b", "a", "b"), a=1:5)
DT2 = data.table(x=c("d", "c", "b"), b=6:8)

jj = as.call(c(
    list(as.name(".")),
    list(sum = quote(a+b)),
    lapply(unique(c(names(DT1), names(DT2))), as.name)
))
print(jj)
#.(sum = a + b, x, a, b)
DT1[DT2, eval(jj), on="x"]
#   sum x  a b
#1:  NA d NA 6
#2:   8 c  1 7
#3:  11 b  3 8
#4:  13 b  5 8

这篇关于R data.table join:SQL“select *”类似的语法在连接表?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆