在R data.table中连接具有相同(非键控)列名的表 [英] Joining tables with identical (non-keyed) column names in R data.table

查看:153
本文介绍了在R data.table中连接具有相同(非键控)列名的表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在连接data.tables时,如何处理相同命名的非键列?我正在寻找 table.field 在SQL中的解法。

How do you deal with identically named, non-key columns when joining data.tables? I am looking for a solution to table.field notation in SQL.

例如,每个时间段用列 v 的新数据重新填充的表DT。我还有一个表DT_HIST,用于存储以前时间段( t )的条目。我想查找每个 x

For instance, lets' say I have a table DT that is repopulated with new data for column v every time period. I also have a table DT_HIST that stores entries from previous time periods (t). I want to find the difference between the current and previous time period for each x

的当前时间段和上一个时间段之间的差异在这种情况下:DT时间段3,并且DT_HIST具有时间段1和2:

In this case: DT is time period 3, and DT_HIST has time periods 1 and 2:

 DT <- data.table(x=c(1,2,3,4),v=c(20,20,35,30))
 setkey(DT,x)
 DT_HIST <- data.table(x=rep(seq(1,4,1),2),v=c(40,40,40,40,30,25,45,40),t=c(rep(1,4),rep(2,4)))
 setkey(DT_HIST,x)

> DT
   x  v
1: 1 20
2: 2 20
3: 3 35
4: 4 30

> DT_HIST
   x  v t
1: 1 40 1
2: 1 30 2
3: 2 40 1
4: 2 25 2
5: 3 40 1
6: 3 45 2
7: 4 40 1
8: 4 40 2


b $ b

我想加入 DT DT_HIST [t == 1,] x 并计算 v 中的差异。

I would like to join DT with DT_HIST[t==1,] on x and calculate the difference in v.

表导致列 v v.1

> DT[DT_HIST[t==2],]
   x  v v.1 t
1: 1 20  30 2
2: 2 20  25 2
3: 3 35  45 2
4: 4 30  40 2

但是,我找不到一种方法在进行连接时,不同的 v 列。

However, I can't find a way to refer to the different v columns when doing the join.

> DT[DT_HIST[t==2],list(delta=v-v.1)]
Error in `[.data.table`(DT, DT_HIST[t == 2], list(delta = v - v.1)) : 
object 'v.1' not found

> DT[DT_HIST[t==2],list(delta=v-v)]
   x delta
1: 1     0
2: 2     0
3: 3     0
4: 4     0

如果这是重复,我表示歉意。我搜索,也找不到类似的问题。

If this is a duplicate, I apologize. I searched and couldn't find a similar question.

此外,我意识到,我可以简单地重命名加入后的列,然后运行我想要的表达式,但我想

Also, I realize that I can simply rename the columns after joining and then run my desired expression, but I want to know if I'm doing this in the completely wrong way.

推荐答案

您可以使用 i.colname 以访问 i -expression data.table 中的列。我看到您使用的是旧的 data.table 版本。自从那以后有一些变化:重复的连接列名称有一个前缀 i。而不是数字后缀(使它更符合

You can use i.colname to access the column in the i-expression data.table. I see you're using an old data.table version. There have been a few changes since then: the duplicated joined column names have a prefix i. instead of a number postfix (making it more consistent with the i. access of joined column names, and there is no by-without-by anymore by default.

在最新版本(1.9。)中, 3),这是您得到的:

In the latest version (1.9.3), this is what you get:

DT[DT_HIST[t==2],list(delta = v - i.v)]
#   delta
#1:   -10
#2:    -5
#3:   -10
#4:   -10

这篇关于在R data.table中连接具有相同(非键控)列名的表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆