data.table merge by multiple columns
Problem description
I'm trying to match four variables pairwise and add a column with the lookup value. In base R, I would do merge(df1, df2, by.x=c("lsr","ppr"), by.y=c("li","pro")), where df1 has 9 columns (two of them being lsr and ppr) and df2 has only 3: li, pro, and alpha, the value I'm interested in.
This works fine, but I'm becoming a huge fan of data.table and would like to do this the data.table way, especially because I have several million rows and base merge is slow. (I saw that the by.x and by.y feature is pending for data.table, but maybe there is a workaround.) See some sample data below:
df2:
       alpha         li        pro
1: 0.5000000 0.01666667 0.01666667
2: 0.3295455 0.03333333 0.01666667
3: 0.2435897 0.05000000 0.01666667
4: 0.1917808 0.06666667 0.01666667
5: 0.1571429 0.08333333 0.01666667
df1:
         demand rtime       mcv         mck        ppr       mlv         mlk        lsr
      1:    0.3     1 357.57700 0.099326944 0.01666667 558.27267 0.155075741 0.01666667
      2:    0.3    10 548.75433 0.152431759 0.01666667 614.30667 0.170640741 0.03333333
      3:    0.3    11 314.55767 0.087377130 0.01666667 636.48100 0.176800278 0.03333333
      4:    0.3     2 312.15033 0.086708426 0.01666667 677.48100 0.188189167 0.06666667
      5:    0.3     3 454.47867 0.126244074 0.01666667 608.92067 0.169144630 0.01666667
     ---
6899196:    0.6     5 537.92673 0.149424093 1.00000000 537.92673 0.149424093 1.00000000
6899197:    0.6     6 277.34732 0.077040923 1.00000000 277.34732 0.077040923 1.00000000
6899198:    0.6     7  73.31484 0.020365235 1.00000000  73.31484 0.020365235 1.00000000
6899199:    0.6     8  32.04197 0.008900546 1.00000000  32.04197 0.008900546 1.00000000
6899200:    0.6     9  14.59008 0.004052799 1.00000000  14.59008 0.004052799 1.00000000
Finally, it may be relevant that df2 has unique rows, whereas df1 has many duplicates with respect to lsr and ppr. I also tried setting two keys, joining the data.table way, and adding a new column with alpha, but without success.
Recommended answer
You can use the statement provided by David Arenburg in a comment:
setkey(df1, lsr, ppr)
setkey(df2, li, pro)
df1[df2, alpha := i.alpha]
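To see what the keyed join does, here is a minimal sketch on made-up data. The column names follow the question, but the values are hypothetical and were chosen to be exactly representable as doubles, so that the numeric key columns compare equal. Note that setkey() reorders each table in place.

```r
library(data.table)

# Hypothetical toy tables; only the column names come from the question.
df1 <- data.table(
  demand = c(0.3, 0.3, 0.3, 0.6),
  lsr    = c(0.25, 0.50, 0.25, 1.00),
  ppr    = c(0.25, 0.25, 0.25, 1.00)
)
df2 <- data.table(
  alpha = c(0.5, 0.33),
  li    = c(0.25, 0.50),
  pro   = c(0.25, 0.25)
)

setkey(df1, lsr, ppr)  # sorts df1 by (lsr, ppr) by reference
setkey(df2, li, pro)   # sorts df2 by (li, pro) by reference

# Update join: rows of df1 that match a row of df2 on the keys
# receive that row's alpha; unmatched rows of df1 get NA.
df1[df2, alpha := i.alpha]

print(df1)
```

The duplicated (lsr, ppr) pairs in df1 each pick up the same alpha, and the pair with no counterpart in df2 is left as NA.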
As of the current devel version, 1.9.5, we can perform joins directly, without having to set keys, by using the on argument:
df1[df2, alpha := i.alpha, on = c(lsr="li", ppr="pro")]
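A small sketch of the same update join with on, again using hypothetical toy values and assuming a data.table version that supports the on argument. In the named vector, the names refer to columns of df1 and the values to columns of df2. Because no key is set, df1 keeps its original row order.

```r
library(data.table)

# Hypothetical toy tables; only the column names come from the question.
df1 <- data.table(
  demand = c(0.3, 0.3, 0.3, 0.6),
  lsr    = c(0.25, 0.50, 0.25, 1.00),
  ppr    = c(0.25, 0.25, 0.25, 1.00)
)
df2 <- data.table(
  alpha = c(0.5, 0.33),
  li    = c(0.25, 0.50),
  pro   = c(0.25, 0.25)
)

# Match df1$lsr to df2$li and df1$ppr to df2$pro, then add alpha
# to df1 by reference. No setkey() call, so neither table is re-sorted.
df1[df2, alpha := i.alpha, on = c(lsr = "li", ppr = "pro")]

print(df1)
```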
If you don't want to install the devel version, you can wait until this is released on CRAN as v1.9.6.
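For comparison, a merge() call in the style of the question can also be sketched. This assumes a data.table release in which merge() on data.tables accepts by.x and by.y (to my knowledge, 1.9.6 and later); the id column and all values are hypothetical. Unlike the update joins above, merge() returns a new table instead of modifying df1 by reference, and all.x = TRUE is needed to keep the rows of df1 that have no match.

```r
library(data.table)

# Hypothetical toy tables; id is an illustrative extra column.
df1 <- data.table(
  id  = 1:4,
  lsr = c(0.25, 0.50, 0.25, 1.00),
  ppr = c(0.25, 0.25, 0.25, 1.00)
)
df2 <- data.table(
  alpha = c(0.5, 0.33),
  li    = c(0.25, 0.50),
  pro   = c(0.25, 0.25)
)

# Left join on two differently named column pairs; the result is a
# new data.table and df1 itself is unchanged.
merged <- merge(df1, df2,
                by.x = c("lsr", "ppr"),
                by.y = c("li", "pro"),
                all.x = TRUE)

print(merged)
```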