data.table 多列合并 [英] data.table merge by multiple columns

查看：17 发布时间：2022/1/13 19:40:33 r merge data.table

本文介绍了data.table 多列合并的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在尝试成对匹配 4 个变量并添加具有查找值的列.在基地，我会做 merge(df1,df2, by.x=c("lsr","ppr"),by.y=c("li","pro")),其中 df1 有 9 个 cols 和 df2(2 个是 lsr 和 pro)df2只有 3 个，li，pro，以及我感兴趣的值"，alpha.

I'm trying to match 4 variables pairwise and add a column with the lookup value. In base, I would do merge(df1,df2, by.x=c("lsr","ppr"),by.y=c("li","pro")), where df1 has 9 cols and df2 (2 being lsr and pro) df2 has only 3, li, pro, and the "value" I'm interested in, alpha.

这很好用，但是当我开始成为 data.table 的忠实粉丝时，我想以 data.table 的方式执行此操作 -并且因为我有几百万行 - 所以基本合并很慢(我看到 by.x 和 by.y 功能正在等待 data.table，但也许有一种解决方法).请参阅下面的一些示例数据:

This works fine, but as I'm beginning to be a huge fan of data.table, I would like to do this in the data.table way - and because I have some millions of rows - so base merge is slow (I saw, that the by.x, and by.y feature is pending for data.table, but maybe there is a workaround). See some sample data below:

df2:
         alpha         li        pro
      1: 0.5000000 0.01666667 0.01666667
      2: 0.3295455 0.03333333 0.01666667
      3: 0.2435897 0.05000000 0.01666667
      4: 0.1917808 0.06666667 0.01666667
      5: 0.1571429 0.08333333 0.01666667
df1:     
          demand rtime    mcv         mck        ppr       mlv         mlk        lsr
      1:    0.3     1 357.57700 0.099326944 0.01666667 558.27267 0.155075741 0.01666667
      2:    0.3    10 548.75433 0.152431759 0.01666667 614.30667 0.170640741 0.03333333
      3:    0.3    11 314.55767 0.087377130 0.01666667 636.48100 0.176800278 0.03333333
      4:    0.3     2 312.15033 0.086708426 0.01666667 677.48100 0.188189167 0.06666667
      5:    0.3     3 454.47867 0.126244074 0.01666667 608.92067 0.169144630 0.01666667
     ---                                                                               
6899196:    0.6     5 537.92673 0.149424093 1.00000000 537.92673 0.149424093 1.00000000
6899197:    0.6     6 277.34732 0.077040923 1.00000000 277.34732 0.077040923 1.00000000
6899198:    0.6     7  73.31484 0.020365235 1.00000000  73.31484 0.020365235 1.00000000
6899199:    0.6     8  32.04197 0.008900546 1.00000000  32.04197 0.008900546 1.00000000
6899200:    0.6     9  14.59008 0.004052799 1.00000000  14.59008 0.004052799 1.00000000

最后，也许有趣的是，在 df2 中我有唯一的行，而在 df1 中，我有很多关于 lsr 和 ppr.我还尝试设置两个键并以 data.table 方式加入它们，并使用 alpha 添加一个新列.但没有成功.


Last, maybe of interest is, that in df2 I have unique rows, and in df1, I have lots of duplicates in respect to lsr and ppr. I also tried to set two keys and join them the data.table way, and adding a new column with alpha. But without success.
推荐答案
您可以使用 David Arenburg 在评论中提供的声明:
You can use the statement provided by David Arenburg in comment:

setkey(df1, lsr, ppr)
setkey(df2, li, pro)
df1[df2, alpha := i.alpha]

<小时>
从当前开发版本，1.9.5，我们可以执行连接无需使用 on 参数直接设置键:




From the current devel version, 1.9.5, we can perform joins directly without having to set keys using the on argument: 

df1[df2, alpha := i.alpha, on = c(lsr="li", ppr="pro")]

如果你不想安装devel版本，那么你可以等到CRAN上作为v1.9.6推送.
If you don't want to install the devel version, then you can wait until this is pushed as v1.9.6 on CRAN.

                        这篇关于data.table 多列合并的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

data.table 多列合并 [英] data.table merge by multiple columns

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

data.table 多列合并 [英] data.table merge by multiple columns

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭