How to merge two data.table by different column names?

Published: 2021/12/27 22:08:39 · Tags: r, merge, data.table


Problem description

I have two data.table X and Y.

columns in X: area, id, value
columns in Y: ID, price, sales

Create the two data tables:

library(data.table)

X = data.table(area=c('US', 'UK', 'EU'),
               id=c('c001', 'c002', 'c003'),
               value=c(100, 200, 300)
              )

Y = data.table(ID=c('c001', 'c002', 'c003'),
               price=c(500, 200, 400),
               sales=c(20, 30, 15)
              )

And I set keys for X and Y:

setkey(X, id)
setkey(Y, ID)

Now I try to join X and Y by id in X and ID in Y:
merge(X, Y)
merge(X, Y, by=c('id', 'ID'))
merge(X, Y, by.x='id', by.y='ID')

All of these raise an error saying that the column names in the by argument are invalid.

I referred to the data.table manual and found that the merge function does not support the by.x and by.y arguments.

How can I join two data.tables by different column names without changing the column names?

Append:

I managed to join the two tables with X[Y], but why does the merge function fail for data.table?

Recommended answer

OUTDATED

Use this operation:
X[Y]
#    area   id value price sales
# 1:   US c001   100   500    20
# 2:   UK c002   200   200    30
# 3:   EU c003   300   400    15

Or this operation:
Y[X]
#      ID price sales area value
# 1: c001   500    20   US   100
# 2: c002   200    30   UK   200
# 3: c003   400    15   EU   300
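
The X[Y] joins above work only because both tables were keyed with setkey() beforehand. As a minimal sketch, assuming a data.table version that supports the `on` argument (1.9.6 or later, as far as I know), the same join can be written without keys by naming the matching columns explicitly. The variable name `joined` is only illustrative.

```r
library(data.table)

X <- data.table(area  = c("US", "UK", "EU"),
                id    = c("c001", "c002", "c003"),
                value = c(100, 200, 300))
Y <- data.table(ID    = c("c001", "c002", "c003"),
                price = c(500, 200, 400),
                sales = c(20, 30, 15))

# No setkey() calls here: `on` maps column id in X to column ID in Y,
# so neither table has to be keyed or renamed first.
joined <- X[Y, on = c(id = "ID")]
print(joined)
```

The named vector in `on` reads as "column in X = column in Y", so the original column names in both tables stay untouched.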

Edit: After you edited your question, I read Section 1.12 of the FAQ, "What is the difference between X[Y] and merge(X,Y)?", which led me to check ?merge, where I discovered that there are two different merge functions depending on which package you are using. The default is merge.data.frame, but data.table uses merge.data.table. Compare
merge(X, Y, by.x = "id", by.y = "ID") # which is merge.data.table
# Error in merge.data.table(X, Y, by.x = "id", by.y = "ID") : 
# A non-empty vector of column names for `by` is required.

with
merge.data.frame(X, Y, by.x = "id", by.y = "ID")
#     id area value price sales
# 1 c001   US   100   500    20
# 2 c002   UK   200   200    30
# 3 c003   EU   300   400    15

Edit for completeness: based on a comment by @Michael Bernsteiner, it looks like the data.table team is planning to implement by.x and by.y in the merge.data.table function, but has not done so yet.
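
This is probably why the answer is marked OUTDATED: to my knowledge, merge.data.table has since gained the by.x and by.y arguments (I believe in data.table 1.9.6). The following is a minimal sketch under that assumption, using the tables from the question. The variable name `merged` is only illustrative.

```r
library(data.table)

X <- data.table(area  = c("US", "UK", "EU"),
                id    = c("c001", "c002", "c003"),
                value = c(100, 200, 300))
Y <- data.table(ID    = c("c001", "c002", "c003"),
                price = c(500, 200, 400),
                sales = c(20, 30, 15))

# Assumes a data.table version whose merge method accepts by.x / by.y.
# On older versions this call fails with the error shown in the answer.
merged <- merge(X, Y, by.x = "id", by.y = "ID")
print(merged)
```

On an installation where this call still raises the error about `by`, checking packageVersion("data.table") is a reasonable first step before falling back to X[Y] or merge.data.frame.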
