How to merge two data.tables by different column names?
Question
I have two data.tables, X and Y.
Columns in X: area, id, value
Columns in Y: ID, price, sales
Create the two data.tables:
library(data.table)

X = data.table(area  = c('US', 'UK', 'EU'),
               id    = c('c001', 'c002', 'c003'),
               value = c(100, 200, 300))
Y = data.table(ID    = c('c001', 'c002', 'c003'),
               price = c(500, 200, 400),
               sales = c(20, 30, 15))
I set keys for X and Y:
setkey(X, id)
setkey(Y, ID)
Now I try to join X and Y by id in X and ID in Y:
merge(X, Y)
merge(X, Y, by=c('id', 'ID'))
merge(X, Y, by.x='id', by.y='ID')
All three calls raised errors saying that the column names in the by argument are invalid.
I referred to the data.table manual and found that its merge function does not support the by.x and by.y arguments.
Addendum: I managed to join the two tables with X[Y], but why does the merge function fail in data.table?
Recommended answer
Use this operation:
X[Y]
# area id value price sales
# 1: US c001 100 500 20
# 2: UK c002 200 200 30
# 3: EU c003 300 400 15
Or this operation:
Y[X]
# ID price sales area value
# 1: c001 500 20 US 100
# 2: c002 200 30 UK 200
# 3: c003 400 15 EU 300
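Both forms above rely on the keys set with setkey. As a minimal sketch of the same join without keys, assuming data.table 1.9.6 or later (where, as far as I know, the on argument was introduced), the differing column names can be mapped directly in the join. The variable name joined is only illustrative.

```r
library(data.table)

# Same example tables as in the question, with no keys set
X <- data.table(area  = c("US", "UK", "EU"),
                id    = c("c001", "c002", "c003"),
                value = c(100, 200, 300))
Y <- data.table(ID    = c("c001", "c002", "c003"),
                price = c(500, 200, 400),
                sales = c(20, 30, 15))

# In `on`, the name (id) refers to a column of X and the value ("ID")
# to a column of Y, so this matches X$id against Y$ID
joined <- X[Y, on = c(id = "ID")]
print(joined)
```

This avoids reordering either table by key, which may matter if the original row order should be preserved.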
Edit, after you edited your question: I read Section 1.12 of the FAQ, "What is the difference between X[Y] and merge(X,Y)?", which led me to check ?merge. There I discovered that there are two different merge functions, depending on which package you are using. The default is merge.data.frame, but data.table uses merge.data.table. Compare:
merge(X, Y, by.x = "id", by.y = "ID") # which is merge.data.table
# Error in merge.data.table(X, Y, by.x = "id", by.y = "ID") :
# A non-empty vector of column names for `by` is required.
merge.data.frame(X, Y, by.x = "id", by.y = "ID")
# id area value price sales
# 1 c001 US 100 500 20
# 2 c002 UK 200 200 30
# 3 c003 EU 300 400 15
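Note that merge.data.frame is the data.frame method, so it may not return a data.table. If a data.table result is wanted and by.x/by.y are unavailable, one possible workaround is to rename the column on a copy of Y so that both tables share the name, then merge with a plain by. This is only a sketch; Y_renamed and merged are illustrative names, not part of the original answer.

```r
library(data.table)

X <- data.table(area  = c("US", "UK", "EU"),
                id    = c("c001", "c002", "c003"),
                value = c(100, 200, 300))
Y <- data.table(ID    = c("c001", "c002", "c003"),
                price = c(500, 200, 400),
                sales = c(20, 30, 15))

# Rename on a copy so the original Y keeps its ID column
Y_renamed <- setnames(copy(Y), "ID", "id")

# Both tables now share the column name, so a single `by` is enough
merged <- merge(X, Y_renamed, by = "id")
print(merged)
```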
Edit for completeness: based on a comment by @Michael Bernsteiner, it looks like the data.table team is planning to implement by.x and by.y in the merge.data.table function, but hasn't done so yet.
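For readers on a newer installation: to the best of my knowledge, data.table 1.9.6 and later added by.x and by.y to merge.data.table, so the call from the question should work there. The sketch below assumes such a version; on older versions it fails with the error shown above. The name merged_xy is illustrative.

```r
library(data.table)

X <- data.table(area  = c("US", "UK", "EU"),
                id    = c("c001", "c002", "c003"),
                value = c(100, 200, 300))
Y <- data.table(ID    = c("c001", "c002", "c003"),
                price = c(500, 200, 400),
                sales = c(20, 30, 15))

# Assumes a data.table version whose merge method accepts by.x / by.y
merged_xy <- merge(X, Y, by.x = "id", by.y = "ID")
print(merged_xy)
```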