R - reduce with merge and more than 2 suffixes (or: how to merge multiple dataframes and keep track of columns)
Problem description
I'm trying to merge 4 data frames on 2 columns while keeping track of which data frame each column originated from. I'm running into a problem with tracking the columns.
(See the end of the post for the dput() output of the data frames.)
#df example (df1)
Name Color Freq
banana yellow 3
apple red 1
apple green 4
plum purple 8
#create list of dataframes
list.df <- list(df1, df2, df3, df4)
#merge dfs on column "Name" and "Color"
combo.df <- Reduce(function(x,y) merge(x,y, by = c("Name", "Color"), all = TRUE, accumulate=FALSE, suffixes = c(".df1", ".df2", ".df3", ".df4")), list.df)
This produces the following warning:
Warning message:
In merge.data.frame(x, y, by = c("Name", "Color"), all = TRUE, :
  column names ‘Freq.df1’, ‘Freq.df2’ are duplicated in the result
and outputs this data frame:
#combo df example
Name Color Freq.df1 Freq.df2 Freq.df1 Freq.df2
banana yellow 3 3 7 NA
apple red 1 2 9 1
apple green 4 NA 8 2
plum purple 8 1 NA 6
The suffixes .df1 and .df2 are simply repeated in the column names. The values in the third and fourth Freq columns of combo.df actually come from df3 and df4, respectively.
What I actually want is:
Name Color Freq.df1 Freq.df2 Freq.df3 Freq.df4
banana yellow 3 3 7 NA
apple red 1 2 9 1
apple green 4 NA 8 2
plum purple 8 1 NA 6
How can I achieve this? I know the suffixes argument of merge() only accepts a character vector of length 2, but I don't know what the workaround should be. Thanks!
df1 <-
structure(list(Name = structure(c(2L, 1L, 1L, 3L), .Label = c("apple",
"banana", "plum"), class = "factor"), Color = structure(c(4L,
3L, 1L, 2L), .Label = c("green", "purple", "red", "yellow"), class = "factor"),
Freq = c(3, 1, 4, 8)), .Names = c("Name", "Color", "Freq"
), row.names = c(NA, -4L), class = "data.frame")
df2 <-
structure(list(Name = structure(c(2L, 1L, 3L), .Label = c("apple",
"banana", "plum"), class = "factor"), Color = structure(c(3L,
2L, 1L), .Label = c("purple", "red", "yellow"), class = "factor"),
Freq = c(3, 2, 1)), .Names = c("Name", "Color", "Freq"), row.names = c(NA,
-3L), class = "data.frame")
df3 <-
structure(list(Name = structure(c(2L, 1L, 1L), .Label = c("apple",
"banana"), class = "factor"), Color = structure(c(3L, 2L, 1L), .Label = c("green",
"red", "yellow"), class = "factor"), Freq = c(7, 9, 8)), .Names = c("Name",
"Color", "Freq"), row.names = c(NA, -3L), class = "data.frame")
df4 <-
structure(list(Name = structure(c(1L, 1L, 2L), .Label = c("apple",
"plum"), class = "factor"), Color = structure(c(3L, 1L, 2L), .Label = c("green",
"purple", "red"), class = "factor"), Freq = c(1, 2, 6)), .Names = c("Name",
"Color", "Freq"), row.names = c(NA, -3L), class = "data.frame")
Recommended answer
This seems easier with a for loop, because Reduce (or reduce from purrr) passes only two datasets to merge at a time, so a single merge call cannot use more than two suffixes.
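As a minimal sketch of the suffix behaviour involved, the example below uses three small hypothetical data frames (a, b and c3, none of them from the question). It illustrates that merge() takes the first suffix for its x argument and the second for y, and appends them only to non-key columns whose names clash between the two inputs.

```r
# Hypothetical toy data, not taken from the question
a  <- data.frame(Name = c("apple", "plum"), Freq = c(1, 8))
b  <- data.frame(Name = c("apple", "plum"), Freq = c(2, 6))
c3 <- data.frame(Name = c("apple", "plum"), Freq = c(9, 3))

# Freq exists on both sides, so each side receives its suffix
ab <- merge(a, b, by = "Name", suffixes = c(".a", ".b"))
names(ab)
# "Name" "Freq.a" "Freq.b"

# In the next merge, Freq from c3 clashes with nothing in ab,
# so no suffix is applied and the column keeps its plain name
abc <- merge(ab, c3, by = "Name", suffixes = c(".b", ".c"))
names(abc)
# "Name" "Freq.a" "Freq.b" "Freq"
```

This is why the suffixes passed at each step matter: they take effect only at the steps where a name clash actually occurs.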
Here, we create a vector of suffixes (sfx) and initialize the output dataset res with the first element of list.df. We then loop over the sequence of list.df, merging res with the next element of list.df and updating res at each step.
sfx <- c(".df1", ".df2", ".df3", ".df4")
res <- list.df[[1]]
for (i in head(seq_along(list.df), -1)) {
  res <- merge(res, list.df[[i + 1]], all = TRUE,
               suffixes = sfx[i:(i + 1)], by = c("Name", "Color"))
}
res
# Name Color Freq.df1 Freq.df2 Freq.df3 Freq.df4
#1 apple green 4 NA 8 2
#2 apple red 1 2 9 1
#3 banana yellow 3 3 7 NA
#4 plum purple 8 1 NA 6
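An alternative sketch, not part of the original answer: since merge() adds suffixes only when column names clash, one option is to rename the value columns before merging, so that no suffixes are needed and the number of data frames does not matter. The data in toy.list is a small hypothetical stand-in for list.df, and keys, renamed and res2 are names introduced here for illustration only.

```r
# Hypothetical stand-ins for the question's data frames
toy.list <- list(
  data.frame(Name = c("banana", "apple"), Color = c("yellow", "red"), Freq = c(3, 1)),
  data.frame(Name = c("banana", "apple"), Color = c("yellow", "red"), Freq = c(3, 2)),
  data.frame(Name = "banana", Color = "yellow", Freq = 7)
)

keys <- c("Name", "Color")

# Rename every non-key column to <name>.df<i> before merging
renamed <- Map(function(d, i) {
  is_value <- !names(d) %in% keys
  names(d)[is_value] <- paste0(names(d)[is_value], ".df", i)
  d
}, toy.list, seq_along(toy.list))

# All value columns are now unique, so a plain Reduce over merge works
res2 <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE), renamed)
names(res2)
# "Name" "Color" "Freq.df1" "Freq.df2" "Freq.df3"
```

With the question's data, passing list.df in place of toy.list should give the same Freq.df1 to Freq.df4 layout as the loop, assuming the key columns are named Name and Color in every data frame.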