R - Reduce with merge and more than 2 suffixes (or: how to merge multiple dataframes and keep track of columns)


Problem description

I'm trying to merge 4 dataframes on 2 columns while keeping track of which dataframe each column originated from. I'm running into a problem tracking the columns.

(See the end of the post for the dput() output of the dataframes.)

#df example (df1)
Name    Color    Freq
banana  yellow   3
apple   red      1
apple   green    4
plum    purple   8


#create list of dataframes
list.df <- list(df1, df2, df3, df4)

#merge dfs on column "Name" and "Color"
combo.df <- Reduce(function(x,y) merge(x,y, by = c("Name", "Color"), all = TRUE, accumulate=FALSE, suffixes = c(".df1", ".df2", ".df3", ".df4")), list.df)

This gives the following warning:

Warning message:
In merge.data.frame(x, y, by = c("Name", "Color"), all = TRUE, :
  column names ‘Freq.df1’, ‘Freq.df2’ are duplicated in the result

It also outputs this dataframe:

#combo df example
Name    Color    Freq.df1   Freq.df2  Freq.df1  Freq.df2
banana  yellow   3          3         7         NA
apple   red      1          2         9         1
apple   green    4          NA        8         2
plum    purple   8          1         NA        6

The .df1 and .df2 suffixes are repeated in name only. The values populating the third and fourth Freq columns of combo.df actually come from df3 and df4 respectively.

What I really want is:

Name    Color    Freq.df1   Freq.df2  Freq.df3  Freq.df4
banana  yellow   3          3         7         NA
apple   red      1          2         9         1
apple   green    4          NA        8         2
plum    purple   8          1         NA        6

How can I achieve this? I know the suffixes argument of merge() only handles a character vector of length 2, but I don't know what the workaround should be. Thanks!

df1 <- 
structure(list(Name = structure(c(2L, 1L, 1L, 3L), .Label = c("apple", 
"banana", "plum"), class = "factor"), Color = structure(c(4L, 
3L, 1L, 2L), .Label = c("green", "purple", "red", "yellow"), class = "factor"), 
    Freq = c(3, 1, 4, 8)), .Names = c("Name", "Color", "Freq"
), row.names = c(NA, -4L), class = "data.frame")

df2 <-
structure(list(Name = structure(c(2L, 1L, 3L), .Label = c("apple", 
"banana", "plum"), class = "factor"), Color = structure(c(3L, 
2L, 1L), .Label = c("purple", "red", "yellow"), class = "factor"), 
    Freq = c(3, 2, 1)), .Names = c("Name", "Color", "Freq"), row.names = c(NA, 
-3L), class = "data.frame")

df3 <-
structure(list(Name = structure(c(2L, 1L, 1L), .Label = c("apple", 
"banana"), class = "factor"), Color = structure(c(3L, 2L, 1L), .Label = c("green", 
"red", "yellow"), class = "factor"), Freq = c(7, 9, 8)), .Names = c("Name", 
"Color", "Freq"), row.names = c(NA, -3L), class = "data.frame")

df4 <-
structure(list(Name = structure(c(1L, 1L, 2L), .Label = c("apple", 
"plum"), class = "factor"), Color = structure(c(3L, 1L, 2L), .Label = c("green", 
"purple", "red"), class = "factor"), Freq = c(1, 2, 6)), .Names = c("Name", 
"Color", "Freq"), row.names = c(NA, -3L), class = "data.frame")


Recommended answer

This seems easier with a for loop, because Reduce (or reduce from purrr) passes only two datasets to the function at a time, so the merge can't be given more than two suffixes.
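To see where the duplicated names come from, the sketch below runs the same kind of Reduce call but keeps every intermediate result. It assumes the sample data from the question (rebuilt here with data.frame() instead of dput()), and merge_pair is a hypothetical helper name, not part of the original post. As far as I can tell, merge() only appends suffixes to non-key columns whose names clash: after the first merge the columns are Freq.df1 and Freq.df2, the second merge finds no clash and leaves the new column as plain Freq, and the third merge then sees Freq on both sides and reuses the same two suffixes.

```r
# Sample data equivalent to the dput() output in the question
df1 <- data.frame(Name = c("banana", "apple", "apple", "plum"),
                  Color = c("yellow", "red", "green", "purple"),
                  Freq = c(3, 1, 4, 8))
df2 <- data.frame(Name = c("banana", "apple", "plum"),
                  Color = c("yellow", "red", "purple"),
                  Freq = c(3, 2, 1))
df3 <- data.frame(Name = c("banana", "apple", "apple"),
                  Color = c("yellow", "red", "green"),
                  Freq = c(7, 9, 8))
df4 <- data.frame(Name = c("apple", "apple", "plum"),
                  Color = c("red", "green", "purple"),
                  Freq = c(1, 2, 6))
list.df <- list(df1, df2, df3, df4)

# Hypothetical helper: every pairwise merge gets the same two suffixes,
# which is all that merge() can use inside Reduce()
merge_pair <- function(x, y) {
  merge(x, y, by = c("Name", "Color"), all = TRUE,
        suffixes = c(".df1", ".df2"))
}

# accumulate = TRUE keeps each intermediate result for inspection;
# suppressWarnings() hides the duplicated-column warning from the last step
steps <- suppressWarnings(Reduce(merge_pair, list.df, accumulate = TRUE))

# Print the column names after each merge step
for (s in steps) print(names(s))
```

Comparing the names printed for the third and fourth steps shows the point at which the plain Freq column is renamed with an already used suffix.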

Here, we create a vector of suffixes (sfx) and initialize the output dataset res with the first list element. We then loop through the sequence of list.df, merging res with the next element of list.df and updating res at each step.

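sfx <- c(".df1", ".df2", ".df3", ".df4")
res <- list.df[[1]]
# Pass the i-th and (i+1)-th suffixes at each step; merge() applies them
# only when non-key column names clash between the two inputs
for (i in head(seq_along(list.df), -1)) {
  res <- merge(res, list.df[[i + 1]], all = TRUE,
               suffixes = sfx[i:(i + 1)], by = c("Name", "Color"))
}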

res
#    Name  Color Freq.df1 Freq.df2 Freq.df3 Freq.df4
#1  apple  green        4       NA        8        2
#2  apple    red        1        2        9        1
#3 banana yellow        3        3        7       NA
#4   plum purple        8        1       NA        6
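An alternative that does not depend on merge() suffixes at all is to rename the non-key columns before merging, so that no names ever clash. This is only a sketch under the same sample data as the question (rebuilt with data.frame()); the names keys and renamed are illustrative choices of mine rather than anything from the original answer. It may also be worth noting that the loop above seems to depend on suffixes being applied only on a name clash, so with an odd number of dataframes the last Freq column would, as far as I can tell, stay unsuffixed, whereas renaming up front labels every column.

```r
# Sample data equivalent to the dput() output in the question
df1 <- data.frame(Name = c("banana", "apple", "apple", "plum"),
                  Color = c("yellow", "red", "green", "purple"),
                  Freq = c(3, 1, 4, 8))
df2 <- data.frame(Name = c("banana", "apple", "plum"),
                  Color = c("yellow", "red", "purple"),
                  Freq = c(3, 2, 1))
df3 <- data.frame(Name = c("banana", "apple", "apple"),
                  Color = c("yellow", "red", "green"),
                  Freq = c(7, 9, 8))
df4 <- data.frame(Name = c("apple", "apple", "plum"),
                  Color = c("red", "green", "purple"),
                  Freq = c(1, 2, 6))
list.df <- list(df1, df2, df3, df4)

keys <- c("Name", "Color")                  # columns to merge on
sfx <- paste0(".df", seq_along(list.df))    # one suffix per dataframe

# Append the matching suffix to every non-key column of each dataframe
renamed <- Map(function(d, s) {
  is_value <- !(names(d) %in% keys)
  names(d)[is_value] <- paste0(names(d)[is_value], s)
  d
}, list.df, sfx)

# With unique column names, a plain Reduce over merge() is enough
combo.df <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE), renamed)
combo.df
```

Because the suffixes are generated from the length of list.df, the same code should carry over to any number of dataframes that share the key columns.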
