R: multiple joins from a list of data frames of differing lengths and differing keys

Question

Let's say I've got this list of data frames:

library(tidyverse)
df_list <- list(data.frame(cheese = c("ex","ok","bd"), 
                          cheese_val = c(3:1), 
                          stringsAsFactors = F),
               data.frame(egg = c("great","good","bad", "eww"), 
                          egg_val = c(4:1),
                          stringsAsFactors = F),
               data.frame(milk = c("good","bad"), 
                          milk_val = c(2:1), 
                          stringsAsFactors = F))

And I have this core data set:

core_dat <- data.frame(cheese = c("ex","ok","ok", "bd", "ok"), 
                      egg = c("great", "bad", "bad", "eww", "great"), 
                      milk = c("good", "good", "good", "bad", "good"), 
                      stringsAsFactors = F)

I'd like to join core_dat individually with each element of df_list.

I tried this:

for(i in 1:length(df_list)) {
  gg<-core_dat %>% 
    left_join(df_list[[i]], by = names(df_list[[i]][1]), copy = T)
}

This ran, but the join was only applied to the milk column, so the only additional column in the result was milk_val. I expected to see cheese_val and egg_val too.

I suspect there are more appropriate options than a for loop here, and I am looking for suggestions. Note that my actual data set has many more data frames than this small example.

I should note that I expect the resulting data frame, in this case gg, to contain 6 columns in total (the 3 original columns plus 3 with a "_val" suffix), so that it looks like the printed version of this:

data.frame(cheese = c("ex","ok","ok", "bd", "ok"), 
           egg = c("great", "bad", "bad", "eww", "great"), 
           milk = c("good", "good", "good", "bad", "good"), 
           cheese_val = c(3, 2, 2, 1, 2), 
           egg_val = c(4, 2, 2, 1, 4), 
           milk_val = c(2, 2, 2, 1, 2))

I've seen many "multiple joins" answers here, but none that quite line up with what I'm trying to accomplish (differing key columns, differing lengths of data).

Recommended answer

You can use map to get a list of joined data frames, then use reduce to join them all together.

map(df_list, right_join, rownames_to_column(core_dat)) %>%
  reduce(full_join)
# Joining, by = "cheese"
# Joining, by = "egg"
# Joining, by = "milk"
# Joining, by = c("cheese", "rowname", "egg", "milk")
# Joining, by = c("cheese", "rowname", "egg", "milk")
#   cheese cheese_val rowname   egg milk egg_val milk_val
# 1     ex          3       1 great good       4        2
# 2     ok          2       2   bad good       2        2
# 3     ok          2       3   bad good       2        2
# 4     bd          1       4   eww  bad       1        1
# 5     ok          2       5 great good       4        2
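
As a possible alternative (a sketch, not part of the original answer), the loop in the question can be rewritten as a single fold. The loop failed because every iteration joined against the untouched core_dat and overwrote gg, so only the last join (milk) survived. The version below instead carries the accumulated result forward with reduce and .init. It assumes that the first column of each data frame in df_list is its join key, as in the example data; the argument names acc and lookup are illustrative only.

```r
library(dplyr)
library(purrr)

# Example data from the question
df_list <- list(
  data.frame(cheese = c("ex", "ok", "bd"),
             cheese_val = 3:1, stringsAsFactors = FALSE),
  data.frame(egg = c("great", "good", "bad", "eww"),
             egg_val = 4:1, stringsAsFactors = FALSE),
  data.frame(milk = c("good", "bad"),
             milk_val = 2:1, stringsAsFactors = FALSE)
)
core_dat <- data.frame(
  cheese = c("ex", "ok", "ok", "bd", "ok"),
  egg    = c("great", "bad", "bad", "eww", "great"),
  milk   = c("good", "good", "good", "bad", "good"),
  stringsAsFactors = FALSE
)

# Start from core_dat and left-join each lookup table in turn.
# Assumption: the first column of each lookup table is the join key.
gg <- reduce(
  df_list,
  function(acc, lookup) left_join(acc, lookup, by = names(lookup)[1]),
  .init = core_dat
)
```

Because each join is a left_join onto the running result, the rows stay in the order of core_dat and no helper rowname column is needed, provided the key values in each lookup table are unique.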
