Expand Rows and Add Columns in Data Frame Based On Another Data Frame

Question

Each row in team.df represents one NBA team. Each data frame in list.of.all.stars contains one row per All-Star player on the corresponding team, so the number of rows varies by team.

Using the apply() family of functions, how can I expand the rows in team.df so that each team is repeated once per All-Star player, and combine the columns from list.of.all.stars into the final output?

I'm totally open to non-apply() methods as well; I mention apply() only because I'm hoping to avoid writing for loops.

Below is my desired output:

#   Team_Name Team_Location         Player Captain
# 1 Cavaliers Cleveland, OH   LeBron James    TRUE
# 2 Cavaliers Cleveland, OH     Kevin Love   FALSE
# 3  Warriors   Oakland, CA  Stephen Curry    TRUE
# 4  Warriors   Oakland, CA   Kevin Durant   FALSE
# 5  Warriors   Oakland, CA  Klay Thompson   FALSE
# 6  Warriors   Oakland, CA Draymond Green   FALSE

Reproducible Example

# create data frame 
# about team information
team.df <-
  data.frame(
    Team_Name       = c( "Cavaliers", "Warriors" )
    , Team_Location = c( "Cleveland, OH", "Oakland, CA")
    , stringsAsFactors = FALSE
  )

# create list about
# all stars on each team
list.of.all.stars <-
  list( 
    data.frame(
      Player = c( "LeBron James", "Kevin Love" )
      , Captain = c( TRUE, FALSE )
      , stringsAsFactors = FALSE
    )
    , data.frame( 
      Player = c( "Stephen Curry", "Kevin Durant"
                  , "Klay Thompson", "Draymond Green"
      )
      , Captain = c( TRUE, FALSE, FALSE, FALSE )
      , stringsAsFactors = FALSE
    )
  )

Non-apply() Family Method

# cbind each data frame within the list.of.all.stars
# to its corresponding row in team.df
team.and.all.stars.list.of.df <-
  list(
    cbind(
      team.df[ 1, ]
      , list.of.all.stars[[1]]
    )
    , cbind(
      team.df[ 2, ]
      , list.of.all.stars[[2]]
    )
  )
# Warning messages:
#   1: In data.frame(..., check.names = FALSE) :
#   row names were found from a short variable and have been discarded
# 2: In data.frame(..., check.names = FALSE) :
#   row names were found from a short variable and have been discarded

# collapse each list
# into data frame
final.df <-
  data.frame(
    do.call(
      what = "rbind"
      , args = team.and.all.stars.list.of.df
    )
    , stringsAsFactors = FALSE
  )
# view final output
final.df
# Team_Name Team_Location         Player Captain
# 1 Cavaliers Cleveland, OH   LeBron James    TRUE
# 2 Cavaliers Cleveland, OH     Kevin Love   FALSE
# 3  Warriors   Oakland, CA  Stephen Curry    TRUE
# 4  Warriors   Oakland, CA   Kevin Durant   FALSE
# 5  Warriors   Oakland, CA  Klay Thompson   FALSE
# 6  Warriors   Oakland, CA Draymond Green   FALSE

# end of script #

Failed mapply() Attempt

# Hoping to Apply A Function
# using a data frame and
# a list of data frames
mapply.method <-
  mapply(
    FUN = function( x, y )
      cbind.data.frame(
        x
        , y
        , stringsAsFactors = FALSE
      )
    , team.df
    , list.of.all.stars
  )

# view results
mapply.method
#         Team_Name   Team_Location
# x       Character,2 Character,4  
# Player  Character,2 Character,4  
# Captain Logical,2   Logical,4 

# end of script #

Recommended Answer

Regarding the OP's approach of passing 'team.df' to Map/mapply: 'team.df' is a data.frame, which is a list of columns, so the unit of iteration is a column vector. Map/mapply therefore loops over the columns rather than over the whole dataset or its rows (which is what the desired output needs). To prevent that, wrap 'team.df' in list(); it is then treated as a single unit that is recycled against each list element of 'list.of.all.stars':

do.call(rbind, Map(cbind, list(team.df), list.of.all.stars))
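The column-wise iteration described above can be checked directly. The following is a minimal sketch that rebuilds the question's data; the helper names visited, whole, and recycled are introduced here for illustration only. It also shows that wrapping in list() recycles the whole of 'team.df' against each element, so the teams and players do not line up as desired.

```r
# Data from the question
team.df <- data.frame(
  Team_Name = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(Player = c("LeBron James", "Kevin Love"),
             Captain = c(TRUE, FALSE),
             stringsAsFactors = FALSE),
  data.frame(Player = c("Stephen Curry", "Kevin Durant",
                        "Klay Thompson", "Draymond Green"),
             Captain = c(TRUE, FALSE, FALSE, FALSE),
             stringsAsFactors = FALSE)
)

# A data.frame is a list whose elements are its columns
is.list(team.df)
length(team.df)   # counts columns, not rows

# Map() over the bare data.frame visits one column per call
visited <- Map(function(x, y) class(x), team.df, list.of.all.stars)
names(visited)

# Wrapped in list(), the whole data.frame is passed to every call
whole <- Map(function(x, y) dim(x), list(team.df), list.of.all.stars)

# Both rows of team.df are recycled against each element, so the
# team names alternate instead of following the players
recycled <- do.call(rbind, Map(cbind, list(team.df), list.of.all.stars))
recycled$Team_Name
```

Because the result has the right number of rows but the wrong team for some players, a row-wise pairing is still needed.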


Based on the expected output, each row of 'team.df' should be paired with the corresponding list element of 'list.of.all.stars'. In that case, split 'team.df' by row and then cbind each piece:

res <- do.call(rbind, Map(cbind,  split(team.df, seq_len(nrow(team.df))), list.of.all.stars))
row.names(res) <- NULL
res
#   Team_Name Team_Location         Player Captain
#1 Cavaliers Cleveland, OH   LeBron James    TRUE
#2 Cavaliers Cleveland, OH     Kevin Love   FALSE
#3  Warriors   Oakland, CA  Stephen Curry    TRUE
#4  Warriors   Oakland, CA   Kevin Durant   FALSE
#5  Warriors   Oakland, CA  Klay Thompson   FALSE
#6  Warriors   Oakland, CA Draymond Green   FALSE
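As an alternative that avoids both split() and loops, the rows of 'team.df' can be repeated by index and bound to the stacked player data in one step. This is a sketch of a different base R idiom, not the method used in this answer; the names n.per.team, row.index, and res2 are hypothetical, and it assumes the list elements are in the same order as the rows of 'team.df'.

```r
# Data from the question
team.df <- data.frame(
  Team_Name = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(Player = c("LeBron James", "Kevin Love"),
             Captain = c(TRUE, FALSE),
             stringsAsFactors = FALSE),
  data.frame(Player = c("Stephen Curry", "Kevin Durant",
                        "Klay Thompson", "Draymond Green"),
             Captain = c(TRUE, FALSE, FALSE, FALSE),
             stringsAsFactors = FALSE)
)

# Number of All-Star rows for each team
n.per.team <- vapply(list.of.all.stars, nrow, integer(1))

# Repeat each team's row index once per player
row.index <- rep(seq_len(nrow(team.df)), times = n.per.team)

# Expand team.df by index and bind it to the stacked player rows
res2 <- cbind(team.df[row.index, ], do.call(rbind, list.of.all.stars))
row.names(res2) <- NULL
res2
```

This relies on positional matching between the two inputs, so it is only safe when that ordering is guaranteed.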


We can also do this in the tidyverse. After grouping by all the columns in 'team.df', nest it to create a list column 'data' (of length 2), replace 'data' with 'list.of.all.stars' in mutate, and then unnest the list column:

library(tidyverse)
team.df %>% 
      group_by_all() %>%
      nest %>% 
      mutate(data = list.of.all.stars) %>% 
      unnest
# A tibble: 6 x 4
#  Team_Name Team_Location Player         Captain
#  <chr>     <chr>         <chr>          <lgl>  
# 1 Cavaliers Cleveland, OH LeBron James   T      
# 2 Cavaliers Cleveland, OH Kevin Love     F      
# 3 Warriors  Oakland, CA   Stephen Curry  T      
# 4 Warriors  Oakland, CA   Kevin Durant   F      
# 5 Warriors  Oakland, CA   Klay Thompson  F      
# 6 Warriors  Oakland, CA   Draymond Green F      
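The pipeline above reflects the tidyverse versions current when the answer was written. A shorter variant is sketched here under the assumption that tidyr 1.0.0 or later is installed, where unnest() takes an explicit cols argument; it skips the grouping and nesting steps and attaches the list column directly. The name res3 is hypothetical.

```r
library(dplyr)
library(tidyr)

# Data from the question
team.df <- data.frame(
  Team_Name = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(Player = c("LeBron James", "Kevin Love"),
             Captain = c(TRUE, FALSE),
             stringsAsFactors = FALSE),
  data.frame(Player = c("Stephen Curry", "Kevin Durant",
                        "Klay Thompson", "Draymond Green"),
             Captain = c(TRUE, FALSE, FALSE, FALSE),
             stringsAsFactors = FALSE)
)

# Attach the list as a list column (one element per team row),
# then expand that column into rows and columns
res3 <- team.df %>%
  mutate(data = list.of.all.stars) %>%
  unnest(cols = data)
res3
```

As with the base R versions, this pairs rows and list elements by position.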
