Expand Rows and Add Columns in Data Frame Based On Another Data Frame


Problem description

Each row in team.df consists of one NBA team. Each data frame in list.of.all.stars contains multiple rows based on the number of all star players associated with each NBA team.

Using the apply() family of functions, how can I expand the rows in team.df to grow by the number of all star players by each team and combine the columns from the list.of.all.stars to the final output?

I'm totally open to non-apply() methods as well; I mention apply() only because I'm hoping to avoid writing for loops.

Below is my desired output:

#   Team_Name Team_Location         Player Captain
# 1 Cavaliers Cleveland, OH   LeBron James    TRUE
# 2 Cavaliers Cleveland, OH     Kevin Love   FALSE
# 3  Warriors   Oakland, CA  Stephen Curry    TRUE
# 4  Warriors   Oakland, CA   Kevin Durant   FALSE
# 5  Warriors   Oakland, CA  Klay Thompson   FALSE
# 6  Warriors   Oakland, CA Draymond Green   FALSE

Reproducible example

# create data frame 
# about team information
team.df <-
  data.frame(
    Team_Name       = c( "Cavaliers", "Warriors" )
    , Team_Location = c( "Cleveland, OH", "Oakland, CA")
    , stringsAsFactors = FALSE
  )

# create list about
# all stars on each team
list.of.all.stars <-
  list( 
    data.frame(
      Player = c( "LeBron James", "Kevin Love" )
      , Captain = c( TRUE, FALSE )
      , stringsAsFactors = FALSE
    )
    , data.frame( 
      Player = c( "Stephen Curry", "Kevin Durant"
                  , "Klay Thompson", "Draymond Green"
      )
      , Captain = c( TRUE, FALSE, FALSE, FALSE )
      , stringsAsFactors = FALSE
    )
  )

Non-apply() family method

# cbind each data frame within the list.of.all.stars
# to its corresponding row in team.df
team.and.all.stars.list.of.df <-
  list(
    cbind(
      team.df[ 1, ]
      , list.of.all.stars[[1]]
    )
    ,   cbind(
      team.df[ 2, ]
      , list.of.all.stars[[2]]
    )
  )
# Warning messages:
#   1: In data.frame(..., check.names = FALSE) :
#   row names were found from a short variable and have been discarded
# 2: In data.frame(..., check.names = FALSE) :
#   row names were found from a short variable and have been discarded

# collapse each list
# into data frame
final.df <-
  data.frame(
    do.call(
      what = "rbind"
      , args = team.and.all.stars.list.of.df
    )
    , stringsAsFactors = FALSE
  )
# view final output
final.df
# Team_Name Team_Location         Player Captain
# 1 Cavaliers Cleveland, OH   LeBron James    TRUE
# 2 Cavaliers Cleveland, OH     Kevin Love   FALSE
# 3  Warriors   Oakland, CA  Stephen Curry    TRUE
# 4  Warriors   Oakland, CA   Kevin Durant   FALSE
# 5  Warriors   Oakland, CA  Klay Thompson   FALSE
# 6  Warriors   Oakland, CA Draymond Green   FALSE

# end of script #

Failed mapply() attempt

# Hoping to Apply A Function
# using a data frame and
# a list of data frames
mapply.method <-
  mapply(
    FUN = function( x, y )
      cbind.data.frame(
        x
        , y
        , stringsAsFactors = FALSE
      )
    , team.df
    , list.of.all.stars
  )

# view results
mapply.method
#         Team_Name   Team_Location
# x       Character,2 Character,4  
# Player  Character,2 Character,4  
# Captain Logical,2   Logical,4 

# end of script #

Recommended answer

In the OP's Map/mapply attempt, 'team.df' is passed directly as an input. A data.frame is a list of columns, so Map/mapply iterates over its column vectors rather than over the whole dataset or its rows, which is what the desired output needs. If we wrap 'team.df' in list(), it becomes a single unit that is recycled against each element of 'list.of.all.stars':

do.call(rbind, Map(cbind, list(team.df), list.of.all.stars))
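
To see why the mapply() attempt produced a matrix of column summaries, it may help to check how R treats a data frame as an argument. The following is only a small sketch; 'by.column' is a name introduced here for illustration and is not part of the original question or answer.

```r
# Same team data as in the question
team.df <- data.frame(
  Team_Name     = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)

# A data frame is a list whose elements are its columns
is.list(team.df)   # TRUE
length(team.df)    # 2 (the number of columns, not rows)

# Map() therefore visits one column vector per iteration
by.column <- Map(function(x) x, team.df)   # 'by.column' is a hypothetical name
names(by.column)   # "Team_Name" "Team_Location"

# Wrapping the data frame in list() makes it a single element
length(list(team.df))   # 1
```

Because list(team.df) has length 1, Map() reuses the whole data frame for every element of the second argument instead of stepping through its columns.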


Based on the expected output, each row of 'team.df' should be paired with the corresponding list element of 'list.of.all.stars'. In that case, split 'team.df' by row and then cbind:

res <- do.call(rbind, Map(cbind,  split(team.df, seq_len(nrow(team.df))), list.of.all.stars))
row.names(res) <- NULL
res
#   Team_Name Team_Location         Player Captain
#1 Cavaliers Cleveland, OH   LeBron James    TRUE
#2 Cavaliers Cleveland, OH     Kevin Love   FALSE
#3  Warriors   Oakland, CA  Stephen Curry    TRUE
#4  Warriors   Oakland, CA   Kevin Durant   FALSE
#5  Warriors   Oakland, CA  Klay Thompson   FALSE
#6  Warriors   Oakland, CA Draymond Green   FALSE
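
As a loop-free alternative in base R, the team rows can be repeated by index and bound to the stacked player data in one step. This is a sketch rather than part of the original answer; 'n.players', 'expanded' and 'res.rep' are names assumed here for illustration, and it relies on 'list.of.all.stars' being in the same order as the rows of 'team.df'.

```r
# Same data as in the question
team.df <- data.frame(
  Team_Name     = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(
    Player  = c("LeBron James", "Kevin Love"),
    Captain = c(TRUE, FALSE),
    stringsAsFactors = FALSE
  ),
  data.frame(
    Player  = c("Stephen Curry", "Kevin Durant", "Klay Thompson", "Draymond Green"),
    Captain = c(TRUE, FALSE, FALSE, FALSE),
    stringsAsFactors = FALSE
  )
)

# Number of all stars per team, in list order
n.players <- vapply(list.of.all.stars, nrow, integer(1))

# Repeat each team row once per player
expanded <- team.df[rep(seq_len(nrow(team.df)), times = n.players), ]

# Stack the player data frames and bind them column-wise to the repeated rows
res.rep <- cbind(expanded, do.call(rbind, list.of.all.stars))
row.names(res.rep) <- NULL
res.rep
```

This avoids splitting 'team.df' at all: rep() builds the row index 1, 1, 2, 2, 2, 2, so each team row lines up with its players.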


We can also do this in tidyverse. After grouping by all the columns in 'team.df', nest it to create a list column 'data' (which will be of length 2), replace 'data' with 'list.of.all.stars' in mutate, and unnest the list:

library(tidyverse)
team.df %>% 
      group_by_all() %>%
      nest %>% 
      mutate(data = list.of.all.stars) %>% 
      unnest
# A tibble: 6 x 4
#  Team_Name Team_Location Player         Captain
#  <chr>     <chr>         <chr>          <lgl>  
# 1 Cavaliers Cleveland, OH LeBron James   T      
# 2 Cavaliers Cleveland, OH Kevin Love     F      
# 3 Warriors  Oakland, CA   Stephen Curry  T      
# 4 Warriors  Oakland, CA   Kevin Durant   F      
# 5 Warriors  Oakland, CA   Klay Thompson  F      
# 6 Warriors  Oakland, CA   Draymond Green F      
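
With more recent tidyverse releases, group_by_all() is superseded and unnest() expects an explicit 'cols' argument, so the same idea can be sketched more directly by attaching the list as a list column and unnesting it. This is an assumption-laden variant, not the original answer's code; the column name 'all_stars' and the object 'res.tidy' are hypothetical, and it assumes dplyr and tidyr (version 1.0 or later) are installed.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
team.df <- data.frame(
  Team_Name     = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(
    Player  = c("LeBron James", "Kevin Love"),
    Captain = c(TRUE, FALSE),
    stringsAsFactors = FALSE
  ),
  data.frame(
    Player  = c("Stephen Curry", "Kevin Durant", "Klay Thompson", "Draymond Green"),
    Captain = c(TRUE, FALSE, FALSE, FALSE),
    stringsAsFactors = FALSE
  )
)

res.tidy <- team.df %>%
  # one list element per team row (list order must match row order)
  mutate(all_stars = list.of.all.stars) %>%
  # expand each nested data frame into rows and columns
  unnest(cols = all_stars)

res.tidy
```

The result should have six rows and the four columns Team_Name, Team_Location, Player and Captain, matching the desired output in the question.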
