Expand Rows and Add Columns in Data Frame Based On Another Data Frame
Problem description
Each row in team.df consists of one NBA team. Each data frame in list.of.all.stars contains multiple rows, based on the number of all-star players associated with each NBA team.
Using the apply() family of functions, how can I expand the rows in team.df to grow by the number of all-star players on each team and combine the columns from list.of.all.stars into the final output?
I'm totally open to non-apply() methods as well; I only mention apply() to show that I'm hoping to avoid writing for loops.
Below is my desired output:
#   Team_Name Team_Location         Player Captain
# 1 Cavaliers Cleveland, OH   LeBron James    TRUE
# 2 Cavaliers Cleveland, OH     Kevin Love   FALSE
# 3  Warriors   Oakland, CA  Stephen Curry    TRUE
# 4  Warriors   Oakland, CA   Kevin Durant   FALSE
# 5  Warriors   Oakland, CA  Klay Thompson   FALSE
# 6  Warriors   Oakland, CA Draymond Green   FALSE
Reproducible example
# create data frame
# about team information
team.df <-
  data.frame(
    Team_Name = c( "Cavaliers", "Warriors" )
    , Team_Location = c( "Cleveland, OH", "Oakland, CA" )
    , stringsAsFactors = FALSE
  )
# create list about
# all stars on each team
list.of.all.stars <-
  list(
    data.frame(
      Player = c( "LeBron James", "Kevin Love" )
      , Captain = c( TRUE, FALSE )
      , stringsAsFactors = FALSE
    )
    , data.frame(
      Player = c( "Stephen Curry", "Kevin Durant"
                  , "Klay Thompson", "Draymond Green"
      )
      , Captain = c( TRUE, FALSE, FALSE, FALSE )
      , stringsAsFactors = FALSE
    )
  )
Non-apply() family method
# cbind each data frame within the list.of.all.stars
# to its corresponding row in team.df
team.and.all.stars.list.of.df <-
  list(
    cbind(
      team.df[ 1, ]
      , list.of.all.stars[[1]]
    )
    , cbind(
      team.df[ 2, ]
      , list.of.all.stars[[2]]
    )
  )
# Warning messages:
# 1: In data.frame(..., check.names = FALSE) :
# row names were found from a short variable and have been discarded
# 2: In data.frame(..., check.names = FALSE) :
# row names were found from a short variable and have been discarded
# collapse each list
# into data frame
final.df <-
  data.frame(
    do.call(
      what = "rbind"
      , args = team.and.all.stars.list.of.df
    )
    , stringsAsFactors = FALSE
  )
# view final output
final.df
#   Team_Name Team_Location         Player Captain
# 1 Cavaliers Cleveland, OH   LeBron James    TRUE
# 2 Cavaliers Cleveland, OH     Kevin Love   FALSE
# 3  Warriors   Oakland, CA  Stephen Curry    TRUE
# 4  Warriors   Oakland, CA   Kevin Durant   FALSE
# 5  Warriors   Oakland, CA  Klay Thompson   FALSE
# 6  Warriors   Oakland, CA Draymond Green   FALSE
# end of script #
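The manual pairing above hard-codes the row indices 1 and 2. As a rough sketch only (not code from the original post), the same pairing could be written with lapply() over the row indices, assuming that element i of list.of.all.stars belongs to row i of team.df. The names paired and final.df.sketch are made up for this illustration.

```r
# Same sample data as in the reproducible example
team.df <- data.frame(
  Team_Name = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(Player = c("LeBron James", "Kevin Love"),
             Captain = c(TRUE, FALSE),
             stringsAsFactors = FALSE),
  data.frame(Player = c("Stephen Curry", "Kevin Durant",
                        "Klay Thompson", "Draymond Green"),
             Captain = c(TRUE, FALSE, FALSE, FALSE),
             stringsAsFactors = FALSE)
)

# Assumption: element i of the list belongs to row i of team.df.
# data.frame() recycles the single team row to match the player rows.
paired <- lapply(seq_len(nrow(team.df)), function(i) {
  data.frame(team.df[i, ], list.of.all.stars[[i]],
             row.names = NULL, stringsAsFactors = FALSE)
})

# Stack the per-team pieces into one data frame
final.df.sketch <- do.call(rbind, paired)
final.df.sketch
```

This still iterates once per team, but it no longer needs editing when teams are added.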
Failed mapply() attempt
# Hoping to Apply A Function
# using a data frame and
# a list of data frames
mapply.method <-
  mapply(
    FUN = function( x, y )
      cbind.data.frame(
        x
        , y
        , stringsAsFactors = FALSE
      )
    , team.df
    , list.of.all.stars
  )
# view results
mapply.method
#         Team_Name   Team_Location
# x       Character,2 Character,4
# Player  Character,2 Character,4
# Captain Logical,2   Logical,4
# end of script #
Recommended answer
About the OP's approach of using 'team.df' as input to Map/mapply: 'team.df' is a data.frame, which is a list of columns, so the basic unit passed to the function is a single column vector. Map/mapply therefore loops over the columns instead of over the whole dataset or its rows (which is what the desired output needs). To prevent that, wrap 'team.df' in list(); it is then a single unit, which is recycled against each of the list elements of 'list.of.all.stars':
do.call(rbind, Map(cbind, list(team.df), list.of.all.stars))
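To make the column-versus-unit point concrete, here is a small sketch on the question's sample data. The variable names col.classes and first.piece are only for illustration. It shows what Map()/mapply() iterate over in each case, and what the first recycled piece looks like.

```r
# Same sample data as in the question
team.df <- data.frame(
  Team_Name = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(Player = c("LeBron James", "Kevin Love"),
             Captain = c(TRUE, FALSE),
             stringsAsFactors = FALSE),
  data.frame(Player = c("Stephen Curry", "Kevin Durant",
                        "Klay Thompson", "Draymond Green"),
             Captain = c(TRUE, FALSE, FALSE, FALSE),
             stringsAsFactors = FALSE)
)

# A data frame is a list of columns, so its length is the column count
length(team.df)

# Iterating over it visits one column at a time, not one row at a time
col.classes <- vapply(team.df, class, character(1))
col.classes

# Wrapped in list(), the whole data frame is a single element
length(list(team.df))

# Only the first list element is used here, to keep the illustration short:
# the complete team.df (both teams) is bound to the Cavaliers' players
first.piece <- Map(cbind, list(team.df), list.of.all.stars[1])[[1]]
first.piece
```

Since the whole of team.df is bound to each element, the second row of first.piece pairs the Warriors with Kevin Love, which is why a row-wise split is still needed for the desired output.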
That call, however, binds the whole of 'team.df' to every list element. Based on the expected output, each row of 'team.df' should instead be paired with the corresponding list element of 'list.of.all.stars'. In that case, split 'team.df' by rows and then do the cbind:
res <- do.call(rbind, Map(cbind, split(team.df, seq_len(nrow(team.df))), list.of.all.stars))
row.names(res) <- NULL
res
#  Team_Name Team_Location         Player Captain
#1 Cavaliers Cleveland, OH   LeBron James    TRUE
#2 Cavaliers Cleveland, OH     Kevin Love   FALSE
#3  Warriors   Oakland, CA  Stephen Curry    TRUE
#4  Warriors   Oakland, CA   Kevin Durant   FALSE
#5  Warriors   Oakland, CA  Klay Thompson   FALSE
#6  Warriors   Oakland, CA Draymond Green   FALSE
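As an alternative sketch (a different technique from the split/Map approach above, and not part of the original answer), the rows of 'team.df' can be repeated by index and bound to the stacked player data in one step. It assumes the list elements are in the same order as the rows of 'team.df'. The names n.stars, row.idx and res.sketch are made up for the example.

```r
# Same sample data as in the question
team.df <- data.frame(
  Team_Name = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(Player = c("LeBron James", "Kevin Love"),
             Captain = c(TRUE, FALSE),
             stringsAsFactors = FALSE),
  data.frame(Player = c("Stephen Curry", "Kevin Durant",
                        "Klay Thompson", "Draymond Green"),
             Captain = c(TRUE, FALSE, FALSE, FALSE),
             stringsAsFactors = FALSE)
)

# Number of all-star rows per team
n.stars <- vapply(list.of.all.stars, nrow, integer(1))

# Repeat each team's row index once per all-star on that team
row.idx <- rep(seq_len(nrow(team.df)), times = n.stars)

# Expanded team rows next to the stacked player rows (same row count)
res.sketch <- cbind(team.df[row.idx, ], do.call(rbind, list.of.all.stars))
row.names(res.sketch) <- NULL
res.sketch
```

This avoids building one small data frame per team, at the cost of relying on the list order matching the row order.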
We can also do this in the tidyverse. After grouping by all the columns in 'team.df', nest it to create a list column 'data' (which will be of length 2), assign 'list.of.all.stars' to 'data' in mutate, and unnest the list:
library(tidyverse)
team.df %>%
  group_by_all() %>%
  nest %>%
  mutate(data = list.of.all.stars) %>%
  unnest
# A tibble: 6 x 4
#   Team_Name Team_Location Player         Captain
#   <chr>     <chr>         <chr>          <lgl>
# 1 Cavaliers Cleveland, OH LeBron James   T
# 2 Cavaliers Cleveland, OH Kevin Love     F
# 3 Warriors  Oakland, CA   Stephen Curry  T
# 4 Warriors  Oakland, CA   Kevin Durant   F
# 5 Warriors  Oakland, CA   Klay Thompson  F
# 6 Warriors  Oakland, CA   Draymond Green F