Expand Rows and Add Columns in Data Frame Based On Another Data Frame
Question
Each row in team.df consists of one NBA team. Each data frame in list.of.all.stars contains multiple rows, one for each all star player associated with that NBA team.
Using the apply() family of functions, how can I expand the rows in team.df to grow by the number of all star players on each team and combine the columns from list.of.all.stars into the final output?
I'm totally open to non-apply() methods as well; I mention apply() only as an example, because I'm hoping to avoid writing for loops.
Below is my desired output:
# Team_Name Team_Location Player Captain
# 1 Cavaliers Cleveland, OH LeBron James TRUE
# 2 Cavaliers Cleveland, OH Kevin Love FALSE
# 3 Warriors Oakland, CA Stephen Curry TRUE
# 4 Warriors Oakland, CA Kevin Durant FALSE
# 5 Warriors Oakland, CA Klay Thompson FALSE
# 6 Warriors Oakland, CA Draymond Green FALSE
Reproducible example
# create data frame
# about team information
team.df <-
data.frame(
Team_Name = c( "Cavaliers", "Warriors" )
, Team_Location = c( "Cleveland, OH", "Oakland, CA")
, stringsAsFactors = FALSE
)
# create list about
# all stars on each team
list.of.all.stars <-
list(
data.frame(
Player = c( "LeBron James", "Kevin Love" )
, Captain = c( TRUE, FALSE )
, stringsAsFactors = FALSE
)
, data.frame(
Player = c( "Stephen Curry", "Kevin Durant"
, "Klay Thompson", "Draymond Green"
)
, Captain = c( TRUE, FALSE, FALSE, FALSE )
, stringsAsFactors = FALSE
)
)
Non-apply() family method
# cbind each data frame within the list.of.all.stars
# to its corresponding row in team.df
team.and.all.stars.list.of.df <-
list(
cbind(
team.df[ 1, ]
, list.of.all.stars[[1]]
)
, cbind(
team.df[ 2, ]
, list.of.all.stars[[2]]
)
)
# Warning messages:
# 1: In data.frame(..., check.names = FALSE) :
# row names were found from a short variable and have been discarded
# 2: In data.frame(..., check.names = FALSE) :
# row names were found from a short variable and have been discarded
# collapse each list
# into data frame
final.df <-
data.frame(
do.call(
what = "rbind"
, args = team.and.all.stars.list.of.df
)
, stringsAsFactors = FALSE
)
# view final output
final.df
# Team_Name Team_Location Player Captain
# 1 Cavaliers Cleveland, OH LeBron James TRUE
# 2 Cavaliers Cleveland, OH Kevin Love FALSE
# 3 Warriors Oakland, CA Stephen Curry TRUE
# 4 Warriors Oakland, CA Kevin Durant FALSE
# 5 Warriors Oakland, CA Klay Thompson FALSE
# 6 Warriors Oakland, CA Draymond Green FALSE
# end of script #
Failed attempt with mapply()
# Hoping to Apply A Function
# using a data frame and
# a list of data frames
mapply.method <-
mapply(
FUN = function( x, y )
cbind.data.frame(
x
, y
, stringsAsFactors = FALSE
)
, team.df
, list.of.all.stars
)
# view results
mapply.method
# Team_Name Team_Location
# x Character,2 Character,4
# Player Character,2 Character,4
# Captain Logical,2 Logical,4
# end of script #
Recommended answer
Regarding the OP's approach of passing 'team.df' as input to Map/mapply: 'team.df' is a data.frame, which is a list of columns. So the basic unit of input is a column vector, and Map/mapply loops over the columns instead of over the whole dataset or its rows (which the desired output requires). To prevent that, wrap 'team.df' in list: it then becomes a single unit, which is recycled against each of the list elements of 'list.of.all.stars'.
do.call(rbind, Map(cbind, list(team.df), list.of.all.stars))
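To make the point about columns concrete, here is a minimal sketch (not part of the original answer) that inspects what Map actually receives. The helper names seen.by.map and seen.when.wrapped are made up for illustration; the data is the question's reproducible example.

```r
# Data from the question's reproducible example
team.df <- data.frame(
  Team_Name = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(Player = c("LeBron James", "Kevin Love"),
             Captain = c(TRUE, FALSE),
             stringsAsFactors = FALSE),
  data.frame(Player = c("Stephen Curry", "Kevin Durant",
                        "Klay Thompson", "Draymond Green"),
             Captain = c(TRUE, FALSE, FALSE, FALSE),
             stringsAsFactors = FALSE)
)

# A data.frame is a list whose elements are its columns
is.list(team.df)
length(team.df)  # counts columns, not rows

# Map() over team.df hands each *column* to the function
# (seen.by.map is a hypothetical name used only in this sketch)
seen.by.map <- Map(function(x, y) x, team.df, list.of.all.stars)

# Wrapping in list() gives Map() a single element, the whole data frame,
# which is then recycled against each element of list.of.all.stars
seen.when.wrapped <- Map(function(x, y) x, list(team.df), list.of.all.stars)
```

Because the wrapped data frame is recycled whole, each element of 'list.of.all.stars' is column-bound to both team rows, so teams and players are not matched one-to-one; that is what the row-wise split in the next step addresses.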
Based on the expected output, each row of 'team.df' should be paired with the corresponding list element of 'list.of.all.stars'. In that case, split 'team.df' by rows and then do the cbind.
res <- do.call(rbind, Map(cbind, split(team.df, seq_len(nrow(team.df))), list.of.all.stars))
row.names(res) <- NULL
res
# Team_Name Team_Location Player Captain
#1 Cavaliers Cleveland, OH LeBron James TRUE
#2 Cavaliers Cleveland, OH Kevin Love FALSE
#3 Warriors Oakland, CA Stephen Curry TRUE
#4 Warriors Oakland, CA Kevin Durant FALSE
#5 Warriors Oakland, CA Klay Thompson FALSE
#6 Warriors Oakland, CA Draymond Green FALSE
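As an alternative sketch in base R (not from the original answer), the same pairing can be done without Map or split by repeating each row index of 'team.df' once per player and binding the result to the stacked list. It assumes the elements of 'list.of.all.stars' are in the same order as the rows of 'team.df'; the names n.per.team, row.index and res2 are hypothetical.

```r
# Data from the question's reproducible example
team.df <- data.frame(
  Team_Name = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(Player = c("LeBron James", "Kevin Love"),
             Captain = c(TRUE, FALSE),
             stringsAsFactors = FALSE),
  data.frame(Player = c("Stephen Curry", "Kevin Durant",
                        "Klay Thompson", "Draymond Green"),
             Captain = c(TRUE, FALSE, FALSE, FALSE),
             stringsAsFactors = FALSE)
)

# Number of all stars per team (assumed to follow the row order of team.df)
n.per.team <- vapply(list.of.all.stars, nrow, integer(1))

# Repeat each row index of team.df once per player on that team
row.index <- rep(seq_len(nrow(team.df)), times = n.per.team)

# Bind the repeated team rows to the stacked player data frames
res2 <- cbind(team.df[row.index, ], do.call(rbind, list.of.all.stars))
row.names(res2) <- NULL
res2
```

This avoids creating one intermediate data frame per team, which may matter when 'team.df' has many rows.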
We can also do this in the tidyverse. After grouping by all the columns in 'team.df', nest it to create a list column 'data' (which will be of length 2), assign 'list.of.all.stars' to 'data' in mutate, and unnest the list.
library(tidyverse)
team.df %>%
group_by_all() %>%
nest %>%
mutate(data = list.of.all.stars) %>%
unnest
# A tibble: 6 x 4
# Team_Name Team_Location Player Captain
# <chr> <chr> <chr> <lgl>
# 1 Cavaliers Cleveland, OH LeBron James T
# 2 Cavaliers Cleveland, OH Kevin Love F
# 3 Warriors Oakland, CA Stephen Curry T
# 4 Warriors Oakland, CA Kevin Durant F
# 5 Warriors Oakland, CA Klay Thompson F
# 6 Warriors Oakland, CA Draymond Green F
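The pipeline above was written against an older tidyverse. As a hedged sketch for more recent versions (assuming tidyr 1.0 or later, where unnest takes a cols argument), the list can be attached directly as a list column and then unnested, without the grouping and nesting steps. The name res.tidy is hypothetical, and this variant is not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Data from the question's reproducible example
team.df <- data.frame(
  Team_Name = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(Player = c("LeBron James", "Kevin Love"),
             Captain = c(TRUE, FALSE),
             stringsAsFactors = FALSE),
  data.frame(Player = c("Stephen Curry", "Kevin Durant",
                        "Klay Thompson", "Draymond Green"),
             Captain = c(TRUE, FALSE, FALSE, FALSE),
             stringsAsFactors = FALSE)
)

res.tidy <- team.df %>%
  # list column: one nested data frame per team row (order assumed to match)
  mutate(data = list.of.all.stars) %>%
  # expand each nested data frame into rows, repeating the team columns
  unnest(cols = data)
res.tidy
```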