Expanding columns associated with a categorical variable into multiple columns with dplyr/tidyr while retaining id variable
Problem description
I have a data.frame that looks like this:
library(tibble)

dfTall <- tribble(
  ~id, ~x, ~y, ~z,
  1, "a", 4, 5,
  1, "b", 6, 5,
  2, "a", 5, 4,
  2, "b", 1, 9)
I would like to turn it into this:
dfWide <- tribble(
  ~id, ~y_a, ~y_b, ~z_a, ~z_b,
  1, 4, 6, 5, 5,
  2, 5, 1, 4, 9)
Currently, I am doing this:
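library(dplyr)

# Split by category, suffix every column (including id) with the category,
# then glue the pieces side by side. bind_cols() matches rows by position,
# so this assumes every category lists the ids in the same order.
dfTall %>%
  split(., .$x) %>%
  mapply(function(df, name) {
    df$x <- NULL
    names(df) <- paste(names(df), name, sep = '_')
    df
  }, SIMPLIFY = FALSE, ., names(.)) %>%
  bind_cols() %>%
  select(-id_b) %>%
  rename(id = id_a)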
In practice, I will have a larger number of numeric columns that need to be expanded (i.e., not just y and z). My current solution works, but it has issues, like the fact that multiple copies of the id variable get added into the final data.frame and need to be removed.
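Staying close to the split-and-suffix idea, the duplicate id columns can be avoided by joining the pieces on id instead of binding them side by side. This is a minimal sketch, assuming dplyr 1.0.0 or later (for rename_with() and all_of()); the names pieces and dfJoined are illustrative only. Because the join is keyed on id, it also no longer relies on each category listing the ids in the same row order.

```r
library(dplyr)
library(tibble)

dfTall <- tribble(
  ~id, ~x,  ~y, ~z,
  1,   "a", 4,  5,
  1,   "b", 6,  5,
  2,   "a", 5,  4,
  2,   "b", 1,  9
)

# One piece per category; suffix every column except id with that category.
pieces <- lapply(split(dfTall, dfTall$x), function(df) {
  lvl <- df$x[1]
  df %>%
    select(-x) %>%
    rename_with(function(nm) paste(nm, lvl, sep = "_"), -id)
})

# Join on id (not by row position), so a single id column is kept.
dfJoined <- Reduce(function(l, r) full_join(l, r, by = "id"), pieces)

# Put the value columns in alphabetical order (y_a, y_b, z_a, z_b).
dfJoined <- select(dfJoined, id, all_of(sort(setdiff(names(dfJoined), "id"))))
dfJoined
```

Since rename_with() is applied to every column except id, any number of numeric columns is suffixed without listing them by name.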
Can this expansion be done using a function from tidyr such as spread?
Recommended answer
It can be done with spread, but not in a single step, because there are multiple value columns. You can first gather the value columns, unite the headers manually, and then spread:
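library(dplyr)
library(tidyr)

dfTall %>%
  # stack y and z into (col, val) pairs, keeping id and x as identifiers
  gather(col, val, -id, -x) %>%
  # combine value-column name and category into one header, e.g. "y_a"
  unite(key, col, x) %>%
  # widen: one column per combined header
  spread(key, val)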
# A tibble: 2 x 5
# id y_a y_b z_a z_b
#* <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 4 6 5 5
#2 2 5 1 4 9
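To see why three steps are needed, it can help to run them one at a time and inspect the intermediate tables. The sketch below rebuilds dfTall with tribble() so it runs on its own; long, keyed and wide are illustrative names. gather() puts y and z into a single val column, which works cleanly here because both are numeric.

```r
library(dplyr)
library(tidyr)
library(tibble)

dfTall <- tribble(
  ~id, ~x,  ~y, ~z,
  1,   "a", 4,  5,
  1,   "b", 6,  5,
  2,   "a", 5,  4,
  2,   "b", 1,  9
)

# Step 1: one row per (id, x, value column); col holds "y" or "z".
long <- gather(dfTall, col, val, -id, -x)
print(long)

# Step 2: paste the value-column name and the category into one header.
keyed <- unite(long, key, col, x)
print(keyed)

# Step 3: spread the combined headers back out into columns.
wide <- spread(keyed, key, val)
print(wide)
```

The long table has 8 rows (2 ids x 2 categories x 2 value columns), which is why any number of value columns is handled by the same three calls.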
If you use data.table, dcast supports casting multiple value columns:
library(data.table)
# setDT() converts dfTall to a data.table in place (by reference)
dcast(setDT(dfTall), id ~ x, value.var = c('y', 'z'))
# id y_a y_b z_a z_b
#1: 1 4 6 5 5
#2: 2 5 1 4 9
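In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(), and pivot_wider() accepts several value columns at once, so the reshape becomes a single step. This is a minimal sketch, assuming tidyr 1.0.0 or later; values_from = -c(id, x) selects every column other than id and x, which should scale to a larger number of numeric columns without naming them.

```r
library(tidyr)
library(tibble)

dfTall <- tribble(
  ~id, ~x,  ~y, ~z,
  1,   "a", 4,  5,
  1,   "b", 6,  5,
  2,   "a", 5,  4,
  2,   "b", 1,  9
)

# x supplies the suffix; every other non-id column is widened.
# New names follow <value column>_<category>, joined by the default "_".
dfWide <- pivot_wider(dfTall, names_from = x, values_from = -c(id, x))
dfWide
```

Columns not named in names_from or values_from (here only id) are kept as identifiers, so no duplicate id columns appear.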