Expanding columns associated with a categorical variable into multiple columns with dplyr/tidyr while retaining the id variable


Question


I have a data.frame that looks like this:

library(tibble)

dfTall <- tribble(
    ~id, ~x, ~y, ~z,
      1, "a", 4, 5,
      1, "b", 6, 5,
      2, "a", 5, 4,
      2, "b", 1, 9)

I would like to turn it into this:

dfWide <- tribble(
    ~id, ~y_a, ~y_b, ~z_a, ~z_b,
      1,    4,    6,    5,    5,
      2,    5,    1,    4,    9)

Currently, I am doing this:

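library(dplyr)

dfTall %>%
    split(., .$x) %>%                                    # one data frame per level of x
    mapply(function(df, name) {
        df$x <- NULL                                     # drop the category column
        names(df) <- paste(names(df), name, sep = '_')   # suffix every column with the level
        df
    }, SIMPLIFY = FALSE, ., names(.)) %>%
    bind_cols() %>%                                      # id_a, y_a, z_a, id_b, y_b, z_b
    select(-id_b) %>%                                    # drop the duplicated id
    rename(id = id_a)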


In practice, I will have many more numeric columns that need to be expanded (i.e., not just y and z). My current solution works, but it has drawbacks; for example, a copy of the id variable is added to the final data.frame for every level of x, and the extra copies have to be removed by hand.


Can this expansion be done using a function from tidyr such as spread?

Answer


This can be done with spread, but not in a single step, because several columns serve as values. First gather the value columns, then unite the headers manually, and finally spread:

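library(dplyr)
library(tidyr)

dfTall %>% 
    gather(col, val, -id, -x) %>%   # long format: one row per id, x and value column
    unite(key, col, x) %>%          # build headers such as y_a, z_b
    spread(key, val)                # one column per header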

# A tibble: 2 x 5
#     id   y_a   y_b   z_a   z_b
#* <dbl> <dbl> <dbl> <dbl> <dbl>
#1     1     4     6     5     5
#2     2     5     1     4     9
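Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(), and pivot_wider() accepts several value columns, so the reshape can be done in one step. The following is a minimal sketch assuming a recent tidyr; the choice of `values_from = -c(id, x)` is my own, so that every column other than id and x gets widened, which should cover the case of many numeric columns without listing them:

```r
library(tibble)
library(tidyr)

dfTall <- tribble(
    ~id, ~x, ~y, ~z,
      1, "a", 4, 5,
      1, "b", 6, 5,
      2, "a", 5, 4,
      2, "b", 1, 9)

# One call: id identifies rows, levels of x become column suffixes,
# and every remaining column (here y and z) is spread out.
# New names are built as <value column>_<level of x>.
dfWide <- pivot_wider(dfTall,
                      id_cols     = id,
                      names_from  = x,
                      values_from = -c(id, x),
                      names_sep   = "_")
dfWide
```

Only one id column is produced, so nothing has to be dropped or renamed afterwards.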


If you use data.table, dcast supports casting multiple value columns:

library(data.table)
dcast(setDT(dfTall), id ~ x, value.var = c('y', 'z'))

#   id y_a y_b z_a z_b
#1:  1   4   6   5   5
#2:  2   5   1   4   9 
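Two small points about the dcast version, sketched below under my own assumptions: setDT() converts dfTall to a data.table by reference, so building a separate table (or using as.data.table(), which copies) leaves the original untouched; and value.var can be computed with setdiff() so that all non-id, non-category columns are cast, which should scale to many numeric columns. The names dt, valueCols and dtWide are illustrative only:

```r
library(data.table)

# Same data as dfTall, built directly as a data.table
dt <- data.table(
  id = c(1, 1, 2, 2),
  x  = c("a", "b", "a", "b"),
  y  = c(4, 6, 5, 1),
  z  = c(5, 5, 4, 9))

# Every column except the id and the category is treated as a value column
valueCols <- setdiff(names(dt), c("id", "x"))

# Cast each value column by the levels of x; names are joined with "_"
dtWide <- dcast(dt, id ~ x, value.var = valueCols)
dtWide
```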
