R:将多列从长整型重塑 [英] R: Reshaping Multiple Columns from Long to Wide

查看:94
本文介绍了R:将多列从长整型重塑的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用以下数据:

library(tidyverse)

sample_df <- data.frame(Letter = c("a", "a", "a", "b", "b"),
                        Number = c(1,2,1,3,4),
                        Fruit = c("Apple", "Plum", "Peach", "Pear", "Peach"))


Letter Number Fruit
a      1      Apple
a      2      Plum
a      1      Peach
b      3      Pear
b      4      Peach

我要将一组值从长格式转换为宽格式:

I want to transform a set of values from a long to a wide format:

Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
a      1        2        Apple   Plum    Peach
b      3        4        Pear    Peach   

为此,我尝试创建一个使用 c( Letter, Number) c( Letter, Fruit)。首先,是否需要创建该索引,如果需要,应该如何创建?

To do so, I unsuccessfully tried to create an index of each unique group combinations using c("Letter", "Number") and c("Letter", "Fruit"). Firstly, does this index need to be created, and if so how should it be done?

# Gets Unique Values, but no Index of Unique Combinations
sample_df1 <- sample_df %>%
  group_by(Letter) %>%  
  mutate(Id1 = n_distinct(Letter, Number),
         Id2 = n_distinct(Letter, Fruit))

# Gets Following Error: Column `Id1` must be length 3 (the group size) or one, not 2
sample_df1 <- sample_df %>%
  group_by(Letter) %>%  
  mutate(Id1 = 1:n_distinct(Letter, Number),
         Id2 = 1:n_distinct(Letter, Fruit))

# NOTE: Manually Created the Index Columns to show next problem
sample_df1 <- sample_df %>%
  group_by(Letter) %>%  
  add_column(Id1 = c(1,2,1,1,2),
         Id2 = c(1,2,3,1,2)) 

假设确实需要完成,我手动附加所需的值,并使用开发性 tidyr 部分解决了该问题。

Assuming it did need to be done, I manually appended the desired values, and partially solved the problem using developmental tidyr.

# Requires Developmental Tidyr
devtools::install_github("tidyverse/tidyr")  

sample_df1 %>%  
  pivot_wider(names_from = c("Id1", "Id2"), values_from = c("Number", "Fruit")) %>%
  set_names(~ str_replace_all(.,"(\\w+.*)(_\\d)(_\\d)", "\\1\\3"))

#  Letter Number_1 Number_2 Number_3 Fruit_1 Fruit_2 Fruit_3
#<fct>     <dbl>    <dbl>    <dbl> <fct>   <fct>   <fct>  
# a         1        2        1     Apple   Plum    Peach  
# b         3        4       NA     Pear    Peach   NA     

但是,这种方法仍然创建了不必要的 Number_3 列。使用任何 tidyr data.table 或任何其他软件包,有没有办法以所需的格式获取数据没有重复的列?

However, this approach still created an unwanted Number_3 column. Using any tidyr, data.table or any other package, is there any way of getting the data in the desired format without duplicating columns?

推荐答案

一种选择是将重复的元素替换为字母 '到 NA ,然后在调整后的数据中,删除全部为 NA

An option would be to replace the duplicated elements by 'Letter' to NA and then in the reshaped data, remove the columns that are all NA

library(data.table)
out <- dcast(setDT(sample_df)[, lapply(.SD, function(x) 
     replace(x, duplicated(x), NA)), Letter], Letter ~ rowid(Letter), 
     value.var = c("Number", "Fruit"))
nm1 <- out[, names(which(!colSums(!is.na(.SD))))]
out[, (nm1) := NULL][]
#   Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
#1:      a        1        2   Apple    Plum   Peach
#2:      b        3        4    Pear   Peach    <NA>






如果我们要使用 tidyverse 方法,可以使用类似的选项。请注意, pivot_wider 来自 tidyr tidyr_0.8.3.9000


If we want to use the tidyverse approach, a similar option can be used. Note that pivot_wider is from the dev version of tidyr (tidyr_0.8.3.9000)

library(tidyverse)
sample_df %>% 
     group_by(Letter) %>%
     mutate_at(vars(-group_cols()), ~ replace(., duplicated(.), NA)) %>%
     mutate(rn = row_number()) %>% 
  pivot_wider(
          names_from = rn,
          values_from = c("Number", "Fruit")) %>%
  select_if(~ any(!is.na(.)))
# A tibble: 2 x 6
#  Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
#  <fct>     <dbl>    <dbl> <fct>   <fct>   <fct>  
#1 a             1        2 Apple   Plum    Peach  
#2 b             3        4 Pear    Peach   <NA>   

这篇关于R:将多列从长整型重塑的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆