R: Reshaping Multiple Columns from Long to Wide
Problem description
Using the following data:
library(tidyverse)
sample_df <- data.frame(Letter = c("a", "a", "a", "b", "b"),
                        Number = c(1, 2, 1, 3, 4),
                        Fruit = c("Apple", "Plum", "Peach", "Pear", "Peach"))
Letter Number Fruit
a      1      Apple
a      2      Plum
a      1      Peach
b      3      Pear
b      4      Peach
I want to transform a set of values from long to wide format:
Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
a      1        2        Apple   Plum    Peach
b      3        4        Pear    Peach
To do so, I tried, unsuccessfully, to create an index of each unique group combination using c("Letter", "Number") and c("Letter", "Fruit"). First, does this index need to be created, and if so, how should it be done?
# Gets Unique Values, but no Index of Unique Combinations
sample_df1 <- sample_df %>%
  group_by(Letter) %>%
  mutate(Id1 = n_distinct(Letter, Number),
         Id2 = n_distinct(Letter, Fruit))
# Gets Following Error: Column `Id1` must be length 3 (the group size) or one, not 2
sample_df1 <- sample_df %>%
  group_by(Letter) %>%
  mutate(Id1 = 1:n_distinct(Letter, Number),
         Id2 = 1:n_distinct(Letter, Fruit))
# NOTE: Manually Created the Index Columns to show next problem
sample_df1 <- sample_df %>%
  group_by(Letter) %>%
  add_column(Id1 = c(1,2,1,1,2),
             Id2 = c(1,2,3,1,2))
Assuming the index did need to be created, I added the desired values manually and partially solved the problem using the development version of tidyr.
# Requires Developmental Tidyr
devtools::install_github("tidyverse/tidyr")
sample_df1 %>%
  pivot_wider(names_from = c("Id1", "Id2"), values_from = c("Number", "Fruit")) %>%
  set_names(~ str_replace_all(.,"(\\w+.*)(_\\d)(_\\d)", "\\1\\3"))
# Letter Number_1 Number_2 Number_3 Fruit_1 Fruit_2 Fruit_3
#<fct> <dbl> <dbl> <dbl> <fct> <fct> <fct>
# a 1 2 1 Apple Plum Peach
# b 3 4 NA Pear Peach NA
However, this approach still created an unwanted Number_3 column. Using tidyr, data.table, or any other package, is there a way to get the data into the desired format without duplicating columns?
Recommended answer
One option is to replace the duplicated elements within each 'Letter' group with NA and then, in the reshaped data, remove the columns that are entirely NA.
library(data.table)
out <- dcast(
  setDT(sample_df)[, lapply(.SD, function(x) replace(x, duplicated(x), NA)), Letter],
  Letter ~ rowid(Letter),
  value.var = c("Number", "Fruit"))
nm1 <- out[, names(which(!colSums(!is.na(.SD))))]
out[, (nm1) := NULL][]
# Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
#1: a 1 2 Apple Plum Peach
#2: b 3 4 Pear Peach <NA>
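The two steps in the data.table code above (blanking duplicates within a group, then dropping columns that ended up entirely NA) can be seen in isolation with base R. This is only a minimal sketch: the helper name `blank_duplicates` and the small `wide` data frame are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: keep the first occurrence of each value, set repeats to NA
blank_duplicates <- function(x) replace(x, duplicated(x), NA)

# Values for Letter "a" in sample_df
number_a <- c(1, 2, 1)
fruit_a  <- c("Apple", "Plum", "Peach")

number_blanked <- blank_duplicates(number_a)  # third value is a repeat of the first
fruit_blanked  <- blank_duplicates(fruit_a)   # no repeats, so nothing changes

# Illustrative wide table in which the third Number column holds only NA
wide <- data.frame(Number_1 = c(1, 3),
                   Number_2 = c(2, 4),
                   Number_3 = c(NA, NA))

# Columns whose count of non-NA entries is zero
all_na <- names(which(colSums(!is.na(wide)) == 0))
wide[all_na] <- NULL
```

Note that a blanked duplicate keeps its row position, so a group such as 1, 1, 2 would produce an NA in the second column rather than shifting the 2 to the left. The approach gives the desired layout for this data because the repeated value is the last one in its group.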
If we want to use the tidyverse approach, a similar option can be used. Note that pivot_wider is from the dev version of tidyr (tidyr_0.8.3.9000).
library(tidyverse)
sample_df %>%
  group_by(Letter) %>%
  mutate_at(vars(-group_cols()), ~ replace(., duplicated(.), NA)) %>%
  mutate(rn = row_number()) %>%
  pivot_wider(
    names_from = rn,
    values_from = c("Number", "Fruit")) %>%
  select_if(~ any(!is.na(.)))
# A tibble: 2 x 6
# Letter Number_1 Number_2 Fruit_1 Fruit_2 Fruit_3
# <fct> <dbl> <dbl> <fct> <fct> <fct>
#1 a 1 2 Apple Plum Peach
#2 b 3 4 Pear Peach <NA>
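For readers on current package versions, the same idea can probably be written without the development install, since pivot_wider has shipped with the CRAN release of tidyr since 1.0.0. The following is a sketch, not the answer's original code: it assumes dplyr 1.0.0 or later (for `across()` and `select(where(...))`) and swaps those in for `mutate_at()` and `select_if()`.

```r
library(dplyr)
library(tidyr)

sample_df <- data.frame(
  Letter = c("a", "a", "a", "b", "b"),
  Number = c(1, 2, 1, 3, 4),
  Fruit  = c("Apple", "Plum", "Peach", "Pear", "Peach")
)

wide <- sample_df %>%
  group_by(Letter) %>%
  # across() skips the grouping column, so only Number and Fruit are touched
  mutate(across(everything(), ~ replace(.x, duplicated(.x), NA))) %>%
  # position within each Letter group, used as the column suffix
  mutate(rn = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = rn, values_from = c(Number, Fruit)) %>%
  # drop columns that contain nothing but NA (here, Number_3)
  select(where(~ any(!is.na(.x))))
```

With the sample data, `wide` has two rows and the columns Letter, Number_1, Number_2, Fruit_1, Fruit_2 and Fruit_3.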