Pivot_Longer to Create Multiple Combined Columns
Problem description
I have seen some possible discussion of my problem elsewhere but it either wasn't resolved or I could not fully understand if the answer applied, so I'm creating a new question.
The following question in particular touches on this subject but is not resolved: "Gathering wide columns into multiple long columns using pivot_longer".
Take the following sample data. There is a unique identifier variable, id, plus 8 other variables. Those 8 fall into two sets, gpa and percent_a, and each set has a class, group, course, and dept value.
In my actual data I have about 20 different sets, all with the same structure, the same four descriptors in each set.
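Because every set carries the same four descriptors, the wide column names are just each descriptor crossed with each set name. A small base-R sketch that regenerates the eight names of the sample data; `descriptors` and `sets` are illustrative vectors, and the real data would list about 20 sets instead of two.

```r
# Descriptors shared by every set, and the sets themselves
# (illustrative: the real data has ~20 sets)
descriptors <- c("class", "course", "group", "dept")
sets <- c("gpa", "percent_a")

# outer() pastes every descriptor onto every set name; as.vector() reads the
# 4 x 2 result column by column, i.e. all gpa columns first, then percent_a
wide_names <- as.vector(outer(descriptors, sets, paste, sep = "_"))
print(wide_names)
```

With 20 sets the same construction yields 4 × 20 = 80 columns, all following the `descriptor_set` naming rule.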
What I would like to do is something similar to pivot_longer. However, instead of combining all of these columns into a single key column and a single value column, each set in my data (gpa, percent_a) would be gathered into its own pair of key/value columns, with the descriptor (class, group, course, dept) as the key.
set.seed(101)
df <- data.frame(
id = 1:10,
class_gpa = rnorm(10, 0, 1),
course_gpa = rnorm(10, 0, 1),
group_gpa = rnorm(10, 0, 1),
dept_gpa = rnorm(10, 0, 1),
class_percent_a = rnorm(10, 0, 1),
course_percent_a = rnorm(10, 0, 1),
group_percent_a = rnorm(10, 0, 1),
dept_percent_a = rnorm(10, 0, 1)
)
So in this example, let's say I gather all of the gpa values into two columns (gpa_type and gpa_value) and the percent_a values into two columns (percent_a_type and percent_a_value). I would then end up with only 5 columns:
id, gpa_type, gpa_value, percent_a_type, percent_a_value
Is there a way to do this? Either with pivot_longer or another method. Thanks.
Recommended answer
Honestly, I would rather simply do:
library(tidyr)

df %>% pivot_longer(-id, names_to = c("type", ".value"), names_pattern = "([^_]+)_(.*)")
And keep the data in a more practical format:
# A tibble: 40 x 4
id type gpa percent_a
<int> <chr> <dbl> <dbl>
1 1 class -0.326 0.482
2 1 course 0.526 -1.15
3 1 group -0.164 -0.260
4 1 dept 0.895 1.51
5 2 class 0.552 0.758
6 2 course -0.795 -0.274
7 2 group 0.709 -1.41
8 2 dept 0.279 1.62
9 3 class -0.675 -2.32
10 3 course 1.43 0.578
# … with 30 more rows
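The `names_pattern` regex in the pivot_longer() call splits each column name at its first underscore: group 1 becomes `type` and group 2, via the special `.value` name, becomes the output column name. A base-R sketch of that split using `sub()` on a few sample names (the vector `nms` is illustrative):

```r
# Same regex as the names_pattern argument:
# group 1 = descriptor (no underscores allowed), group 2 = set name
pattern <- "([^_]+)_(.*)"
nms <- c("class_gpa", "course_percent_a", "dept_percent_a")

descriptor <- sub(pattern, "\\1", nms)  # "class" "course" "dept"
set_name   <- sub(pattern, "\\2", nms)  # "gpa" "percent_a" "percent_a"

# For contrast: a greedy first group splits at the LAST underscore,
# which would break "percent_a" apart
bad_split <- sub("(.*)_(.*)", "\\2", "class_percent_a")  # "a"
```

Because `[^_]+` cannot cross an underscore, set names that themselves contain underscores (like `percent_a`) stay intact.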
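Why duplicate the "type" attribute for each "set"? In the layout you asked for, gpa_type and percent_a_type always hold the same value.
Still, for your desired output: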
# A tibble: 40 x 5
id gpa_type gpa_value percent_a_type percent_a_value
<int> <chr> <dbl> <chr> <dbl>
1 1 class -0.326 class 0.482
2 1 course 0.526 course -1.15
3 1 group -0.164 group -0.260
4 1 dept 0.895 dept 1.51
5 2 class 0.552 class 0.758
6 2 course -0.795 course -0.274
7 2 group 0.709 group -1.41
8 2 dept 0.279 dept 1.62
9 3 class -0.675 class -2.32
10 3 course 1.43 course 0.578
# … with 30 more rows
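you can try:
library(dplyr)
library(tidyr)
library(purrr)

lst_df <- df %>%
  gather(key, value, -id) %>%
  extract(key, into = c("var", "type"), "([^_]+)_(.*)") %>%
  split(.$type)

# Give each piece's id column a unique name (e.g. gpa_id) so that map_dfc()
# does not repair duplicated "id" names (dplyr >= 1.0 renames them to id...1, id...4)
names(lst_df) %>%
  map_dfc(~ setNames(
    lst_df[[.x]] %>%
      select(-type),
    paste0(.x, c("_id", "_type", "_value")))) %>%
  # keep the first id column as `id` and drop the redundant copies
  select(id = 1, !ends_with("_id"))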
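A possibly simpler route to the same five columns, assuming tidyr >= 1.0 and dplyr >= 1.0, is to pivot into the tidy format first and then copy the single `type` column once per set. This is a sketch, not part of the original answer; it rebuilds the sample data so it runs on its own.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
set.seed(101)
df <- data.frame(
  id = 1:10,
  class_gpa = rnorm(10, 0, 1),
  course_gpa = rnorm(10, 0, 1),
  group_gpa = rnorm(10, 0, 1),
  dept_gpa = rnorm(10, 0, 1),
  class_percent_a = rnorm(10, 0, 1),
  course_percent_a = rnorm(10, 0, 1),
  group_percent_a = rnorm(10, 0, 1),
  dept_percent_a = rnorm(10, 0, 1)
)

wide_layout <- df %>%
  # tidy format: one row per id and descriptor, one column per set
  pivot_longer(-id, names_to = c("type", ".value"),
               names_pattern = "([^_]+)_(.*)") %>%
  # requested layout: the same `type` column repeated for each set
  transmute(id,
            gpa_type = type, gpa_value = gpa,
            percent_a_type = type, percent_a_value = percent_a)

print(wide_layout)
```

With about 20 sets, each `*_type`/`*_value` pair would have to be listed by hand, and every `*_type` column is an identical copy of `type`, which is one more argument for keeping the tidy format.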