在 R 中使用更长的枢轴堆叠多列 [英] Stacking multiple columns using pivot longer in R

查看:37
本文介绍了在 R 中使用更长的枢轴堆叠多列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用 pivot_longer 将 r 上的数据从宽更改为长.似乎有一些人在这里遇到了类似的问题,但我无法根据我的数据调整他们的解决方案.我在宽数据示例中附上了一张示例数据的图片,以及我在长数据示例中想要实现的目标.

I am trying to change my data from wide to long on r using pivot_longer. There appear to be a few people having similar issues on here but I have been unable to adapt their solutions to my data. I have attached a picture of example data in the wide data example and what I am trying to achieve in the long data example.

总而言之,我有一个引用所有列的时间和参考列,我还有多列组,主题,ID,xcoordinate ycoordinate 的形式:组 1,主题 1.ID1、xcoord1 ycoord1、group2、subject2、ID2、xcoord2、ycoord2 等等......我想要的是一个带列的长表:

In summary I have a time and reference column which refer to all columns, I also have multiple columns of group, subject, ID, xcoordinate ycoordinate in the form of: group1, subject1. ID1, xcoord1 ycoord1, group2, subject2, ID2, xcoord2, ycoord2 and so on... What I want is a long table with columns:

时间、参考、组、主题、ID、xcoord、ycoord.

time, reference, group, subject, ID, xcoord, ycoord.

5 列堆叠各自编号的列,时间和参考列重复相关堆栈.

With the 5 columns stacking their respective numbered columns, and the time and reference columns repeating for the relevant stacks.

 df %>%
 pivot_longer(cols = -c(time, reference),
names_to = c("group", "subject", "ID", "xcoord", "ycoord")

我的理解是我需要使用 names_pattern 函数,尽管我似乎无法让它工作,而且我找不到任何明确描述我应该如何使用它的内容.我应该说我的数据比示例数据宽得多,所以不能真正依赖于列的编号.

My understanding is that I require to use the names_pattern function, although I cant seem to get that to work, and I cant find anything clear describing How I should be using it. I should say my data is much wider than the example data, so can't really rely on numbering of columns.

感谢任何帮助

宽数据示例

    time reference group1 subject1 ID1 xcoord1 ycoord1 group2 subject2 ID2 xcoord2 ycoord2 group3 subject3 ID3 xcoord3 ycoord3
1  00:01   4097365      1        4   1    7.44   38.16      0       21   2   33.90   47.26      1       15   3   21.53    2.67
2  00:02   4097366      1        4   1    9.84   37.03      0       21   2   32.98   48.47      1       15   3   21.82    2.95
3  00:03   4097367      1        4   1   12.01   35.83      0       21   2   30.17   50.33      1       15   3   22.06    4.45
4  00:04   4097368      1        4   1   12.15   34.17      0       21   2   29.85   50.52      1       15   3   23.50    4.75
5  00:05   4097369      1        4   1   15.27   32.94      0       21   2   28.39   51.30      1       15   3   24.25    4.76
6  00:06   4097370      1        4   1   18.96   31.98      0       21   2   28.39   52.36      1       15   3   25.31    6.57
7  00:07   4097371      1        4   1   22.50   31.13      0       21   2   26.59   53.14      1       15   3   26.05    7.04
8  00:08   4097372      1        4   1   27.47   30.15      0       21   2   25.89   53.94      1       15   3   27.29    7.91
9  00:09   4097373      1        4   1   32.17   29.92      0       21   2   24.64   54.42      1       15   3   27.47    8.44
10 00:10   4097374      1        4   1   33.77   27.49      0       21   2   24.61   55.23      1       15   3   28.59    8.71

长数据示例

   time reference group subject ID xcoord ycoord
1  00:01   4097365     1       4  1   7.44  38.16
2  00:01   4097365     0      21  2  33.90  47.26
3  00:01   4097365     1      15  3  21.53   2.67
4  00:02   4097366     1       4  1   9.84  37.03
5  00:02   4097367     0      21  2  32.98  48.47
6  00:02   4097368     1      15  3  21.82   2.95
7  00:03   4097369     1       4  1  12.01  35.83
8  00:03   4097370     0      21  2  30.17  50.33
9  00:03   4097371     1      15  3  22.06   4.45
10 00:04   4097372     1       4  1  12.15  34.17

稍微处理一下数据我设法实现了这个奇怪的解决方案,它是长数据和宽数据的混合.

edit: playing about a bit with the data I have managed to achieve this odd solution which is a mixture of long and wide data.

dput(head(df1))

dput(head(df1))

structure(list(time = c(0, 0, 0, 0, 0, 0), state = structure(c(2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("Alive", "Alive;:", "Dead", "Dead;:"
), class = "factor"), reference = c("1880439", "1880439", "1880439", 
"1880439", "1880439", "1880439"), num = c("1", NA, "2", "3", 
"4", "5"), group = c("1", NA, "1", "4", "0", "0"), X = c(NA, 
NA, NA, NA, NA, NA), ID = c(1L, NA, 2L, 4L, 5L, 6L), subect = c(21L, 
NA, 7L, -1L, 2L, 6L), x = c(3514L, NA, 2807L, 5550L, 3956L, 3686L
), y = c(-1644L, NA, -510L, 4400L, 1297L, -55L), speed = c("5.23", 
NA, "3.24", "0.00", "2.31", "3.57"), group1 = c("0", NA, "4", 
"1", "1", "0"), ID1 = c(13L, NA, 14L, 15L, 16L, 17L), subect1 = c(9L, 
NA, -1L, 13L, 14L, 11L), x1 = c(882L, NA, 5550L, 3004L, 761L, 
3317L), y1 = c(-1468L, NA, 4400L, 1633L, 559L, 1443L), speed1 = c("1.70", 
NA, "0.00", "3.06", "2.92", "3.30"), group2 = c("4", NA, "0", 
"1", "0", "0"), ID2 = c(24L, NA, 25L, 26L, 27L, 28L), subect2 = c(-1L, 
NA, 1L, 18L, 5L, 10L), x2 = c(5550L, NA, 5031L, 3936L, 3972L, 
3623L), y2 = c(4400L, NA, -74L, 190L, 686L, 356L), speed2 = c("0.00", 
NA, "0.54", "1.06", "0.95", "2.49"), speed.group2 = c(NA, NA, 
NA, NA, NA, NA)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-6L)).

代码生成的数据框如下所示

the dataframe the code results in looks like this

> head(df1)
# A tibble: 6 x 24
   time state reference num   group X        ID subect     x     y speed group1   ID1 subect1    x1    y1 speed1 group2   ID2 subect2    x2    y2 speed2
  <dbl> <fct> <chr>     <chr> <chr> <lgl> <int>  <int> <int> <int> <chr> <chr>  <int>   <int> <int> <int> <chr>  <chr>  <int>   <int> <int> <int> <chr> 
1     0 Aliv~ 1880439   1     1     NA        1     21  3514 -1644 5.23  0         13       9   882 -1468 1.70   4         24      -1  5550  4400 0.00  
2     0 Aliv~ 1880439   NA    NA    NA       NA     NA    NA    NA NA    NA        NA      NA    NA    NA NA     NA        NA      NA    NA    NA NA    
3     0 Aliv~ 1880439   2     1     NA        2      7  2807  -510 3.24  4         14      -1  5550  4400 0.00   0         25       1  5031   -74 0.54  
4     0 Aliv~ 1880439   3     4     NA        4     -1  5550  4400 0.00  1         15      13  3004  1633 3.06   1         26      18  3936   190 1.06  
5     0 Aliv~ 1880439   4     0     NA        5      2  3956  1297 2.31  1         16      14   761   559 2.92   0         27       5  3972   686 0.95  
6     0 Aliv~ 1880439   5     0     NA        6      6  3686   -55 3.57  0         17      11  3317  1443 3.30   0         28      10  3623   356 2.49  
# ... with 1 more variable: speed.group2 <lgl>

推荐答案

将首先重命名列并在数字前插入下划线,然后将其用作 pivot_longer 中的分隔符.

Would first rename columns and insert underscore right before number, then use that as separator in pivot_longer.

library(tidyverse)

df %>%
  rename_at(-c(1:2), ~ str_replace(., "(\\w+)(\\d)", "\\1_\\2")) %>% 
  pivot_longer(cols = -c(1:2), names_to = c(".value", "num"), names_sep = "_")

编辑 (2/7/20):

使用更新后的数据集,某些变量列名称的末尾似乎没有数字.我们可以为这些添加 0.

With your updated dataset, it appears that some of the variable column names don't have a number at the end. We can add 0 for those.

另外,我假设你想要:groupIDsubectxyspeed 重复(第 5 列中的第一个 group 与第 7-11 列中的相关变量分开).

Also, I assume you want: group, ID, subect, x, y, speed that are repeated (with the first group in column 5 separated from its related variables in columns 7-11).

df1 %>%
  rename_at(c(5,7:11), ~ paste0(., "0")) %>%
  rename_at(-c(1:4, 6, 24), ~ str_replace(., "(\\w+)(\\d+)", "\\1_\\2")) %>%
  pivot_longer(cols = -c(1:4, 6, 24), names_to = c(".value", "val"), names_sep = "_")

输出(修订):

# A tibble: 18 x 13
    time state   reference num   X     speed.group2 val   group    ID subect     x     y speed
   <dbl> <fct>   <chr>     <chr> <lgl> <lgl>        <chr> <chr> <int>  <int> <int> <int> <chr>
 1     0 Alive;: 1880439   1     NA    NA           0     1         1     21  3514 -1644 5.23 
 2     0 Alive;: 1880439   1     NA    NA           1     0        13      9   882 -1468 1.70 
 3     0 Alive;: 1880439   1     NA    NA           2     4        24     -1  5550  4400 0.00 
 4     0 Alive;: 1880439   NA    NA    NA           0     NA       NA     NA    NA    NA NA   
 5     0 Alive;: 1880439   NA    NA    NA           1     NA       NA     NA    NA    NA NA   
 6     0 Alive;: 1880439   NA    NA    NA           2     NA       NA     NA    NA    NA NA   
 7     0 Alive;: 1880439   2     NA    NA           0     1         2      7  2807  -510 3.24 
 8     0 Alive;: 1880439   2     NA    NA           1     4        14     -1  5550  4400 0.00 
 9     0 Alive;: 1880439   2     NA    NA           2     0        25      1  5031   -74 0.54 
10     0 Alive;: 1880439   3     NA    NA           0     4         4     -1  5550  4400 0.00 
11     0 Alive;: 1880439   3     NA    NA           1     1        15     13  3004  1633 3.06 
12     0 Alive;: 1880439   3     NA    NA           2     1        26     18  3936   190 1.06 
13     0 Alive;: 1880439   4     NA    NA           0     0         5      2  3956  1297 2.31 
14     0 Alive;: 1880439   4     NA    NA           1     1        16     14   761   559 2.92 
15     0 Alive;: 1880439   4     NA    NA           2     0        27      5  3972   686 0.95 
16     0 Alive;: 1880439   5     NA    NA           0     0         6      6  3686   -55 3.57 
17     0 Alive;: 1880439   5     NA    NA           1     0        17     11  3317  1443 3.30 
18     0 Alive;: 1880439   5     NA    NA           2     0        28     10  3623   356 2.49

这篇关于在 R 中使用更长的枢轴堆叠多列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆