宽到长:多列,两个时间点,两组 [英] Wide to long: multiple columns, two timepoints, two groups

查看:67
本文介绍了宽到长:多列,两个时间点,两组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经搜索并找到了许多示例,到目前为止,我仍然无法解决将数据从宽转换为长的问题。

I have searched and found a number of examples, so far I have not been able to solve a problem in transforming my data from wide to long.

下面是一个示例。数据:

Below is an example of the data:

set.seed(12345)
id = 1:100
age = sample(1:100, 100, replace=TRUE)
group = sample(1:2, 100, replace=TRUE)
t0_var1 = sample(1:300, 100, replace=TRUE)
t2_var1 = sample(1:300, 100, replace=TRUE)
t0_var2 = sample(1:600, 100, replace=TRUE)
t2_var2 = sample(1:600, 100, replace=TRUE)
t0_var3 = sample(1:700, 100, replace=TRUE)
t2_var3 = sample(1:700, 100, replace=TRUE)

dataset = data.frame(id, age, group, t0_var1, t2_var1, t0_var2, t2_var2, t0_var3, t2_var3)

head(dataset)
  id age group t0_var1 t2_var1 t0_var2 t2_var2 t0_var3 t2_var3
1  1  73     1     177      16      46     126     490     431
2  2  88     2     268     180     285     460       8     250
3  3  77     2      38     213     159     505     353     326
4  4  89     2     154     197     139     561     512     210
5  5  46     2     200     174     358     406     231      50
6  6  17     2     230     204      96     516     575     506

我需要按以下方式排列列:

I need the columns to be arranged as follows:

id | age | group | assessment | var1 | var2 | var3
1  | 23    1       1            12     34     64
1  | 23    1       2            34     14     68
2  | 43    2       1            22     44     54
2  | 43    2       2            34     54     88
...

我尝试了不同的方法,但我只在一次转换两列。

预先感谢您的帮助。

I tried in different ways, but I'm only managing to transform two columns at a time.
Thanks in advance for your help.

推荐答案

来自Ben's注释:

pivot_longer(dataset, cols = -c(id, age, group), names_to = c("assessment", ".value"), names_pattern = "t(\\d+)_(\\w+)")

# A tibble: 200 x 7
      id   age group assessment  var1  var2  var3
   <int> <int> <int> <chr>      <int> <int> <int>
 1     1    14     1 0            287     1   280
 2     1    14     1 2            266     5   523
 3     2    51     2 0            136   456   444
 4     2    51     2 2             66   292   260
 5     3    80     1 0            111   263   635
 6     3    80     1 2            275   255    70
 7     4    90     1 0            174   438   212
 8     4    90     1 2            161   161   694
 9     5    92     1 0             86   184   595
10     5    92     1 2            176   399    32

正则表达式 t(\\d +)_(\\w +)的意思是:


  • 寻找1或更多数字: \d +在字母 t之后。

  • 然后应该有一个下划线 _

  • 然后寻找1个或多个文字字符: \w +

  • 现在检索由(()

  • Look for 1 or more digits: "\d+" following the letter "t".
  • Then there should be a underscore "_"
  • Then look for at 1 or more word characters: "\w+"
  • Now retrieve the parts enclosed by the "( )"

因此,该问题的返回值将是跟在 t后面的数字。 (例如0或2),然后是列名的后缀(例如var1,var2,var3)

Thus the return values for this problem will be the the number following the "t" (ex. 0 or 2) and then the suffix of the column name (ex. var1, var2, var3)

这篇关于宽到长:多列,两个时间点,两组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆