是否通过收集多列来整理数据集? [英] Tidying dataset by gathering multiple columns?

查看:0
本文介绍了是否通过收集多列来整理数据集?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想通过这样操作数据来整理数据集:

age gender  education       previous_comp_exp   tutorial_time   qID.1    time_taken.1   qID.2    time_taken.2   
18  Male    Undergraduate   casual gamer        62.17926        sor9     39.61206       sor8     19.4892
24  Male    Undergraduate   casual gamer        85.01288        sor9     50.92343       sor8     16.15616

变成这样:

age gender  education       previous_comp_exp   tutorial_time   qID      time_taken 
18  Male    Undergraduate   casual gamer        62.17926        sor9     39.61206       
18  Male    Undergraduate   casual gamer        62.17926        sor8     19.4892
24  Male    Undergraduate   casual gamer        85.01288        sor9     50.92343       
24  Male    Undergraduate   casual gamer        85.01288        sor8     16.15616

我尝试过gather(),但我只能在一栏中使用它,并且我不断收到这样的警告:

警告消息:度量变量之间的属性不相同; 它们将被丢弃

有什么想法吗?

推荐答案

melt来自data.table(参见?patterns):

library(data.table)

melt(setDT(df), measure = patterns("^qID", "^time_taken"),
     value.name = c("qID", "time_taken"))

结果:

   age gender     education previous_comp_exp tutorial_time variable  qID time_taken
1:  18   Male Undergraduate      casual_gamer      62.17926        1 sor9   39.61206
2:  24   Male Undergraduate      casual_gamer      85.01288        1 sor9   50.92343
3:  18   Male Undergraduate      casual_gamer      62.17926        2 sor8   19.48920
4:  24   Male Undergraduate      casual_gamer      85.01288        2 sor8   16.15616

tidyr

library(dplyr)
library(tidyr)

df %>%
  gather(variable, value, qID.1:time_taken.2) %>%
  mutate(variable = sub("\.\d$", "", variable)) %>%
  group_by(variable) %>%
  mutate(ID = row_number()) %>%
  spread(variable, value, convert = TRUE) %>%
  select(-ID)

结果:

# A tibble: 4 x 7
    age gender     education previous_comp_exp tutorial_time   qID time_taken
  <int> <fctr>        <fctr>            <fctr>         <dbl> <chr>      <dbl>
1    18   Male Undergraduate      casual_gamer      62.17926  sor9   39.61206
2    18   Male Undergraduate      casual_gamer      62.17926  sor8   19.48920
3    24   Male Undergraduate      casual_gamer      85.01288  sor9   50.92343
4    24   Male Undergraduate      casual_gamer      85.01288  sor8   16.15616

注意:

对于tidyr方法,convert=TRUE用于将time_taken转换回numeric,因为gatherqID列合并时被强制为字符。

数据:

df = structure(list(age = c(18L, 24L), gender = structure(c(1L, 1L
), .Label = "Male", class = "factor"), education = structure(c(1L, 
1L), .Label = "Undergraduate", class = "factor"), previous_comp_exp = structure(c(1L, 
1L), .Label = "casual_gamer", class = "factor"), tutorial_time = c(62.17926, 
85.01288), qID.1 = structure(c(1L, 1L), .Label = "sor9", class = "factor"), 
    time_taken.1 = c(39.61206, 50.92343), qID.2 = structure(c(1L, 
    1L), .Label = "sor8", class = "factor"), time_taken.2 = c(19.4892, 
    16.15616)), .Names = c("age", "gender", "education", "previous_comp_exp", 
"tutorial_time", "qID.1", "time_taken.1", "qID.2", "time_taken.2"
), class = "data.frame", row.names = c(NA, -2L))

这篇关于是否通过收集多列来整理数据集?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆