Tidying dataset by gathering multiple columns?
Question
I want to tidy a dataset by reshaping the data from this:
age gender education previous_comp_exp tutorial_time qID.1 time_taken.1 qID.2 time_taken.2
18 Male Undergraduate casual gamer 62.17926 sor9 39.61206 sor8 19.4892
24 Male Undergraduate casual gamer 85.01288 sor9 50.92343 sor8 16.15616
into this:
age gender education previous_comp_exp tutorial_time qID time_taken
18 Male Undergraduate casual gamer 62.17926 sor9 39.61206
18 Male Undergraduate casual gamer 62.17926 sor8 19.4892
24 Male Undergraduate casual gamer 85.01288 sor9 50.92343
24 Male Undergraduate casual gamer 85.01288 sor8 16.15616
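I have tried gather(), but I can only get it to work for a single column, and I keep getting this warning:
Warning message: attributes are not identical across measure variables; they will be dropped
Any ideas?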
Recommended answer
With melt from data.table (see ?patterns):
library(data.table)
melt(setDT(df), measure = patterns("^qID", "^time_taken"),
value.name = c("qID", "time_taken"))
Result:
age gender education previous_comp_exp tutorial_time variable qID time_taken
1: 18 Male Undergraduate casual_gamer 62.17926 1 sor9 39.61206
2: 24 Male Undergraduate casual_gamer 85.01288 1 sor9 50.92343
3: 18 Male Undergraduate casual_gamer 62.17926 2 sor8 19.48920
4: 24 Male Undergraduate casual_gamer 85.01288 2 sor8 16.15616
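The melt result above carries an extra variable column and is ordered by question rather than by respondent. If the exact layout from the question is wanted, one possible follow-up is sketched below. It uses a cut-down recreation of the sample data (only some of the id columns, with the qID columns stored as character rather than factor, which is an assumption made to keep the example short), and the name long is purely illustrative.

```r
library(data.table)

# Cut-down recreation of the sample data (assumption: qID columns as character)
df <- data.frame(
  age = c(18L, 24L),
  tutorial_time = c(62.17926, 85.01288),
  qID.1 = c("sor9", "sor9"),
  time_taken.1 = c(39.61206, 50.92343),
  qID.2 = c("sor8", "sor8"),
  time_taken.2 = c(19.4892, 16.15616),
  stringsAsFactors = FALSE
)

# Same melt call as in the answer, with the argument name spelled out
long <- melt(
  setDT(df),
  measure.vars = patterns("^qID", "^time_taken"),
  value.name = c("qID", "time_taken")
)

# Sort by respondent, then by question index, and drop the index column
setorder(long, age, variable)
long[, variable := NULL]
print(long)
```

Sorting before dropping variable keeps the questions of each respondent in their original order.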
Or with tidyr:
library(dplyr)
library(tidyr)
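df %>%
  gather(variable, value, qID.1:time_taken.2) %>%
  # strip the trailing ".1" / ".2" so both sets share one name
  mutate(variable = sub("\\.\\d$", "", variable)) %>%
  group_by(variable) %>%
  # a per-variable row ID keeps spread() from seeing duplicate keys
  mutate(ID = row_number()) %>%
  spread(variable, value, convert = TRUE) %>%
  select(-ID)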
Result:
# A tibble: 4 x 7
age gender education previous_comp_exp tutorial_time qID time_taken
<int> <fctr> <fctr> <fctr> <dbl> <chr> <dbl>
1 18 Male Undergraduate casual_gamer 62.17926 sor9 39.61206
2 18 Male Undergraduate casual_gamer 62.17926 sor8 19.48920
3 24 Male Undergraduate casual_gamer 85.01288 sor9 50.92343
4 24 Male Undergraduate casual_gamer 85.01288 sor8 16.15616
Note:
For the tidyr method, convert = TRUE is used to convert time_taken back to numeric, because gather coerced it to character when it was combined with the qID columns.
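In tidyr 1.0.0 and later, pivot_longer() with the ".value" sentinel can reshape several column sets at once without first stacking everything into one character column, so no type conversion is needed afterwards. The sketch below is an alternative, not part of the original answer. It uses a cut-down recreation of the sample data (qID stored as character, an assumption for brevity), and the column name question is a hypothetical label for the numeric suffix.

```r
library(tidyr)

# Cut-down recreation of the sample data (assumption: qID columns as character)
df <- data.frame(
  age = c(18L, 24L),
  tutorial_time = c(62.17926, 85.01288),
  qID.1 = c("sor9", "sor9"),
  time_taken.1 = c(39.61206, 50.92343),
  qID.2 = c("sor8", "sor8"),
  time_taken.2 = c(19.4892, 16.15616),
  stringsAsFactors = FALSE
)

# Split each name at the dot: the part before it becomes an output column
# (".value"), the part after it goes into the "question" column
long <- pivot_longer(
  df,
  cols = matches("^(qID|time_taken)\\."),
  names_to = c(".value", "question"),
  names_sep = "\\."
)
print(long)
```

Because each output column is built only from input columns of the same kind, time_taken stays numeric throughout.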
Data:
df = structure(list(age = c(18L, 24L), gender = structure(c(1L, 1L
), .Label = "Male", class = "factor"), education = structure(c(1L,
1L), .Label = "Undergraduate", class = "factor"), previous_comp_exp = structure(c(1L,
1L), .Label = "casual_gamer", class = "factor"), tutorial_time = c(62.17926,
85.01288), qID.1 = structure(c(1L, 1L), .Label = "sor9", class = "factor"),
time_taken.1 = c(39.61206, 50.92343), qID.2 = structure(c(1L,
1L), .Label = "sor8", class = "factor"), time_taken.2 = c(19.4892,
16.15616)), .Names = c("age", "gender", "education", "previous_comp_exp",
"tutorial_time", "qID.1", "time_taken.1", "qID.2", "time_taken.2"
), class = "data.frame", row.names = c(NA, -2L))