将宽格式重塑为多列长格式 [英] Reshape wide format, to multi-column long format
本文介绍了将宽格式重塑为多列长格式的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我要重塑具有在3个时间点测量的多个测试的宽格式数据集:
ID Test Year Fall Spring Winter
1 1 2008 15 16 19
1 1 2009 12 13 27
1 2 2008 22 22 24
1 2 2009 10 14 20
2 1 2008 12 13 25
2 1 2009 16 14 21
2 2 2008 13 11 29
2 2 2009 23 20 26
3 1 2008 11 12 22
3 1 2009 13 11 27
3 2 2008 17 12 23
3 2 2009 14 9 31
转换为数据集,该数据集按列分隔测试,但将测量时间转换为长时间格式,对于每个新列如下:
ID Year Time Test1 Test2
1 2008 Fall 15 22
1 2008 Spring 16 22
1 2008 Winter 19 24
1 2009 Fall 12 10
1 2009 Spring 13 14
1 2009 Winter 27 20
2 2008 Fall 12 13
2 2008 Spring 13 11
2 2008 Winter 25 29
2 2009 Fall 16 23
2 2009 Spring 14 20
2 2009 Winter 21 26
3 2008 Fall 11 17
3 2008 Spring 12 12
3 2008 Winter 22 23
3 2009 Fall 13 14
3 2009 Spring 11 9
3 2009 Winter 27 31
我曾尝试使用重塑和熔化,但未获成功。现有的帖子解决了转换为单列结果的问题。
推荐答案
使用reshape2
:
# Thanks to Ista for helping with direct naming using "variable.name"
df.m <- melt(df, id.var = c("ID", "Test", "Year"), variable.name = "Time")
df.m <- transform(df.m, Test = paste0("Test", Test))
dcast(df.m, ID + Year + Time ~ Test, value.var = "value")
更新:使用data.table熔化/强制转换版本>=1.9.0:
data.table
从1.9.0版开始导入reshape2
包,并用C语言为data.table实现快速的melt
和dcast
方法。大数据量的速度比较如下所示。
有关新闻的更多信息,请访问here。
require(data.table) ## ver. >=1.9.0
require(reshape2)
dt <- as.data.table(df, key=c("ID", "Test", "Year"))
dt.m <- melt(dt, id.var = c("ID", "Test", "Year"), variable.name = "Time")
dt.m[, Test := paste0("Test", Test)]
dcast.data.table(dt.m, ID + Year + Time ~ Test, value.var = "value")
目前,您必须显式编写dcast.data.table
,因为它还不是reshape2
中的S3泛型。
大数据标杆:
# generate data:
set.seed(45L)
DT <- data.table(ID = sample(1e2, 1e7, TRUE),
Test = sample(1e3, 1e7, TRUE),
Year = sample(2008:2014, 1e7,TRUE),
Fall = sample(50, 1e7, TRUE),
Spring = sample(50, 1e7,TRUE),
Winter = sample(50, 1e7, TRUE))
DF <- as.data.frame(DT)
整形2计时:
reshape2_melt <- function(df) {
df.m <- melt(df, id.var = c("ID", "Test", "Year"), variable.name = "Time")
}
# min. of three consecutive runs
system.time(df.m <- reshape2_melt(DF))
# user system elapsed
# 43.319 4.909 48.932
df.m <- transform(df.m, Test = paste0("Test", Test))
reshape2_cast <- function(df) {
dcast(df.m, ID + Year + Time ~ Test, value.var = "value")
}
# min. of three consecutive runs
system.time(reshape2_cast(df.m))
# user system elapsed
# 57.728 9.712 69.573
data.table时间:
DT_melt <- function(dt) {
dt.m <- melt(dt, id.var = c("ID", "Test", "Year"), variable.name = "Time")
}
# min. of three consecutive runs
system.time(dt.m <- reshape2_melt(DT))
# user system elapsed
# 0.276 0.001 0.279
dt.m[, Test := paste0("Test", Test)]
DT_cast <- function(dt) {
dcast.data.table(dt.m, ID + Year + Time ~ Test, value.var = "value")
}
# min. of three consecutive runs
system.time(DT_cast(dt.m))
# user system elapsed
# 12.732 0.825 14.006
melt.data.table
比reshape2:::melt
快~175倍,dcast.data.table
比reshape2:::dcast
快~5倍。
这篇关于将宽格式重塑为多列长格式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文