R:从宽到长转换,不排序列 [英] R: Transform from Wide to Long without sorting columns
问题描述
我想将数据格式从宽格式转换为长格式。
这里是一个玩具示例:
mydata< - data.frame(ID = 1:5,ZA_1 = 1:5,
ZA_2 = 5:1,BB_1 = rep ,BB_2 = rep(6,5),CC_7 = 6:2)
ID ZA_1 ZA_2 BB_1 BB_2 CC_7
1 1 5 3 6 6
2 2 4 3 6 5
3 3 3 3 6 4
4 4 2 3 6 3
5 5 1 3 6 2
有一些变量将保持原样(这里只有ID),有些变量将被转换为长格式(这里所有其他变量,以_1,_2或_7结尾)
为了将其转换为长格式我使用data.table熔化和dcast,一个通用的方法能够自动检测变量。
library(data.table)
setDT(mydata)
idvars = grep(_ [1-7] $,names(mydata),invert = TRUE)
temp< - melt(mydata,id.vars = idvars)
nuevo < b $ b temp [,`:=`(var = sub(_ [1-7] $,'',variable),
measure = sub('。* _','' ,variable = NULL)],
...〜var,value.var ='value')
ID测量BB CC ZA
1 1 3 NA 1
1 2 6 NA 5
1 7 NA 6 NA
2 1 3 NA 2
2 2 6 NA 4
2 7 NA 5 NA
3 1 3 NA 3
3 2 6 NA 3
3 7 NA 4 NA
4 1 3 NA 4
4 2 6 NA 2
4 7 NA 3 NA
5 1 3 NA 5
5 2 6 NA 1
5 7 NA 2 NA
正如你所看到的,列按字母顺序排列,但我希望尽可能保持原始顺序,例如考虑变量第一次出现的顺序。 / p>
ID ZA_1 ZA_2 BB_1 BB_2 CC_7 p>
应为
ID ZA BB CC
我不介意如果idvars列在开头全部一起, ZA_2 TEMP BB_1 BB_2 CC_2 CC_1 p>
将是
ID ZA TEMP BB CC
或
ID TEMP ZA BB CC
我更喜欢最后一个选项。
另外一个问题是,一切都变成了字符。找到方式,修改我的初始解决方案
mydata< - data.table(ID = 1:5,ZA_2001 = 5,ZA_2002 = 5:1,
BB_2001 = rep(3,5),BB_2002 = rep(6,5),CC_2007 = 6:2)
idvars = grep [0-9] [0-9] $,names(mydata),invert = TRUE)
temp < - melt(mydata,id.vars = idvars)
temp [,`:= `(var = sub(_ 20 [0-9] [0-9] $,'',variable),
measure = sub('。* _','',variable) )]
temp [,var:= factor(var,levels = unique(var))]
dcast(temp,...〜var,value.var ='value')
它给你正确的度量值。
无论如何,这个解决方案需要大量的内存。
诀窍是将var变量转换为factor,如mtoto所示。
mtoto解决方案很好,因为它不需要转换和融化,只有融化,但不能在我更新的示例中工作,只有当每个单词的数字变体数相同时才有效。
I want to convert a dataframe from wide format to long format.
Here it is a toy example:
mydata <- data.frame(ID=1:5, ZA_1=1:5,
ZA_2=5:1,BB_1=rep(3,5),BB_2=rep(6,5),CC_7=6:2)
ID ZA_1 ZA_2 BB_1 BB_2 CC_7
1 1 5 3 6 6
2 2 4 3 6 5
3 3 3 3 6 4
4 4 2 3 6 3
5 5 1 3 6 2
There are some variables that will remain as is (here only ID) and some that will be transformed to long format (here all other variables, all ending with _1, _2 or _7)
In order to transform it to long format I'm using data.table melt and dcast, a generic way able to detect the variables automatically. Other solutions are welcome too.
library(data.table)
setDT(mydata)
idvars = grep("_[1-7]$",names(mydata) , invert = TRUE)
temp <- melt(mydata, id.vars = idvars)
nuevo <- dcast(
temp[, `:=`(var = sub("_[1-7]$", '', variable),
measure = sub('.*_', '', variable), variable = NULL)],
... ~ var, value.var='value')
ID measure BB CC ZA
1 1 3 NA 1
1 2 6 NA 5
1 7 NA 6 NA
2 1 3 NA 2
2 2 6 NA 4
2 7 NA 5 NA
3 1 3 NA 3
3 2 6 NA 3
3 7 NA 4 NA
4 1 3 NA 4
4 2 6 NA 2
4 7 NA 3 NA
5 1 3 NA 5
5 2 6 NA 1
5 7 NA 2 NA
As you can see the columns are reoredered alphabetically, but I would prefer to keep the original order as far as possible, for example taking into account the order of the first appearance of the variable.
ID ZA_1 ZA_2 BB_1 BB_2 CC_7
Should be
ID ZA BB CC
I don't mind if the idvars columns come alltogether at the beginning or if they also stay in their original position.
ID ZA_1 ZA_2 TEMP BB_1 BB_2 CC_2 CC_1
would be
ID ZA TEMP BB CC
or
ID TEMP ZA BB CC
I prefer the last option.
Another problem is that everything gets transformed to character.
Finally I've found the way, modifying my initial solution
mydata <- data.table(ID=1:5, ZA_2001=1:5, ZA_2002=5:1,
BB_2001=rep(3,5),BB_2002=rep(6,5),CC_2007=6:2)
idvars = grep("_20[0-9][0-9]$",names(mydata) , invert = TRUE)
temp <- melt(mydata, id.vars = idvars)
temp[, `:=`(var = sub("_20[0-9][0-9]$", '', variable),
measure = sub('.*_', '', variable), variable = NULL)]
temp[,var:=factor(var, levels=unique(var))]
dcast( temp, ... ~ var, value.var='value' )
And it gives you the proper measure values. Anyway this solution needs a lot of memory.
The trick was converting the var variable to factor specifying the order I want with levels, as mtoto did. mtoto solution is nice because it doesn't need to cast and melt, only melt, but doesn't work in my updated example, only works when there are the same number of number variations for each word.
这篇关于R:从宽到长转换,不排序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!