Rearrange dataframe by subsetting and column bind
Question
I have the following dataframe:
st <- data.frame(
se = rep(1:2, 5),
X = rnorm(10, 0, 1),
Y = rnorm(10, 0, 2))
st$xy <- paste(st$X,",",st$Y)
st <- st[c("se","xy")]
But I want it to look like this:
1 2 3 4 5
-1.53697673029089 , 2.10652020463275 -1.02183940974772 , 0.623009466458354 1.33614674072657 , 1.5694345481646 0.270466789820086 , -0.75670874554064 -0.280167896821629 , -1.33313822867893
0.26012874418111 , 2.87972571647846 -1.32317949800031 , -2.92675188421021 0.584199000313255 , 0.565499464846637 -0.555881716346136 , -1.14460518414649 -1.0871665543915 , -3.18687136890236
I mean: when rows have the same value of se, bind their xy values together as columns.
Do you have any ideas how to accomplish this?
I had no luck with spread() from tidyr, and I guess the solution involves sapply, cbind and an if statement. I'm asking because the real data has more than 35,000 rows.
Solution
If we need to split the 'xy' column elements into individual units, cSplit from splitstackshape can be used. Then rbind the rows belonging to each 'se' value (the alternating rows of 'st1') after unlisting them.
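library(splitstackshape)
# Split each "X , Y" string into numeric columns xy_1 and xy_2 (st1 is a data.table)
st1 <- cSplit(st, 'xy', ', ', 'wide')
# One output row per se value. Rows are selected by se rather than with a
# recycled logical index, which newer data.table versions reject.
# unlist() stacks the columns, so each row holds the xy_1 values, then the xy_2 values.
rbind(unlist(st1[se == 1, -1, with = FALSE]),
      unlist(st1[se == 2, -1, with = FALSE]))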
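If adding splitstackshape is not an option, the split-and-bind step can be sketched in base R instead. This is a substitute technique (strsplit() plus split()), not how cSplit works internally. The sample data below is hypothetical (seeded only so the run is reproducible), and the sketch assumes every se value has the same number of rows. Unlike unlist() on a wide table, it keeps each X next to its Y.

```r
# Hypothetical sample data shaped like the question's 'st'.
# stringsAsFactors = FALSE keeps 'xy' as character on R versions before 4.0.
set.seed(42)
X <- rnorm(10, 0, 1)
Y <- rnorm(10, 0, 2)
st <- data.frame(se = rep(1:2, 5),
                 xy = paste(X, ",", Y),  # default sep = " " gives "X , Y"
                 stringsAsFactors = FALSE)

# Split every "X , Y" string on the comma and convert both parts to numbers.
parts <- strsplit(st$xy, ",", fixed = TRUE)
nums  <- lapply(parts, function(p) as.numeric(trimws(p)))

# Group the per-row (X, Y) pairs by se and flatten each group in row order,
# giving X, Y, X, Y, ... for that se value.
by_se <- lapply(split(nums, st$se), unlist)

# Stack the groups as rows (assumes all se groups have equal size).
wide <- do.call(rbind, by_se)
print(wide)
```

If group sizes can differ, rbind() would recycle the shorter vectors, so a reshaping approach that pads missing cells with NA is the safer choice there.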
If we don't need to split the 'xy' column into individual elements, we can use dcast from data.table. It should be fast enough. Convert the 'data.frame' to a 'data.table' (setDT(st)), create a sequence column ('N') within each 'se' group, and then dcast from 'long' to 'wide'.
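library(data.table)
# setDT() converts st in place; N numbers the rows within each se group,
# and dcast() spreads xy into one column per N value
dcast(setDT(st)[, N := 1:.N, se], se ~ N, value.var = 'xy')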
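For comparison, the same long-to-wide step can be sketched with base R's reshape(), using ave() to build the within-group counter that N := 1:.N, se creates. This swaps in a base-R technique rather than data.table's, and the sample data is hypothetical.

```r
# Hypothetical sample data in the question's shape (seeded for reproducibility).
set.seed(1)
st <- data.frame(se = rep(1:2, 5),
                 xy = paste(rnorm(10, 0, 1), ",", rnorm(10, 0, 2)),
                 stringsAsFactors = FALSE)

# Sequence number within each se group: the base-R counterpart of N := 1:.N by se.
st$N <- ave(seq_along(st$se), st$se, FUN = seq_along)

# Long -> wide: one row per se value, one xy column per N (xy.1, xy.2, ...).
wide <- reshape(st, idvar = "se", timevar = "N", direction = "wide")
print(wide)
```

reshape() names the new columns xy.1 to xy.5 and, like dcast, leaves NA where a group has fewer rows than the others.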