Rearrange dataframe by subsetting and column bind

Problem description

I have the following dataframe:

st <- data.frame(
      se = rep(1:2, 5),
      X = rnorm(10, 0, 1),
      Y = rnorm(10, 0, 2))
st$xy <- paste(st$X,",",st$Y)
st <- st[c("se","xy")]

but I want it to be the following:

1   2   3   4   5
-1.53697673029089 , 2.10652020463275    -1.02183940974772 , 0.623009466458354   1.33614674072657 , 1.5694345481646  0.270466789820086 , -0.75670874554064   -0.280167896821629 , -1.33313822867893
0.26012874418111 , 2.87972571647846 -1.32317949800031 , -2.92675188421021   0.584199000313255 , 0.565499464846637   -0.555881716346136 , -1.14460518414649  -1.0871665543915 , -3.18687136890236

That is, rows that share the same value of se should be bound together as columns.

Do you have any ideas how to accomplish this? I had no luck with spread (tidyr), and I guess it involves sapply, cbind and an if statement. It also needs to be reasonably fast, because the real data has more than 35,000 rows.

Solution

If we need to split the 'xy' column elements into individual units, cSplit from splitstackshape can be used. Then unlist the alternating rows of 'st1' (the odd rows and the even rows separately) and rbind the two results.

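library(splitstackshape)
# Split 'xy' into separate columns (xy_1, xy_2); cSplit returns a data.table
st1 <- cSplit(st, 'xy', ', ', 'wide')
# Rows alternate se = 1, 2, 1, 2, ...; a full-length index avoids relying on
# recycling of a length-2 logical i
odd <- rep(c(TRUE, FALSE), length.out = nrow(st1))
rbind(unlist(st1[odd][, -1, with = FALSE]),
      unlist(st1[!odd][, -1, with = FALSE]))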
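
For readers without splitstackshape, the split-then-rbind idea can be sketched in base R. This is only an illustration under stated assumptions, not the answer's method: the vectors `xy` and `se` below are made-up stand-ins for the question's columns, and strsplit replaces cSplit.

```r
# Made-up strings in the same "X , Y" shape that paste() produces in the question
xy <- c("-1.5 , 2.1", "0.26 , 2.88", "-1.02 , 0.62", "-1.32 , -2.93")
se <- rep(1:2, 2)

# Split each string on the comma; as.numeric() tolerates the leftover spaces
parts <- do.call(rbind, strsplit(xy, ",", fixed = TRUE))
nums <- matrix(as.numeric(parts), ncol = 2, dimnames = list(NULL, c("X", "Y")))

# One row per 'se' value: all X values of that group, then all Y values
res <- rbind(as.vector(nums[se == 1, ]),
             as.vector(nums[se == 2, ]))
```

As in the cSplit version, each row of `res` holds the X values of one group followed by its Y values, and the approach assumes every group has the same number of rows.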


If we don't need to split the 'xy' column into individual elements, we can use dcast from data.table, which should be fast enough. Convert the 'data.frame' to a 'data.table' (setDT(st)), create a sequence column ('N') by 'se', and then dcast from 'long' to 'wide'.

library(data.table)
# N numbers the rows within each 'se' group; dcast then spreads 'xy' across N
dcast(setDT(st)[, N := 1:.N, by = se], se ~ N, value.var = 'xy')
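
The same "sequence column, then long to wide" idea can also be sketched in base R, which may help when data.table is not available. This is a hedged illustration rather than part of the original answer: `set.seed(1)` and the name `wide` are assumptions added here, and ave plus reshape stand in for the `N := 1:.N` and dcast steps.

```r
# Rebuild the question's data; set.seed() is added only for reproducibility
set.seed(1)
st <- data.frame(se = rep(1:2, 5), X = rnorm(10, 0, 1), Y = rnorm(10, 0, 2))
st$xy <- paste(st$X, ",", st$Y)
st <- st[c("se", "xy")]

# Number the rows within each 'se' group: 1, 2, ... per group
st$N <- ave(st$se, st$se, FUN = seq_along)

# Long to wide: one row per 'se', one 'xy' column per within-group position
wide <- reshape(st, idvar = "se", timevar = "N", direction = "wide")
```

Here `wide` has one row per value of se and columns `xy.1` to `xy.5`. The within-group counter is the piece that spread was missing: without it there is no key telling the reshape which column each 'xy' value belongs in.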
