How to strsplit data frame column and replicate rows accordingly?


Question

I have a data frame like this:

> df <- data.frame(Column1=c("id1", "id2", "id3"), Column2=c("text1,text2,text3", "text4", "text5,text6"), Column3=c("text7", "text8,text9,text10,text11", "text12,text13"))

> df
  Column1           Column2                   Column3
1     id1 text1,text2,text3                     text7
2     id2             text4 text8,text9,text10,text11
3     id3       text5,text6             text12,text13

How can I convert it to this format?

  Column1 variable                     value
1     id1  Column2                     text1
2     id1  Column2                     text2
3     id1  Column2                     text3
4     id2  Column2                     text4
5     id3  Column2                     text5
6     id3  Column2                     text6
7     id1  Column3                     text7
8     id2  Column3                     text8
9     id2  Column3                     text9
10    id2  Column3                    text10
11    id2  Column3                    text11
12    id3  Column3                    text12
13    id3  Column3                    text13






I guess the first step is to melt() the data frame (by the way, should I worry about that warning?):

> library(reshape2)    
> mdf <- melt(df, id.vars="Column1", measure.vars=c("Column2", "Column3"))
> mdf
  Column1 variable                     value
1     id1  Column2         text1,text2,text3
2     id2  Column2                     text4
3     id3  Column2               text5,text6
4     id1  Column3                     text7
5     id2  Column3 text8,text9,text10,text11
6     id3  Column3             text12,text13
Warning message:
attributes are not identical across measure variables; they will be dropped

Then I would basically need to strsplit() the 'value' column and replicate the rows accordingly, but I can't think of a way to do it.

> strsplit(mdf$value, ",")
[[1]]
[1] "text1" "text2" "text3"

[[2]]
[1] "text4"

[[3]]
[1] "text5" "text6"

[[4]]
[1] "text7"

[[5]]
[1] "text8"  "text9"  "text10" "text11"

[[6]]
[1] "text12" "text13"

Thanks.

Recommended answer

You could try:

 library(reshape2)

 # cSplit() is defined in this gist: https://gist.github.com/mrdwab/11380733

 cSplit(melt(df, id.vars="Column1"), "value", ",", "long")
 #      Column1 variable  value
 # 1:     id1  Column2  text1
 # 2:     id1  Column2  text2
 # 3:     id1  Column2  text3
 # 4:     id2  Column2  text4
 # 5:     id3  Column2  text5
 # 6:     id3  Column2  text6
 # 7:     id1  Column3  text7
 # 8:     id2  Column3  text8
 # 9:     id2  Column3  text9
 #10:     id2  Column3 text10
 #11:     id2  Column3 text11
 #12:     id3  Column3 text12
 #13:     id3  Column3 text13
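On the warning that melt() raised in the question: it appears when the measure columns carry different attributes, which is what happens when Column2 and Column3 are factors with different level sets (the data.frame() default before R 4.0.0). The sketch below only illustrates that difference; `make_df` is a hypothetical helper, and the melt() call is guarded because it is an assumption that reshape2 is installed and that the character version melts without the warning.

```r
# Hypothetical helper: builds the question's data with or without factors.
make_df <- function(as_factors) {
  data.frame(
    Column1 = c("id1", "id2", "id3"),
    Column2 = c("text1,text2,text3", "text4", "text5,text6"),
    Column3 = c("text7", "text8,text9,text10,text11", "text12,text13"),
    stringsAsFactors = as_factors
  )
}

df_fac <- make_df(TRUE)   # default behaviour before R 4.0.0
df_chr <- make_df(FALSE)  # default behaviour from R 4.0.0 on

# Factor columns each carry a 'levels' attribute, and the levels differ.
same_attrs_fac <- identical(attributes(df_fac$Column2), attributes(df_fac$Column3))
# Plain character vectors carry no attributes, so there is nothing to drop.
same_attrs_chr <- identical(attributes(df_chr$Column2), attributes(df_chr$Column3))

if (requireNamespace("reshape2", quietly = TRUE)) {
  # Assumption: with character columns melt() has no attributes to reconcile.
  mdf <- reshape2::melt(df_chr, id.vars = "Column1")
  print(class(mdf$value))
}
```

In the question the 'value' column still ends up usable by strsplit(), so the warning looks harmless there; building the data frame with stringsAsFactors = FALSE is one way to avoid it altogether.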






Alternatively, if one wants to stick to functions available in CRAN packages:

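library(reshape2)
library(splitstackshape)
library(dplyr)
# Melt to long form, split 'value' on commas into rows,
# then drop the NA padding rows and the helper 'time' column.
mdf <- melt(df, id.vars="Column1")
split_long <- concat.split.multiple(mdf, split.col="value", sep=",", direction="long")
select(na.omit(split_long), -time)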
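The "strsplit() and replicate the rows" step that the question describes can also be sketched in base R, without any extra packages. This is not the method used in the answer above, only a minimal illustration; the names `measure_cols`, `blocks` and `long` are chosen here for the example, and the data frame is rebuilt with stringsAsFactors = FALSE so that strsplit() receives character vectors.

```r
df <- data.frame(
  Column1 = c("id1", "id2", "id3"),
  Column2 = c("text1,text2,text3", "text4", "text5,text6"),
  Column3 = c("text7", "text8,text9,text10,text11", "text12,text13"),
  stringsAsFactors = FALSE
)

measure_cols <- c("Column2", "Column3")

# Build one long block per measure column, then stack the blocks.
blocks <- lapply(measure_cols, function(col) {
  # One character vector of pieces per original row.
  parts <- strsplit(df[[col]], ",", fixed = TRUE)
  data.frame(
    Column1  = rep(df$Column1, lengths(parts)),  # repeat each id once per piece
    variable = col,
    value    = unlist(parts),
    stringsAsFactors = FALSE
  )
})

long <- do.call(rbind, blocks)
rownames(long) <- NULL
print(long)
```

The key idea is rep(ids, lengths(parts)): the number of pieces in each row decides how many times that row's id is repeated, which keeps the ids aligned with unlist(parts).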
