Splitting text in a column and adding a row number
Problem description
I would like to split some text in a data frame column and save it into a data frame together with the row number or an id column.
I used to do this with plyr, but the same approach no longer works in dplyr.
If I understand correctly, my code only worked in plyr because of a bug there.
So I am looking for the correct way to do this.
This is a minimal example in plyr:
library(plyr)

set.seed(1)
df <- data.frame(a = seq(2),
                 b = c(paste(sample(letters, 3), collapse = ';'),
                       paste(sample(letters, 3), collapse = ';')),
                 stringsAsFactors = FALSE)

ddply(df, .(a), summarise, unlist(strsplit(b, ';')))
It turns the original data frame:
  a     b
1 1 g;j;n
2 2 x;f;v
Into this:
  a ..1
1 1   g
2 1   j
3 1   n
4 2   x
5 2   f
6 2   v
What would be the correct dplyr solution?
Solution
I'm biased in favor of cSplit from the "splitstackshape" package, but you might be interested in unnest from "tidyr" in conjunction with "dplyr":
library(dplyr)
library(tidyr)

df %>%
  mutate(b = strsplit(b, ";")) %>%
  unnest(b)
#   a b
# 1 1 g
# 2 1 j
# 3 1 n
# 4 2 x
# 5 2 f
# 6 2 v
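The answer names cSplit without showing a call, so here is a minimal sketch of how it might be used for the same task. It assumes the "splitstackshape" package is installed. The input data frame is written out literally from the values printed in the question, so that the result does not depend on the random seed or the R version. The name `long` is only illustrative.

```r
library(splitstackshape)

# Input written out literally (values copied from the question's printed
# data frame), so the example does not depend on set.seed() or sample().
df <- data.frame(a = 1:2,
                 b = c("g;j;n", "x;f;v"),
                 stringsAsFactors = FALSE)

# direction = "long" stacks the split pieces into rows and repeats the
# value of `a` for every piece that came from the same original row.
long <- cSplit(df, splitCols = "b", sep = ";", direction = "long")
long
```

This should give the same six rows as the unnest approach. Note that cSplit returns a data.table rather than a plain data frame; wrap the result in as.data.frame() if the rest of your code expects one.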