Splitting a dataframe string column into multiple different columns
Problem description
What I am trying to accomplish is splitting a column into multiple columns. I would prefer the first column to contain "F", the second "US", the third "CA6" or "DL", and the fourth "Z13" or "U13", and so on. My entire df follows the same pattern of X.XX.XXXX.XXX, X.XX.XXX.XXX, or X.XX.XX.XXX, and I know the third column is where my problem lies because of its varying length. I have only used substr in the past, and I could use it here with some if statements, but I would like to learn how to use the stringr package and POSIX to do this (unless there is a better option). Thank you in advance.
Here is my df:
text <- c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13",
"F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13", "F.US.CA6.U13",
"F.US.DL.U13", "F.US.DL.U13", "F.US.DL.U13", "F.US.DL.Z13", "F.US.DL.Z13"
)
Recommended answer
A very direct way is to just use read.table on your character vector:
> read.table(text = text, sep = ".", colClasses = "character")
   V1 V2  V3  V4
1   F US CLE V13
2   F US CA6 U13
3   F US CA6 U13
4   F US CA6 U13
5   F US CA6 U13
6   F US CA6 U13
7   F US CA6 U13
8   F US CA6 U13
9   F US  DL U13
10  F US  DL U13
11  F US  DL U13
12  F US  DL Z13
13  F US  DL Z13
colClasses needs to be specified, otherwise F gets converted to FALSE (which is something I need to fix in "splitstackshape", otherwise I would have recommended that :) )
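To make the colClasses point concrete, here is a minimal sketch that reads a shortened copy of the vector both ways and compares the type of the first column. The names text_demo, default_read, and char_read are illustrative and not part of the original answer.

```r
# Shortened, illustrative copy of the vector from the question
text_demo <- c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.DL.Z13")

# Without colClasses, read.table() guesses each column's type,
# so a column containing only "F" is read as logical FALSE
default_read <- read.table(text = text_demo, sep = ".")
class(default_read$V1)   # "logical"

# With colClasses = "character", every column is kept as a string
char_read <- read.table(text = text_demo, sep = ".", colClasses = "character")
class(char_read$V1)      # "character"
```

The same type guessing would turn a column of "T" values into TRUE, so forcing character columns is the safer default for codes like these.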
Alternatively, you can use my cSplit function, like this:
library(data.table)       # provides as.data.table
library(splitstackshape)  # provides cSplit
cSplit(as.data.table(text), "text", ".")
#     text_1 text_2 text_3 text_4
#  1:      F     US    CLE    V13
#  2:      F     US    CA6    U13
#  3:      F     US    CA6    U13
#  4:      F     US    CA6    U13
#  5:      F     US    CA6    U13
#  6:      F     US    CA6    U13
#  7:      F     US    CA6    U13
#  8:      F     US    CA6    U13
#  9:      F     US     DL    U13
# 10:      F     US     DL    U13
# 11:      F     US     DL    U13
# 12:      F     US     DL    Z13
# 13:      F     US     DL    Z13
Or use separate from tidyr, like this:
library(dplyr)
library(tidyr)
as.data.frame(text) %>% separate(text, into = paste("V", 1:4, sep = "_"))
#    V_1 V_2 V_3 V_4
# 1    F  US CLE V13
# 2    F  US CA6 U13
# 3    F  US CA6 U13
# 4    F  US CA6 U13
# 5    F  US CA6 U13
# 6    F  US CA6 U13
# 7    F  US CA6 U13
# 8    F  US CA6 U13
# 9    F  US  DL U13
# 10   F  US  DL U13
# 11   F  US  DL U13
# 12   F  US  DL Z13
# 13   F  US  DL Z13
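Since the question asked about stringr specifically, the following is a rough sketch of one way it could be done there, with a base R equivalent for comparison. It is not part of the original answer; text_demo is a shortened copy of the data, and the column names type, country, code, and group are hypothetical labels chosen only for illustration.

```r
library(stringr)

# Shortened, illustrative copy of the vector from the question
text_demo <- c("F.US.CLE.V13", "F.US.CA6.U13", "F.US.DL.Z13")

# fixed(".") matches a literal dot; a bare "." would be treated
# as a regular expression matching any character
parts <- str_split_fixed(text_demo, fixed("."), n = 4)

# str_split_fixed() returns a character matrix; convert it to a data frame
parts_df <- as.data.frame(parts, stringsAsFactors = FALSE)
names(parts_df) <- c("type", "country", "code", "group")  # hypothetical names

# Base R equivalent: strsplit() returns a list of character vectors,
# which rbind() stacks into a matrix with one row per input string
base_parts <- do.call(rbind, strsplit(text_demo, ".", fixed = TRUE))
```

Because the split is on the delimiter and not on character positions, the varying length of the third field needs no special handling, which is the main advantage over substr with if statements.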