Split an uneven character string in R with space


Problem description


I have read many posts on splitting strings in R. However, I am running into an error that I think is due to the way the variables were read into R: in some cases there are spaces after the date because the ID is shorter. I am trying to split the character variable "VESSELID" into two new variables: "vesselID" and "DATE". Below is a subset of my dataset.

> dput(df)
structure(list(SETID = c(24153L, 24187L, 24215L, 31990L, 31990L, 
31995L, 31995L, 31995L, 31996L, 31996L, 31996L, 31997L, 31997L, 
32002L, 32002L, 32002L, 32002L, 32003L, 32003L, 32003L), VESSELID = c("6830 2002/08/13  ", 
"6830 2002/08/12  ", "6830 2002/08/15  ", "105372 2002/08/23", 
"105372 2002/08/23", "104234 2002/07/20", "104234 2002/07/20", 
"104234 2002/07/20", "104234 2002/07/21", "104234 2002/07/21", 
"104234 2002/07/21", "104234 2002/07/22", "104234 2002/07/22", 
"5744 2002/08/14  ", "5744 2002/08/14  ", "5744 2002/08/14  ", 
"5744 2002/08/14  ", "5744 2002/08/13  ", "5744 2002/08/13  ", 
"5744 2002/08/13  ")), .Names = c("SETID", "VESSELID"), row.names = c(1L, 
2L, 3L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 
21L, 22L, 23L, 24L, 25L, 26L), class = "data.frame")

I did try the following:

library(reshape2)
test <- data.frame(df, colsplit(df$VESSELID, split= " ",names=c("vesselID","DATE")))

However, I get this error message:

Error in colsplit(log21$VESSELID, split = " ", names = c("vesselID", "DATE")) : 
      unused argument(s) (split = " ")

The split command doesn't seem to be able to work properly. I don't know how to fix my character string.

Solution

The argument name is not split, it is pattern:

test <- data.frame(df, colsplit(df$VESSELID, pattern = " ",names=c("vesselID","DATE")))

This gives:

   SETID          VESSELID vesselID         DATE
1  24153 6830 2002/08/13       6830 2002/08/13  
2  24187 6830 2002/08/12       6830 2002/08/12  
3  24215 6830 2002/08/15       6830 2002/08/15  
10 31990 105372 2002/08/23   105372   2002/08/23
11 31990 105372 2002/08/23   105372   2002/08/23
12 31995 104234 2002/07/20   104234   2002/07/20
13 31995 104234 2002/07/20   104234   2002/07/20
14 31995 104234 2002/07/20   104234   2002/07/20
15 31996 104234 2002/07/21   104234   2002/07/21
16 31996 104234 2002/07/21   104234   2002/07/21
17 31996 104234 2002/07/21   104234   2002/07/21
18 31997 104234 2002/07/22   104234   2002/07/22
19 31997 104234 2002/07/22   104234   2002/07/22
20 32002 5744 2002/08/14       5744 2002/08/14  
21 32002 5744 2002/08/14       5744 2002/08/14  
22 32002 5744 2002/08/14       5744 2002/08/14  
23 32002 5744 2002/08/14       5744 2002/08/14  
24 32003 5744 2002/08/13       5744 2002/08/13  
25 32003 5744 2002/08/13       5744 2002/08/13  
26 32003 5744 2002/08/13       5744 2002/08/13  
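
Note that in the output above the DATE values for the shorter IDs still carry their trailing spaces. As a minimal sketch of one way to deal with that, assuming base R only (no reshape2) and a three-row stand-in for the original data, the strings can be trimmed before splitting. The choice of trimws, strsplit, and as.Date here is an illustration, not part of the original answer:

```r
# Small stand-in for the question's data; trailing spaces are kept on purpose.
df <- data.frame(
  SETID = c(24153L, 31990L, 32002L),
  VESSELID = c("6830 2002/08/13  ", "105372 2002/08/23", "5744 2002/08/14  "),
  stringsAsFactors = FALSE
)

# Drop leading/trailing whitespace, then split on runs of whitespace.
parts <- strsplit(trimws(df$VESSELID), "\\s+")

# Each element now holds two pieces: the ID and the date.
df$vesselID <- as.integer(vapply(parts, `[`, character(1), 1))
df$DATE <- as.Date(vapply(parts, `[`, character(1), 2), format = "%Y/%m/%d")

str(df)
```

Converting DATE with as.Date is optional; it is shown only because a trimmed string in this format can be parsed directly.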
