Splitting strings in R

Question

I have the following line:

    x<-"CUST_Id_8Name:Mr.Praveen KumarDOB:Mother's Name:Contact Num:Email address:Owns Car:Products held with Bank:Company Name:Salary per. month:Background:"

I want to extract "CUST_Id_8", "Mr.Praveen Kumar", and whatever follows DOB:, Mother's Name:, Contact Num:, and so on, and store each value in a variable such as Customer Id, Name, DOB, and so on.

Please help.

I have tried:

    strsplit(x, ":")

But the result is a list containing only the text pieces. I need a blank value wherever nothing follows a field name.

Can anyone tell me how to extract the string between two words? For example, I want to extract "Mr.Praveen Kumar" from between Name: and DOB.

Recommended answer

You can use regexec and regmatches to pull out the various data items as substrings. Here's a worked example:

Sample data:

    # Two sample records: the first has only an ID and a name, the second has every field filled in
    txt <- c("CUST_Id_8Name:Mr.Praveen KumarDOB:Mother's Name:Contact Num:Email address:Owns Car:Products held with Bank:Company Name:Salary per. month:Background:",
             "CUST_Id_15Name:Mr.Joe JohnsonDOB:01/02/1973Mother's Name:BarbaraContact Num:0123 456789Email address:joe@joesville.comOwns Car:YesProducts held with Bank:Savings, CurrentCompany Name:Joes villeSalary per. month:$100000Background:shady")

Pattern to match:

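    # One capture group (.*) per field; an empty field yields an empty string.
    # Note: the "." in "Salary per. month" is unescaped, so it matches any character (harmless here).
    pattern <- "CUST_Id_(.*)Name:(.*)DOB:(.*)Mother's Name:(.*)Contact Num:(.*)Email address:(.*)Owns Car:(.*)Products held with Bank:(.*)Company Name:(.*)Salary per. month:(.*)Background:(.*)"
    # Recover the field names by splitting the pattern at each "_(.*)" or ":(.*)"
    var_names <- strsplit(pattern, "[:_]\\(\\.\\*\\)")[[1]]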
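As an optional variation on the pattern above, one could start from the field labels and derive both the pattern and the column names from them. This is only a sketch: the names `fields`, `safe_pattern`, and `field_names` are hypothetical and not part of the original answer, and it assumes `regexec()` is called with `perl = TRUE` so that `\Q...\E` quotes each label literally (including the `.` in "Salary per. month"). The rest of the answer continues with `pattern` and `var_names` as defined above.

```r
# Field labels exactly as they appear in the input (hypothetical helper vector)
fields <- c("CUST_Id_", "Name:", "DOB:", "Mother's Name:", "Contact Num:",
            "Email address:", "Owns Car:", "Products held with Bank:",
            "Company Name:", "Salary per. month:", "Background:")

# Quote every label with \Q...\E and follow it with one capture group
safe_pattern <- paste0("\\Q", fields, "\\E(.*)", collapse = "")

# Column names: the labels without their trailing ":" or "_"
field_names <- sub("[:_]$", "", fields)

# Try it on the first sample record from the question
line <- "CUST_Id_8Name:Mr.Praveen KumarDOB:Mother's Name:Contact Num:Email address:Owns Car:Products held with Bank:Company Name:Salary per. month:Background:"
values <- regmatches(line, regexec(safe_pattern, line, perl = TRUE))[[1]][-1]
names(values) <- field_names

values[["Name"]]  # "Mr.Praveen Kumar"
values[["DOB"]]   # ""
```

Keeping the labels in a vector means a label is typed once, and adding or renaming a field changes the pattern and the column names together.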

Perform the match:

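    # regexec/regmatches return, per record, the full match followed by each capture group;
    # stack the records into rows and drop the first column (the full match)
    data <- as.data.frame(do.call("rbind", regmatches(txt, regexec(pattern, txt))))[, -1]
    colnames(data) <- var_names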

Output:

#  CUST_Id             Name        DOB Mother's Name Contact Num
#1       8 Mr.Praveen Kumar                                     
#2      15   Mr.Joe Johnson 01/02/1973       Barbara 0123 456789
#      Email address Owns Car Products held with Bank Company Name
#1                                                                
#2 joe@joesville.com      Yes        Savings, Current   Joes ville
#  Salary per. month Background
#1                             
#2           $100000      shady
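The question also asks how to get the text between two words, such as between Name: and DOB. The same regexec/regmatches approach can be narrowed to a single pair of markers. The sketch below is one possible way to do it; `extract_between` is a hypothetical helper name, not a base R function, and it assumes `perl = TRUE` so that `\Q...\E` treats the markers literally and `(.*?)` matches as little as possible.

```r
# Hypothetical helper: return the text between two literal markers.
# Gives "" when the markers are adjacent and NA when they are not both found.
extract_between <- function(text, start, end) {
  pattern <- paste0("\\Q", start, "\\E(.*?)\\Q", end, "\\E")
  hits <- regmatches(text, regexec(pattern, text, perl = TRUE))
  vapply(hits,
         function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
         character(1))
}

x <- "CUST_Id_8Name:Mr.Praveen KumarDOB:Mother's Name:Contact Num:Email address:Owns Car:Products held with Bank:Company Name:Salary per. month:Background:"

extract_between(x, "Name:", "DOB:")            # "Mr.Praveen Kumar"
extract_between(x, "DOB:", "Mother's Name:")   # ""
extract_between(x, "Nickname:", "DOB:")        # NA
```

Only the leftmost occurrence of the start marker is used, so a marker like "Name:" that also appears inside "Mother's Name:" and "Company Name:" should be checked against the actual layout of the data.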
