String split on last comma in R


Problem description

I'm not new to R, but I am relatively new to regular expressions.

A similar question can be found here.

For example, if I use

> strsplit("UK, USA, Germany", ", ")
[[1]]
[1] "UK"      "USA"     "Germany"

but I want to get

[[1]]
[1] "UK, USA"     "Germany"

Another example is

> strsplit("London, Washington, D.C., Berlin", ", ")
[[1]]
[1] "London"     "Washington" "D.C."       "Berlin"  

and I want to get

[[1]]
[1] "London, Washington, D.C."       "Berlin"  

Washington, D.C. should definitely not be divided into two parts; the string should be split only at the last comma, not at every comma.

One viable way, I think, is to replace the last comma with something else, such as

$, #, *, ...

then use

strsplit() 

to split the string on the character you substituted (make sure it is unique!), but I would be happier if the problem could be solved directly with a built-in function.

So how can I do that? Many thanks.
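
The workaround described above could be sketched roughly as follows. The function name `split_via_placeholder` and the choice of the control character `"\001"` as placeholder are assumptions made for illustration; any character that cannot occur in the data would do.

```r
# Hypothetical helper: split each string once, at its last comma,
# by first swapping that comma for a placeholder.
split_via_placeholder <- function(x, placeholder = "\001") {
  # Assumption: the placeholder never occurs in the input.
  stopifnot(!any(grepl(placeholder, x, fixed = TRUE)))
  # The greedy (.*) runs up to the last comma, so only that comma
  # (plus any whitespace after it) is replaced.
  marked <- sub("(.*),\\s*", paste0("\\1", placeholder), x, perl = TRUE)
  strsplit(marked, placeholder, fixed = TRUE)
}

split_via_placeholder("London, Washington, D.C., Berlin")
```

This takes two passes over the string and relies on the placeholder being unique, which is why a single-step split would be preferable.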

Solution

Here's one approach, using a Perl-style lookahead so that only a comma followed by no further commas is matched:

strsplit("UK, USA, Germany", ",(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" " Germany"
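
To see why this splits only once, it can help to look at where the pattern matches. A small sketch using base R's `regexpr()` and `regmatches()` (the variable names `x` and `m` are illustrative):

```r
x <- "UK, USA, Germany"

# The lookahead (?=[^,]+$) requires that only non-comma characters
# remain up to the end of the string, so only the last comma qualifies.
m <- regexpr(",(?=[^,]+$)", x, perl = TRUE)

as.integer(m)            # position of the matched comma
attr(m, "match.length")  # the lookahead itself consumes no characters
regmatches(x, m)         # the text that strsplit() removes
```

Only the comma is consumed by the match, so the space after it stays attached to the second piece, which is where the leading space in " Germany" comes from.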

You may want to consume the whitespace after the comma as well:

strsplit("UK, USA, Germany", ",\\s*(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" "Germany"

Because `\\s*` allows zero or more whitespace characters, this also matches when there is no space after the comma:

strsplit(c("UK, USA, Germany", "UK, USA,Germany"), ",\\s*(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" "Germany"
## 
## [[2]]
## [1] "UK, USA" "Germany"
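
If the split is needed in several places, the pattern could be wrapped in a small helper. This is only a sketch; the name `split_last_comma` is made up here, and it simply reuses the pattern shown above, applied to the second example from the question.

```r
# Hypothetical wrapper around the pattern from the answer above.
split_last_comma <- function(x) {
  # ,\\s*      the comma plus any whitespace after it
  # (?=[^,]+$) only if no other comma follows before the end
  strsplit(x, ",\\s*(?=[^,]+$)", perl = TRUE)
}

res <- split_last_comma(c("London, Washington, D.C., Berlin", "Paris"))
res[[1]]  # two pieces: "London, Washington, D.C." and "Berlin"
res[[2]]  # no comma, so the string comes back unsplit: "Paris"
```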
