strsplit on all spaces and punctuation except apostrophes


Question

I have asked related questions HERE and HERE. I tried to generalize those answers but failed.

Basically, I have a string that I want to split into words, numbers, and any sort of punctuation, while keeping apostrophes attached to their words. Here is what I've tried; I think it is very close:

x <- "Raptors don't like robots! I'd pay $500.00 to rid them."

strsplit(x, "(\\s+)|(?=[[:punct:]])", perl = TRUE)

## [[1]]
##  [1] "Raptors" "don"     "'"       "t"       "like"    "robots"  "!"
##  [8] ""        "I"       "'"       "d"       "pay"     "$"       "500"
## [15] "."       "00"      "to"      "rid"     "them"    "."

This is what I'm after:

## [[1]]
##  [1] "Raptors" "don't"   "like"    "robots"  "!"       ""        "I'd"
##  [8] "pay"     "$"       "500"     "."       "00"      "to"      "rid"
## [15] "them"    "."

I want a base R solution, but I would also like to see other approaches (I'm sure someone has a stringr solution), which would make the question more useful to others.

Note: R has its own regex system. You need to be familiar with R to answer this question.

Recommended answer

You could use a negative lookahead, (?!'), so that the zero-width split before a punctuation character is skipped when that character is an apostrophe:

strsplit(x, "(\\s+)|(?!')(?=[[:punct:]])", perl = TRUE)
#  [1] "Raptors" "don't"   "like"    "robots"  "!"       ""        "I'd"     "pay"     "$"       "500"     "."       "00"      "to"      "rid"     "them"    "."
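
The result above still contains one empty string, produced by the space that follows "!". As a minimal sketch (not part of the original answer), the snippet below drops that empty element with nzchar(), and also shows a different technique: extracting tokens with gregexpr() and regmatches() instead of splitting. The extraction pattern and the variable names (split_tokens, token_pattern, extracted) are illustrative assumptions, not something the answer proposed.

```r
x <- "Raptors don't like robots! I'd pay $500.00 to rid them."

# Approach 1: the split from the answer, then drop empty strings.
# (?!') rejects positions where the next character is an apostrophe;
# (?=[[:punct:]]) requires the next character to be punctuation.
split_pattern <- "(\\s+)|(?!')(?=[[:punct:]])"
split_tokens <- strsplit(x, split_pattern, perl = TRUE)[[1]]
split_tokens <- split_tokens[nzchar(split_tokens)]

# Approach 2 (assumed alternative): extract tokens instead of splitting.
# The first branch matches runs of letters, digits, and apostrophes;
# the second matches any single remaining punctuation character.
token_pattern <- "[[:alnum:]']+|[[:punct:]]"
extracted <- regmatches(x, gregexpr(token_pattern, x, perl = TRUE))[[1]]

print(split_tokens)
print(extracted)
```

For this input, both approaches should yield the same 15 tokens, without the empty string. They can differ on other inputs: the extraction pattern treats a standalone apostrophe (for example, an opening single quote) as part of a word token, so it is worth checking against your own data before relying on it.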
