strsplit on all spaces and punctuation except apostrophes
Question
I have asked related questions HERE and HERE. I tried to generalize these answers but have failed.
Basically, I have a string that I want to split into words, numbers, and any sort of punctuation, while keeping the apostrophes inside words. Here is what I've tried; I think I'm close:
x <- "Raptors don't like robots! I'd pay $500.00 to rid them."
strsplit(x, "(\\s+)|(?=[[:punct:]])", perl = TRUE)
## [[1]]
##  [1] "Raptors" "don"     "'"       "t"       "like"    "robots"  "!"
##  [8] ""        "I"       "'"       "d"       "pay"     "$"       "500"
## [15] "."       "00"      "to"      "rid"     "them"    "."
This is what I'm after:
## [[1]]
## [1] "Raptors" "don't" "like" "robots" "!" "" "I'd"
## [8] "pay" "$" "500" "." "00" "to" "rid" "them" "."
While I want a base R solution, I would also like to see other solutions (I'm sure someone has a stringr solution), which would make the question more useful to others.
Note: R has its own particular regex system. You need to be familiar with R to answer this question.
Recommended answer
You could use a negative lookahead, (?!'), so that the split is never placed directly before an apostrophe:
strsplit(x, "(\\s+)|(?!')(?=[[:punct:]])", perl = TRUE)
# [1] "Raptors" "don't" "like" "robots" "!" "" "I'd" "pay" "$" "500" "." "00" "to" "rid" "them" "."
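As a possible alternative in base R (a sketch, not part of the answer above), one could match the tokens directly instead of splitting on the gaps between them. The pattern and the names token_pattern and tokens are illustrative assumptions. Note one difference from the desired output in the question: this approach does not produce the empty string "" between "!" and "I'd".

```r
x <- "Raptors don't like robots! I'd pay $500.00 to rid them."

# Illustrative pattern (an assumption, not from the original answer):
#   [[:alnum:]']+             a run of letters, digits, or apostrophes
#   [^[:alnum:][:space:]']    any single other non-space character
token_pattern <- "[[:alnum:]']+|[^[:alnum:][:space:]']"

# gregexpr() locates every match; regmatches() extracts the matched text
tokens <- regmatches(x, gregexpr(token_pattern, x, perl = TRUE))[[1]]
print(tokens)
#  [1] "Raptors" "don't"   "like"    "robots"  "!"       "I'd"     "pay"
#  [8] "$"       "500"     "."       "00"      "to"      "rid"     "them"
# [15] "."
```

Matching rather than splitting avoids zero-length split positions altogether, which may be easier to reason about if the set of punctuation to keep attached to words grows.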