Remove the string before a certain word with R
Question
I have a character vector that I need to clean. Specifically, I want to remove the number that comes before the word "Votes". Note that the number uses a comma as the thousands separator, so it is easier to treat it as a string.
I know that gsub(".* Votes", "", text) will remove everything, but how do I remove just the number? Also, how do I collapse repeated spaces into a single space?
Thanks for your help!
Sample data:
text <- "STATE QUESTION NO. 1 Amendment to Title 15 of the Nevada Revised Statutes Shall Chapter 202 of the Nevada Revised Statutes be amended to prohibit, except in certain circumstances, a person from selling or transferring a firearm to another person unless a federally-licensed dealer first conducts a federal background check on the potential buyer or transferee? 558,586 Votes"
Recommended answer
You can use
text <- "STATE QUESTION NO. 1 Amendment to Title 15 of the Nevada Revised Statutes Shall Chapter 202 of the Nevada Revised Statutes be amended to prohibit, except in certain circumstances, a person from selling or transferring a firearm to another person unless a federally-licensed dealer first conducts a federal background check on the potential buyer or transferee? 558,586 Votes"
trimws(gsub("(\\s){2,}|\\d[0-9,]*\\s*(Votes)", "\\1\\2", text))
# => [1] "STATE QUESTION NO. 1 Amendment to Title 15 of the Nevada Revised Statutes Shall Chapter 202 of the Nevada Revised Statutes be amended to prohibit, except in certain circumstances, a person from selling or transferring a firearm to another person unless a federally-licensed dealer first conducts a federal background check on the potential buyer or transferee? Votes"
- (\\s){2,} - matches 2 or more whitespace chars while capturing the last occurrence, which is reinserted using the \1 placeholder in the replacement pattern
- | - or
- \\d - a digit
- [0-9,]* - 0 or more digits or commas
- \\s* - 0 or more whitespace chars
- (Votes) - Group 2 (restored in the output using the \2 placeholder): a Votes substring.
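To make the two alternatives in the pattern easier to follow, here is a minimal sketch that applies them as separate steps. The input string x is a hypothetical example (not the sample data), chosen so that it actually contains repeated spaces. Note one small difference from the one-liner: this version replaces each whitespace run with a literal space rather than with the last whitespace character of the run.

```r
# Hypothetical input with repeated spaces, to exercise both alternatives.
x <- "Shall   the statute be amended?    558,586 Votes  "

# Step 1: drop the number (digits and commas) that precedes "Votes",
# keeping "Votes" itself via the \1 backreference.
no_number <- gsub("\\d[0-9,]*\\s*(Votes)", "\\1", x)

# Step 2: collapse any run of 2 or more whitespace chars into one space.
collapsed <- gsub("\\s{2,}", " ", no_number)

# Step 3: strip leading/trailing whitespace.
result <- trimws(collapsed)
print(result)
```

Splitting the steps costs two extra passes over the string, but each pattern can be tested and changed on its own.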
Note that trimws will remove any leading/trailing whitespace.
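As a small illustration of what trimws does, the sketch below uses a hypothetical padded string and the which argument of base R's trimws to strip one side or both.

```r
# Hypothetical string with padding on both sides.
padded <- "  Votes  "

both  <- trimws(padded)                   # default: which = "both"
left  <- trimws(padded, which = "left")   # strip leading whitespace only
right <- trimws(padded, which = "right")  # strip trailing whitespace only

print(c(both, left, right))
```

In the answer above the default is what is wanted, since a collapsed whitespace run at either end of the string still leaves one space behind.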