Remove the string before a certain word with R


Problem description

I have a character vector that I need to clean. Specifically, I want to remove the number that comes before the word "Votes". Note that the number uses a comma as the thousands separator, so it is easier to treat it as a string.

I know that gsub("*. Votes","", text) will remove everything, but how do I remove just the number? Also, how do I collapse the repeated spaces into a single space?

Thanks for your help!

Sample data:

text <- "STATE QUESTION NO. 1                       Amendment to Title 15 of the Nevada Revised Statutes Shall Chapter 202 of the Nevada Revised Statutes be amended to prohibit, except in certain circumstances, a person from selling or transferring a firearm to another person unless a federally-licensed dealer first conducts a federal background check on the potential buyer or transferee?                    558,586 Votes"

Recommended answer

You can use

text <- "STATE QUESTION NO. 1                       Amendment to Title 15 of the Nevada Revised Statutes Shall Chapter 202 of the Nevada Revised Statutes be amended to prohibit, except in certain circumstances, a person from selling or transferring a firearm to another person unless a federally-licensed dealer first conducts a federal background check on the potential buyer or transferee?                    558,586 Votes"
trimws(gsub("(\\s){2,}|\\d[0-9,]*\\s*(Votes)", "\\1\\2", text))
# => [1] "STATE QUESTION NO. 1 Amendment to Title 15 of the Nevada Revised Statutes Shall Chapter 202 of the Nevada Revised Statutes be amended to prohibit, except in certain circumstances, a person from selling or transferring a firearm to another person unless a federally-licensed dealer first conducts a federal background check on the potential buyer or transferee? Votes"

See the online R demo.

Details

  • (\\s){2,} - matches 2 or more whitespace characters; the capture group keeps only the last one matched, which is reinserted through the \1 placeholder in the replacement, so each run collapses to a single whitespace character
  • | - or
  • \\d - a digit
  • [0-9,]* - 0 or more digits or commas
  • \\s* - 0 or more whitespace characters
  • (Votes) - Group 2 (restored in the output through the \2 placeholder): the substring Votes.
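The single pattern above does two jobs at once, which can be hard to read. As a rough sketch, the same cleanup can be split into separate passes; the vector x below is hypothetical sample input (shortened, not the original ballot text), and the variable names are made up for illustration.

```r
# Hypothetical, shortened sample input (not the original ballot text)
x <- c("Question 1      text here?       558,586 Votes",
       "Question 2   more text      1,234 Votes")

# Pass 1: drop the comma-separated number directly before "Votes",
# keeping the word itself via capture group 1
no_number <- gsub("\\d[0-9,]*\\s*(Votes)", "\\1", x)

# Pass 2: collapse every run of whitespace into a single space
collapsed <- gsub("\\s+", " ", no_number)

# Pass 3: strip leading/trailing whitespace
result <- trimws(collapsed)
print(result)
```

Because gsub is vectorized, the same calls work on a character vector of any length. The one-line version in the answer avoids the intermediate variables; the split version may be easier to adjust if only one of the two cleanups is needed.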

Note that trimws removes any leading and trailing whitespace.
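For reference, a small illustration of what trimws does on its own; the string padded is a made-up example. The which argument of base R's trimws selects which side to trim, and interior whitespace is left untouched.

```r
# Made-up example string with padding on both sides
padded <- "   Votes   cast   "

both  <- trimws(padded)                   # trims both ends (the default)
left  <- trimws(padded, which = "left")   # trims only leading whitespace
right <- trimws(padded, which = "right")  # trims only trailing whitespace

print(both)   # interior spaces between "Votes" and "cast" remain
print(left)
print(right)
```

This is why the answer still needs the (\\s){2,} alternative: trimws alone would not collapse the repeated spaces inside the text.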
