Regular expression for the "opposite" result

Problem description

Take the following character vector x:

x <- c("1     Date in the form", "2     Number of game", 
       "3     Day of week", "4-5     Visiting team and league")

My desired result is the following vector, with the first capitalized word from each string and, if the string contains a -, also the last word.

[1] "Date"     "Number"   "Day"      "Visiting" "league"  

So instead of doing this

unlist(sapply(strsplit(x, "[[:blank:]]+|, "), function(y){
   if(grepl("[-]", y[1])) c(y[2], tail(y,1)) else y[2] 
}))

to get the result, I figured I could try to shorten it to a regular expression. The result is almost the "opposite" of this regular expression in sub. I've tried it every which way to get the opposite, with different varieties of [^A-Za-z]+ among others, and haven't been successful.

> sub("[A-Z][a-z]+", "", x)
[1] "1      in the form"       "2      of game"           
[3] "3      of week"           "4-5      team and league"

So I guess this is a two part question.

  1. With sub() or gsub(), how can I return the opposite of "[A-Z][a-z]+"?

  2. How can I write the regular expression to read like "Match the first capitalized word and, if the string contains a -, also match the last word."?

Recommended answer

Here are some suggestions:

  1. To extract the first capitalized word with sub, you can use

sub(".*\\b([A-Z].*?)\\b.*", "\\1", x)
#[1] "Date"     "Number"   "Day"      "Visiting"

where \\b represents a word boundary.
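As a complement to the sub() call above, the "opposite" of deleting a match can also be obtained by extracting the match directly. The following is a minimal sketch using base R's regexpr() and regmatches(); it is an alternative technique, not part of the original answer, and it assumes every string contains at least one capitalized word (strings without a match would be dropped from the result).

```r
x <- c("1     Date in the form", "2     Number of game",
       "3     Day of week", "4-5     Visiting team and league")

# regexpr() locates the first match of the pattern in each string;
# regmatches() then returns the matched text instead of removing it.
first_cap <- regmatches(x, regexpr("[A-Z][a-z]+", x))
first_cap
# [1] "Date"     "Number"   "Day"      "Visiting"
```

Here the same pattern from the question is reused unchanged, so nothing has to be negated.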

  2. You can also extract all the words with a single sub command, but note that an extra step is then needed, because the vector returned by sub always has the same length as the input vector x.

The following regular expression uses a lookahead ((?=.*-)) to test whether the string contains a -. If it does, two words are extracted. If it does not, the regular expression after the alternation operator (|) is applied instead and returns only the first capitalized word. Because lookaheads are a Perl-style feature, the call needs perl = TRUE.

res <- sub("(?:(?=.*-).*\\b([A-Z].*?\\b ).*\\b(.+)$)|(?:.*\\b([A-Z].*?)\\b.*)", 
           "\\1\\2\\3", x, perl = TRUE)
# [1] "Date"            "Number"          "Day"             "Visiting league"
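To see what the lookahead contributes on its own, it can be tested in isolation. This is a small illustrative sketch (the variable names has_dash and has_dash_plain are made up for the example): the lookahead consumes no characters, it only decides whether the first branch of the alternation is allowed to match.

```r
x <- c("1     Date in the form", "2     Number of game",
       "3     Day of week", "4-5     Visiting team and league")

# (?=.*-) succeeds only if a "-" occurs somewhere ahead in the string.
# Lookaheads are not available in R's default regex engine, hence perl = TRUE.
has_dash <- grepl("^(?=.*-)", x, perl = TRUE)
has_dash
# [1] FALSE FALSE FALSE  TRUE

# For a plain yes/no test, a fixed-string search gives the same answer.
has_dash_plain <- grepl("-", x, fixed = TRUE)
identical(has_dash, has_dash_plain)
# [1] TRUE
```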

One additional step is necessary in order to separate multiple words in the same string:

unlist(strsplit(res, " ", fixed = TRUE))
# [1] "Date"     "Number"   "Day"      "Visiting" "league"  
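If a single regular expression is not a hard requirement, the same result can be assembled from three simpler steps. The sketch below is one possible alternative, not the approach from the answer; the helper names first_cap, last_word and has_dash are hypothetical, and it assumes every string contains a capitalized word and that words are separated by whitespace.

```r
x <- c("1     Date in the form", "2     Number of game",
       "3     Day of week", "4-5     Visiting team and league")

# First capitalized word of each string (assumes each string has one).
first_cap <- regmatches(x, regexpr("[A-Z][a-z]+", x))

# Last word: drop everything up to and including the final whitespace.
last_word <- sub(".*[[:space:]]", "", x)

# Does the string contain a "-"?
has_dash <- grepl("-", x, fixed = TRUE)

# Keep the last word only where a "-" is present, then flatten.
res <- unlist(Map(function(f, l, d) if (d) c(f, l) else f,
                  first_cap, last_word, has_dash),
              use.names = FALSE)
res
# [1] "Date"     "Number"   "Day"      "Visiting" "league"
```

This trades the compactness of one pattern for steps that can each be checked separately.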
