Regular expression for the "opposite" result
Question
Take the following character vector x:
x <- c("1 Date in the form", "2 Number of game",
"3 Day of week", "4-5 Visiting team and league")
My desired result is the following vector, with the first capitalized word from each string and, if the string contains a -, also the last word.
[1] "Date" "Number" "Day" "Visiting" "league"
I can already get this result with the following code:
unlist(sapply(strsplit(x, "[[:blank:]]+|, "), function(y){
if(grepl("[-]", y[1])) c(y[2], tail(y,1)) else y[2]
}))
Instead, I figured I could try to shorten it to a regular expression. I've tried every which way, with different varieties of [^A-Za-z]+ among others, and haven't been successful. The result I want is almost the "opposite" of what this regular expression does in sub:
> sub("[A-Z][a-z]+", "", x)
[1] "1 in the form" "2 of game"
[3] "3 of week" "4-5 team and league"
So I guess this is a two-part question.
- With sub() or gsub(), how can I return the opposite of "[A-Z][a-z]+"?
- How can I write the regular expression to read like "Match the first capitalized word and, if the string contains a -, also match the last word"?
Recommended answer
Here are some suggestions:
- To extract the first capitalized word with sub, you can use:
sub(".*\\b([A-Z].*?)\\b.*", "\\1", x)
#[1] "Date" "Number" "Day" "Visiting"
where \\b stands for a word boundary.
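As a side note on the first part of the question (returning the match rather than removing it), base R's regmatches() and regexpr() can do this directly. This is an alternative to the sub approach above, not part of the original answer, and the variable name first_cap is only illustrative:

```r
x <- c("1 Date in the form", "2 Number of game",
       "3 Day of week", "4-5 Visiting team and league")

# regexpr() locates the first match of the pattern in each string;
# regmatches() then returns the matched text instead of deleting it.
first_cap <- regmatches(x, regexpr("[A-Z][a-z]+", x))
first_cap
# [1] "Date"     "Number"   "Day"      "Visiting"
```

Unlike sub, regmatches() used with regexpr() drops strings that contain no match, so the result can be shorter than x.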
- You can also extract all the words with a single sub command, but note that an extra step is then needed, because the vector returned by sub has the same length as the input vector x.
The following regular expression uses a lookahead, (?=.*-), to test whether there is a - in the string. If there is, two words are extracted. If there is not, the regular expression after the logical or (|) is applied and returns the first capitalized word only.
res <- sub("(?:(?=.*-).*\\b([A-Z].*?\\b ).*\\b(.+)$)|(?:.*\\b([A-Z].*?)\\b.*)",
"\\1\\2\\3", x, perl = TRUE)
# [1] "Date" "Number" "Day" "Visiting league"
One additional step is necessary to separate multiple words in the same string:
unlist(strsplit(res, " ", fixed = TRUE))
# [1] "Date" "Number" "Day" "Visiting" "league"
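To make the role of the lookahead easier to see, the sketch below first tests it on its own with grepl, then wraps the two steps from the answer in one function. The names has_dash and extract_words are hypothetical and used only for illustration; the pattern itself is the one given above.

```r
x <- c("1 Date in the form", "2 Number of game",
       "3 Day of week", "4-5 Visiting team and league")

# The lookahead (?=.*-) consumes no characters; it only asserts
# that a "-" occurs somewhere after the current position.
has_dash <- grepl("^(?=.*-)", x, perl = TRUE)
has_dash
# [1] FALSE FALSE FALSE  TRUE

# Hypothetical helper combining the sub() call and the strsplit() step.
extract_words <- function(x) {
  pattern <- "(?:(?=.*-).*\\b([A-Z].*?\\b ).*\\b(.+)$)|(?:.*\\b([A-Z].*?)\\b.*)"
  # Only the groups of the alternative that matched are non-empty.
  res <- sub(pattern, "\\1\\2\\3", x, perl = TRUE)
  # Split strings that hold two words ("Visiting league") into separate elements.
  unlist(strsplit(res, " ", fixed = TRUE))
}

words <- extract_words(x)
words
# [1] "Date"     "Number"   "Day"      "Visiting" "league"
```

The split on a single space relies on the first capture group keeping its trailing space, which is what separates the two words in the sub result.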