Regular expression: match strings containing only non-repeating words
Question
I have this situation (Java code): 1) a string such as "A wild adventure" should match; 2) a string with adjacent repeated words, such as "A wild wild adventure", shouldn't match.
With this regular expression: .* \b(\w+)\b\s*\1\b.* I can match strings containing adjacent repeated words.
How do I reverse this, i.e. how do I match strings that do not contain adjacent repeated words?
Recommended answer
Use a negative lookahead assertion, (?!pattern).
String[] tests = {
"A wild adventure", // true
"A wild wild adventure" // false
};
for (String test : tests) {
System.out.println(test.matches("(?!.*\\b(\\w+)\\s\\1\\b).*"));
}
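The same check can be wrapped in a reusable method with a precompiled pattern. This is only a sketch: the class name NoAdjacentRepeats and the method name hasNoAdjacentRepeat are made up for illustration, and the pattern uses \s+ instead of \s (a deliberate variation on the answer's regex) so that repeats separated by more than one whitespace character are rejected as well.

```java
import java.util.regex.Pattern;

public class NoAdjacentRepeats {
    // Variation on the answer's pattern (an assumption, not the original):
    // "\\s+" also catches repeats separated by several whitespace characters.
    private static final Pattern NO_REPEAT =
            Pattern.compile("(?!.*\\b(\\w+)\\s+\\1\\b).*");

    // Hypothetical helper: true when no word is immediately followed by itself.
    static boolean hasNoAdjacentRepeat(String text) {
        return NO_REPEAT.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasNoAdjacentRepeat("A wild adventure"));       // true
        System.out.println(hasNoAdjacentRepeat("A wild wild adventure"));  // false
        System.out.println(hasNoAdjacentRepeat("A wild  wild adventure")); // false (two spaces)
    }
}
```

Compiling the pattern once avoids recompiling it on every call, which String.matches does internally.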
Explanation generated by Rick Measham's explain.pl:
REGEX: (?!.*\b(\w+)\s\1\b).*
NODE EXPLANATION
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1
or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
\1 what was matched by capture \1
--------------------------------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
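To see what the capture group and the backreference in the breakdown above actually pick up, the inner pattern can be run on its own with Matcher.find(). A minimal sketch; the class name RepeatedWordFinder and the method name firstRepeatedWord are hypothetical and not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedWordFinder {
    // The part of the answer's regex that sits inside the lookahead.
    private static final Pattern REPEAT = Pattern.compile("\\b(\\w+)\\s\\1\\b");

    // Hypothetical helper: returns the first word that is immediately
    // followed by itself, or null when there is no such word.
    static String firstRepeatedWord(String text) {
        Matcher m = REPEAT.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeatedWord("A wild wild adventure")); // wild
        System.out.println(firstRepeatedWord("A wild adventure"));      // null
    }
}
```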
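See also
- regular-expressions.info/Lookaround
- Using regular expressions in Java
- Using negative lookahead to make sure no character appears more than once in a string
- Many examples of using assertions
- A very instructive example of using lookarounds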
Negative assertions only make sense when there are also other patterns that you want to positively match (see examples above). Otherwise, you can just use the boolean complement operator ! to negate matches with whatever pattern you were using before.
String[] tests = {
"A wild adventure", // true
"A wild wild adventure" // false
};
for (String test : tests) {
System.out.println(!test.matches(".*\\b(\\w+)\\s\\1\\b.*"));
}
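A related option is to negate Matcher.find() rather than String.matches(), which removes the need for the surrounding .* parts. This is a sketch with made-up names (NegatedFind, hasNoAdjacentRepeat), not something the answer itself proposes. One practical difference: in Java, . does not match line terminators by default, so the .* based versions never match a multi-line input as a whole, while find() still locates a repeat in it.

```java
import java.util.regex.Pattern;

public class NegatedFind {
    // Only the "repeated word" pattern; no leading or trailing ".*" needed.
    private static final Pattern REPEAT = Pattern.compile("\\b(\\w+)\\s\\1\\b");

    // Hypothetical helper: true when no adjacent repeated word is found anywhere.
    static boolean hasNoAdjacentRepeat(String text) {
        return !REPEAT.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasNoAdjacentRepeat("A wild adventure"));        // true
        System.out.println(hasNoAdjacentRepeat("A wild wild adventure"));   // false
        System.out.println(hasNoAdjacentRepeat("A wild wild\nadventure"));  // false
    }
}
```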