Regular expression: match strings containing only non-repeating words

Question

I have this situation (Java code): 1) a string such as "A wild adventure" should match; 2) a string with adjacent repeated words, such as "A wild wild adventure", should not match.

With this regular expression: .*\b(\w+)\b\s*\1\b.* I can match strings containing adjacent repeated words.

How do I reverse this, i.e. how do I match strings that do not contain adjacent repeated words?

Recommended answer

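Use a negative lookahead assertion, (?!pattern). Because String.matches must match the entire input, the lookahead is tested once at the start of the string, and the trailing .* then consumes the rest.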

    String[] tests = {
        "A wild adventure",      // true
        "A wild wild adventure"  // false
    };
    for (String test : tests) {
        System.out.println(test.matches("(?!.*\\b(\\w+)\\s\\1\\b).*"));
    }

As explained by Rick Measham's explain.pl:

REGEX: (?!.*\b(\w+)\s\1\b).*
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      \w+                      word characters (a-z, A-Z, 0-9, _) (1
                               or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
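The breakdown above can be exercised on a few more inputs. The following is a minimal sketch, not part of the original answer: the class name NoAdjacentRepeat and the helper hasNoAdjacentRepeat are hypothetical, and the extra test strings are illustrative. It compiles the same pattern once and shows that only whole-word, adjacent, same-case repeats cause a rejection.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper class; the name is chosen for illustration only.
final class NoAdjacentRepeat {
    // Same pattern as in the answer, compiled once for reuse.
    private static final Pattern NO_ADJACENT_REPEAT =
            Pattern.compile("(?!.*\\b(\\w+)\\s\\1\\b).*");

    // Returns true when the whole string matches, i.e. no adjacent repeated word was found.
    static boolean hasNoAdjacentRepeat(String text) {
        return NO_ADJACENT_REPEAT.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] tests = {
            "A wild adventure",       // true
            "A wild wild adventure",  // false
            "wild A wild",            // true: repeated, but not adjacent
            "this is fine",           // true: "is" is not a whole-word repeat of "this"
            "Wild wild west"          // true: the backreference is case-sensitive here
        };
        for (String test : tests) {
            System.out.println(test + " -> " + hasNoAdjacentRepeat(test));
        }
    }
}
```

The two \b anchors are what keep "this is" from being reported, and the single \s means words separated by more than one whitespace character are not treated as adjacent repeats by this pattern as written.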

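See also

  • regular-expressions.info/Lookarounds
    • Using regular expressions in Java
      • Uses negative lookahead to ensure that no character occurs more than once in a string
      • Many examples of using assertions
      • A very instructive example of using lookarounds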

Negative assertions only make sense when there are also other patterns that you want to positively match (see examples above). Otherwise, you can just use the boolean complement operator ! to negate matches with whatever pattern you were using before.

    String[] tests = {
        "A wild adventure",      // true
        "A wild wild adventure"  // false
    };
    for (String test : tests) {
        System.out.println(!test.matches(".*\\b(\\w+)\\s\\1\\b.*"));
    }
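A variation on the negation approach, sketched here under assumptions that are not in the original answer: the class AdjacentRepeatFinder and the helper hasAdjacentRepeat are hypothetical names, and \s+ replaces \s so that runs of whitespace are also covered. Using Matcher.find() instead of matches() removes the need for the surrounding .*, which by default does not cross line terminators.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is chosen for illustration only.
final class AdjacentRepeatFinder {
    // Repeated-word pattern without the leading and trailing .*
    // Assumption: \s+ (one or more whitespace) instead of the answer's single \s.
    private static final Pattern ADJACENT_REPEAT =
            Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // find() searches anywhere in the input, so no .* padding is required.
    static boolean hasAdjacentRepeat(String text) {
        return ADJACENT_REPEAT.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] tests = {
            "A wild adventure",                   // accepted
            "A wild wild adventure",              // rejected
            "A wild  wild adventure",             // rejected: two spaces between the words
            "first line\nA wild wild adventure"   // rejected: repeat on the second line
        };
        for (String test : tests) {
            // Negate with ! to accept only strings without an adjacent repeat.
            System.out.println(!hasAdjacentRepeat(test));
        }
    }
}
```

Whether multiple spaces or line breaks between the two words should count as "adjacent" depends on the requirement; switching back to \s restores the stricter behaviour of the original pattern.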
      
