Match contents within square brackets, including nested square brackets

Problem description

I am attempting to write a spoiler identification system so that any spoilers in a string are replaced with a specified spoiler character.

I want to match a string surrounded by square brackets, such that the contents within the square brackets is capture group 1, and the whole string including the surrounding brackets is the match.

I am currently using \[(.*?]*)\], a slight modification of the expression found in this answer, because I also want nested square brackets to be part of capture group 1.

The problem with that expression is that, although it works and matches the following:


  • Jim ate a [sandwich] matches [sandwich] with sandwich as group 1
  • Jim ate a [sandwich with [pickles and onions]] matches [sandwich with [pickles and onions]] with sandwich with [pickles and onions] as group 1
  • [[[[] matches [[[[] with [[[ as group 1
  • []]]] matches []]]] with ]]] as group 1

However, if I want to match the following, it does not work as expected:


  • Jim ate a [sandwich with [pickles] and [onions]] matches both:
    • [sandwich with [pickles] with sandwich with [pickles as group 1
    • [onions]] with onions] as group 1
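Here is a minimal Java reproduction of the behaviour described above. The class and method names are hypothetical and exist only for illustration. The lazy .*? stops as soon as a ] can close the match, so the first find() ends at the first ] after "pickles". The second find() starts at "[onions", and ]* pulls the extra closing bracket into group 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyBracketDemo {
    // The question's expression \[(.*?]*)\] written as a Java string literal.
    static final Pattern LAZY = Pattern.compile("\\[(.*?]*)\\]");

    // Whole match (group 0) of every find().
    static List<String> wholeMatches(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = LAZY.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Capture group 1 of every find().
    static List<String> group1s(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = LAZY.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "Jim ate a [sandwich with [pickles] and [onions]]";
        // Prints "[sandwich with [pickles]" and then "[onions]]"
        for (String match : wholeMatches(input)) {
            System.out.println(match);
        }
    }
}
```

Running group1s on the four inputs that "work" returns exactly the group 1 values listed earlier. Those inputs never have a ] followed by more text inside the outer brackets, so the lazy match happens to line up with them.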

What expression should I use so that it matches [sandwich with [pickles] and [onions]] with sandwich with [pickles] and [onions] as group 1?
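java.util.regex supports neither recursion nor balancing groups, so no single Java pattern can match arbitrarily deep nesting. A fixed-depth pattern does work if at most one level of nesting needs to be supported. The sketch below is an illustrative workaround and does not come from the question or the answer. The outer brackets may contain any mix of non-bracket characters and complete [...] groups that themselves contain no brackets. With deeper nesting it silently matches an inner group instead, so each extra level has to be added to the pattern by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OneLevelNesting {
    // \[ ( (?: [^\[\]] | \[ [^\[\]]* \] )* ) \]
    // Group 1 is everything between the outer brackets. The two alternatives
    // start with different characters, so the pattern does not backtrack badly.
    // Assumption: at most ONE level of nested brackets.
    static final Pattern ONE_LEVEL =
        Pattern.compile("\\[((?:[^\\[\\]]|\\[[^\\[\\]]*\\])*)\\]");

    public static void main(String[] args) {
        Matcher m = ONE_LEVEL.matcher("Jim ate a [sandwich with [pickles] and [onions]]");
        while (m.find()) {
            System.out.println("match:   " + m.group());  // [sandwich with [pickles] and [onions]]
            System.out.println("group 1: " + m.group(1)); // sandwich with [pickles] and [onions]
        }
    }
}
```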

Edit

Since this seems impossible to achieve in Java using regex, is there an alternative solution?

Edit 2

I also want to be able to split the string on each match found. An alternative to regular expressions would therefore be harder to use here, because String.split(regex) is so convenient. Here's an example:


  • Jim ate a [sandwich] with [pickles] and [dried [onions]] matches all:
    • [sandwich] with sandwich as group 1
    • [pickles] with pickles as group 1
    • [dried [onions]] with dried [onions] as group 1

The split strings should look like this:

    Jim ate a
    with
    and
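If nesting is limited to one level, a fixed-depth bracket pattern can be wrapped in \s* and passed directly to String.split, which produces the pieces listed above. This is a sketch under that assumption. The class name and the pattern are illustrative and do not come from the question.

```java
import java.util.Arrays;

class SplitOnSpoilers {
    // A [...] group that may contain plain characters and bracket-free [...] groups
    // (one nesting level only). The surrounding \s* is consumed by the split.
    static final String SPOILER = "\\s*\\[(?:[^\\[\\]]|\\[[^\\[\\]]*\\])*\\]\\s*";

    public static void main(String[] args) {
        String input = "Jim ate a [sandwich] with [pickles] and [dried [onions]]";
        String[] parts = input.split(SPOILER);
        System.out.println(Arrays.toString(parts)); // [Jim ate a, with, and]
    }
}
```

String.split discards trailing empty strings, so the final spoiler at the end of the input does not leave an empty last element.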
          


Recommended answer

More direct solution

This solution omits empty or whitespace-only substrings:

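    import java.util.ArrayList;
    import java.util.List;

    // Returns the trimmed, non-empty pieces of s that lie outside top-level
    // markStart...markEnd groups (i.e. the text between the [...] groups).
    public static List<String> getStrsBetweenBalancedSubstrings(String s, Character markStart, Character markEnd) {
        List<String> subTreeList = new ArrayList<String>();
        int level = 0;            // current bracket nesting depth
        int lastCloseBracket = 0; // index just after the most recent closing bracket
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == markStart) {
                level++;
                // Entering a top-level group: emit the text since the last close.
                if (level == 1 && i != 0 && i != lastCloseBracket &&
                        !s.substring(lastCloseBracket, i).trim().isEmpty()) {
                    subTreeList.add(s.substring(lastCloseBracket, i).trim());
                }
            } else if (c == markEnd) {
                if (level > 0) { // unmatched closing brackets are ignored
                    level--;
                    lastCloseBracket = i + 1;
                }
            }
        }
        // Emit any trailing text after the last closing bracket.
        if (lastCloseBracket != s.length() && !s.substring(lastCloseBracket).trim().isEmpty()) {
            subTreeList.add(s.substring(lastCloseBracket).trim());
        }
        return subTreeList;
    }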
          

Then use it as follows:

    String input = "Jim ate a [sandwich][ooh] with [pickles] and [dried [onions]] and ] [an[other] match] and more here";
    List<String> between_balanced =  getStrsBetweenBalancedSubstrings(input, '[', ']');
    System.out.println("Result: " + between_balanced);
    // => Result: [Jim ate a, with, and, and ], and more here]
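The same level counting can also handle the question's original goal of hiding spoilers. The maskSpoilers helper below is hypothetical. It assumes "replaced with a spoiler character" means overwriting every character of each top-level [...] group, brackets included, so the string keeps its length. Stray ] characters are left alone, and everything after a [ that is never closed stays visible.

```java
class SpoilerMask {
    // Hypothetical helper: overwrite each top-level balanced [...] group
    // (brackets included) with spoilerChar; other text is unchanged.
    static String maskSpoilers(String s, char spoilerChar) {
        StringBuilder sb = new StringBuilder(s);
        int level = 0;   // current nesting depth
        int start = -1;  // index of the '[' that opened the current top-level group
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '[') {
                if (level == 0) start = i;
                level++;
            } else if (c == ']' && level > 0) { // unmatched ']' is ignored
                level--;
                if (level == 0) {
                    for (int j = start; j <= i; j++) {
                        sb.setCharAt(j, spoilerChar);
                    }
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskSpoilers("Jim ate a [sandwich] with [dried [onions]] and ] more", '*'));
    }
}
```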
          



Original answer (more complex; shows a way to extract nested brackets)

You can also extract all substrings inside balanced brackets and then split the input on them:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.regex.Pattern;

    String input = "Jim ate a [sandwich] with [pickles] and [dried [onions]] and ] [an[other] match]";
    List<String> balanced = getBalancedSubstrings(input, '[', ']', true);
    System.out.println("Balanced ones: " + balanced);
    // Build an alternation of the literal groups, each with optional surrounding whitespace.
    List<String> rx_split = new ArrayList<String>();
    for (String item : balanced) {
        rx_split.add("\\s*" + Pattern.quote(item) + "\\s*");
    }
    String rx = String.join("|", rx_split);
    System.out.println("In-betweens: " + Arrays.toString(input.split(rx)));
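One caveat with the snippet above: if the input contains no balanced groups, rx is the empty string, and input.split("") breaks the input into single characters. The sketch below guards against that case. splitAround is a hypothetical name, and returning the input as a single piece is an assumed policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SafeSplit {
    // Hypothetical helper: split input around the given literal substrings,
    // consuming surrounding whitespace, as the rx_split loop above does.
    static String[] splitAround(String input, List<String> literals) {
        if (literals.isEmpty()) {
            // An empty alternation would split between every character.
            return new String[] { input };
        }
        List<String> alternatives = new ArrayList<>();
        for (String item : literals) {
            alternatives.add("\\s*" + Pattern.quote(item) + "\\s*");
        }
        return input.split(String.join("|", alternatives));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAround("no brackets here", new ArrayList<>())));
        // [no brackets here]
        System.out.println(Arrays.toString(splitAround("Jim ate a [sandwich] with [pickles]",
                Arrays.asList("[sandwich]", "[pickles]"))));
        // [Jim ate a, with]
    }
}
```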
          

And this function finds all []-balanced substrings:

    // Returns each top-level markStart...markEnd group in s, with or without
    // the enclosing markers; unmatched closing markers are ignored.
    public static List<String> getBalancedSubstrings(String s, Character markStart,
                                                     Character markEnd, Boolean includeMarkers) {
        List<String> subTreeList = new ArrayList<String>();
        int level = 0;            // current nesting depth
        int lastOpenBracket = -1; // start index of the current top-level group
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == markStart) {
                level++;
                if (level == 1) {
                    lastOpenBracket = (includeMarkers ? i : i + 1);
                }
            } else if (c == markEnd) {
                if (level == 1) {
                    // Closing a top-level group: record it.
                    subTreeList.add(s.substring(lastOpenBracket, (includeMarkers ? i + 1 : i)));
                }
                if (level > 0) level--;
            }
        }
        return subTreeList;
    }
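getBalancedSubstrings returns only the text of each group. If you want the equivalent of both Matcher.group() and Matcher.group(1) from a single scan, a variant can return positions instead. The sketch below is a hypothetical helper that uses the same level counting and returns {start, end} pairs (end exclusive) for each top-level balanced group.

```java
import java.util.ArrayList;
import java.util.List;

class BalancedSpans {
    // Hypothetical helper: {start, end} (end exclusive) of each top-level
    // balanced open...close group; unmatched closers are ignored.
    static List<int[]> spans(String s, char open, char close) {
        List<int[]> out = new ArrayList<>();
        int level = 0;
        int start = -1;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == open) {
                if (level == 0) start = i;
                level++;
            } else if (c == close && level > 0) {
                level--;
                if (level == 0) out.add(new int[] { start, i + 1 });
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "Jim ate a [sandwich] with [dried [onions]]";
        for (int[] span : spans(s, '[', ']')) {
            String match = s.substring(span[0], span[1]);          // like Matcher.group()
            String group1 = s.substring(span[0] + 1, span[1] - 1); // like Matcher.group(1)
            System.out.println(match + " -> " + group1);
        }
    }
}
```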
          

See the IDEONE demo.

Output:

    Balanced ones: [[sandwich], [pickles], [dried [onions]], [an[other] match]]
    In-betweens: [Jim ate a, with, and, and ]]
          

Credits: getBalancedSubstrings is based on peter.murray.rust's answer to the How to split this "Tree-like" string in Java regex? post.
