Match contents within square brackets, including nested square brackets
Question
I am attempting to write a spoiler identification system so that any spoilers in a string are replaced with a specified spoiler character.
I want to match a string surrounded by square brackets, such that the content within the square brackets is capture group 1, and the whole string including the surrounding brackets is the match.
I am currently using \[(.*?]*)\], a slight modification of the expression found in another answer, as I also want nested square brackets to be part of capture group 1.
The problem with that expression is that, although it works and matches the following:
- Jim ate a [sandwich] matches [sandwich] with sandwich as group 1
- Jim ate a [sandwich with [pickles and onions]] matches [sandwich with [pickles and onions]] with sandwich with [pickles and onions] as group 1
- [[[[] matches [[[[] with [[[ as group 1
- []]]] matches []]]] with ]]] as group 1
However, if I want to match the following, it does not work as expected:
- Jim ate a [sandwich with [pickles] and [onions]] matches both:
  - [sandwich with [pickles] with sandwich with [pickles as group 1
  - [onions]] with onions] as group 1
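The behaviour described above can be reproduced with a short sketch. The class and method names (LazyBracketRegexDemo, describeMatches) are made up for illustration; only the pattern itself comes from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyBracketRegexDemo {
    // The expression from the question: \[(.*?]*)\]
    static final Pattern SPOILER = Pattern.compile("\\[(.*?]*)\\]");

    // Returns "whole match -> group 1" for every match found in the input.
    static List<String> describeMatches(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = SPOILER.matcher(input);
        while (m.find()) {
            found.add(m.group() + " -> " + m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "Jim ate a [sandwich with [pickles] and [onions]]";
        for (String line : describeMatches(input)) {
            System.out.println(line);
        }
        // Prints:
        // [sandwich with [pickles] -> sandwich with [pickles
        // [onions]] -> onions]
    }
}
```

The lazy .*? stops at the first closing bracket it can, so the pattern has no way of knowing that [pickles] was opened inside the outer group.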
What expression should I use such that it matches [sandwich with [pickles] and [onions]] with sandwich with [pickles] and [onions] as group 1?
Edit:
As it seems impossible to achieve this in Java using regex, is there an alternative solution?
Edit 2:
I also want to be able to split the string by each match found, so an alternative to regular expressions would be harder to implement, given how convenient String.split(regex) is. Here's an example:
- Jim ate a [sandwich] with [pickles] and [dried [onions]] matches all:
  - [sandwich] with sandwich as group 1
  - [pickles] with pickles as group 1
  - [dried [onions]] with dried [onions] as group 1
The split sentence should look like this: "Jim ate a", "with", "and"
Recommended answer
More direct solution
This solution omits empty or whitespace-only substrings:
    import java.util.ArrayList;
    import java.util.List;

    public static List<String> getStrsBetweenBalancedSubstrings(String s, Character markStart, Character markEnd) {
        List<String> subTreeList = new ArrayList<String>();
        int level = 0;            // current bracket nesting depth
        int lastCloseBracket = 0; // index just past the most recent matched closing marker
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == markStart) {
                level++;
                // A top-level group starts here: keep the non-blank text before it.
                if (level == 1 && i != 0 && i != lastCloseBracket
                        && !s.substring(lastCloseBracket, i).trim().isEmpty()) {
                    subTreeList.add(s.substring(lastCloseBracket, i).trim());
                }
            } else if (c == markEnd) {
                // A closing marker with no open group (level == 0) stays plain text.
                if (level > 0) {
                    level--;
                    lastCloseBracket = i + 1;
                }
            }
        }
        // Keep any non-blank text after the last closed group.
        if (lastCloseBracket != s.length() && !s.substring(lastCloseBracket).trim().isEmpty()) {
            subTreeList.add(s.substring(lastCloseBracket).trim());
        }
        return subTreeList;
    }
Then use it as:

    String input = "Jim ate a [sandwich][ooh] with [pickles] and [dried [onions]] and ] [an[other] match] and more here";
    List<String> between_balanced = getStrsBetweenBalancedSubstrings(input, '[', ']');
    System.out.println("Result: " + between_balanced);
    // => Result: [Jim ate a, with, and, and ], and more here]
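The same depth-counting idea can also serve the replacement goal stated in the question. The following is only a sketch under assumptions the question does not confirm: SpoilerMasker and maskSpoilers are hypothetical names, the surrounding brackets are kept, and every character of group 1 is replaced by the mask character.

```java
class SpoilerMasker {
    // Replaces the content of every top-level balanced group with the mask
    // character. Unmatched markers and unclosed groups are left untouched.
    static String maskSpoilers(String s, char open, char close, char mask) {
        StringBuilder out = new StringBuilder(s);
        int level = 0;   // current nesting depth
        int start = -1;  // index of the marker that opened the current top-level group
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == open) {
                if (level == 0) {
                    start = i;
                }
                level++;
            } else if (c == close && level > 0) {
                level--;
                if (level == 0) {
                    // The group is balanced: overwrite everything between the markers.
                    for (int j = start + 1; j < i; j++) {
                        out.setCharAt(j, mask);
                    }
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskSpoilers("Jim ate a [sandwich] with [dried [onions]]", '[', ']', '*'));
        // Prints: Jim ate a [********] with [**************]
    }
}
```

Masking in place keeps the string length unchanged; if the brackets themselves should disappear as well, the loop bounds would need to be widened to start and i inclusive.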
Original answer (more complex, shows a way to extract nested brackets)
You can also extract all substrings inside balanced brackets and then split on them:
    // Requires java.util.ArrayList, java.util.Arrays, java.util.List and java.util.regex.Pattern.
    String input = "Jim ate a [sandwich] with [pickles] and [dried [onions]] and ] [an[other] match]";
    List<String> balanced = getBalancedSubstrings(input, '[', ']', true);
    System.out.println("Balanced ones: " + balanced);
    List<String> rx_split = new ArrayList<String>();
    for (String item : balanced) {
        // Quote each balanced substring so it is matched literally.
        rx_split.add("\\s*" + Pattern.quote(item) + "\\s*");
    }
    String rx = String.join("|", rx_split);
    System.out.println("In-betweens: " + Arrays.toString(input.split(rx)));
And this function will find all []-balanced substrings:

    public static List<String> getBalancedSubstrings(String s, Character markStart, Character markEnd, Boolean includeMarkers) {
        List<String> subTreeList = new ArrayList<String>();
        int level = 0;            // current bracket nesting depth
        int lastOpenBracket = -1; // start index of the current top-level group
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == markStart) {
                level++;
                if (level == 1) {
                    lastOpenBracket = (includeMarkers ? i : i + 1);
                }
            } else if (c == markEnd) {
                // Closing the top-level group: record it.
                if (level == 1) {
                    subTreeList.add(s.substring(lastOpenBracket, (includeMarkers ? i + 1 : i)));
                }
                if (level > 0) level--;
            }
        }
        return subTreeList;
    }
See the IDEONE demo.
Output of the code:

    Balanced ones: [[sandwich], [pickles], [dried [onions]], [an[other] match]]
    In-betweens: [Jim ate a, with, and, and ]]
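java.util.regex has no recursion or balancing-group construct, which is why the answer counts brackets by hand. If the convenience of String.split(regex) matters and the nesting depth is known to be bounded, a fixed-depth pattern is one possible compromise. The sketch below is not part of the original answer: BoundedDepthSplit, ONE_LEVEL and splitOnSpoilers are hypothetical names, and the pattern only handles groups nested one level deep.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class BoundedDepthSplit {
    // A bracket group whose content is plain text and/or bracket groups that
    // themselves contain no brackets. Group 1 is the content of the outer group.
    // Possessive quantifiers (++, *+) avoid needless backtracking.
    static final String ONE_LEVEL = "\\[((?:[^\\[\\]]++|\\[[^\\[\\]]*+\\])*)\\]";

    // Splits on each bracket group, also swallowing the whitespace around it.
    static List<String> splitOnSpoilers(String input) {
        return Arrays.asList(input.split("\\s*" + ONE_LEVEL + "\\s*"));
    }

    public static void main(String[] args) {
        String input = "Jim ate a [sandwich] with [pickles] and [dried [onions]]";
        System.out.println(splitOnSpoilers(input));
        // Prints: [Jim ate a, with, and]
        System.out.println(Pattern.compile(ONE_LEVEL).matcher(input).results().count());
        // Prints: 3
    }
}
```

Each extra nesting level requires nesting the inner alternative once more, so for arbitrary depth the bracket-counting functions above remain the more general approach.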
Credits: getBalancedSubstrings is based on peter.murray.rust's answer to the post How to split this "Tree-like" string in Java regex?