How to split this "Tree-like" string in Java regex?
Problem description
Here is the string:
String str = "(S(B1)(B2(B21)(B22)(B23))(B3)())";
The content of a child () may be empty, a plain value, or the same pattern as str itself, recursively, so each nested () is a subtree.
Expected result:
str1 is "(S(B1))"
str2 is "(B2(B21)(B22)(B23))" //don't expand sons of a son
str3 is "(B3)"
str4 is "()"
str1-4 are, for example, elements of an array.
How can I split the string?
I have asked a similar question: How to split this string in Java regex? But its answer is not good enough for this one.
Recommended answer
Regexes do not have sufficient power to parse balanced/nested brackets. This is essentially the same problem as parsing markup languages such as HTML, where the consistent advice is to use a dedicated parser, not regexes.
You should parse this as a tree. In overall terms:
- Create a stack.
- When you hit "(", push the next chunk onto the stack.
- When you hit ")", pop the stack.
This takes a few minutes to write and will check that your input is well-formed.
This will save you time almost immediately. Trying to manage regexes for this will become more and more complex and will almost inevitably break down.
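The stack-based steps above could be sketched roughly as follows. This is only one possible reading of the outline, not the answerer's own code: the `ParenTree` and `Node` names, the `render` helper, and the choice to treat any text directly inside a group as that group's label are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class ParenTree {
    /** One "(label child child ...)" group; names are hypothetical. */
    static final class Node {
        final StringBuilder label = new StringBuilder();
        final List<Node> children = new ArrayList<>();
    }

    /** Parses a single group such as "(S(B1)(B2(B21)))" and rejects malformed input. */
    static Node parse(String s) {
        Deque<Node> stack = new ArrayDeque<>();
        Node root = null;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                if (stack.isEmpty() && root != null) {
                    throw new IllegalArgumentException("second top-level group at index " + i);
                }
                Node node = new Node();
                if (stack.isEmpty()) {
                    root = node;                        // outermost group
                } else {
                    stack.peek().children.add(node);    // child of the currently open group
                }
                stack.push(node);
            } else if (c == ')') {
                if (stack.isEmpty()) {
                    throw new IllegalArgumentException("unmatched ')' at index " + i);
                }
                stack.pop();
            } else {
                if (stack.isEmpty()) {
                    throw new IllegalArgumentException("character outside brackets at index " + i);
                }
                // Assumption: any text directly inside a group belongs to its label.
                stack.peek().label.append(c);
            }
        }
        if (!stack.isEmpty()) {
            throw new IllegalArgumentException(stack.size() + " unclosed '('");
        }
        if (root == null) {
            throw new IllegalArgumentException("no group found");
        }
        return root;
    }

    /** Rebuilds the bracketed text of a node: "(" + label + children + ")". */
    static String render(Node node) {
        StringBuilder sb = new StringBuilder("(").append(node.label);
        for (Node child : node.children) {
            sb.append(render(child));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Node root = parse("(S(B1)(B2(B21)(B22)(B23))(B3)())");
        for (Node child : root.children) {
            System.out.println(render(child));
        }
    }
}
```

With the question's string, the root node has label S and four children, and rendering each child gives the unexpanded subtrees. Throwing on an unmatched or unclosed bracket is how the well-formedness check falls out of the stack approach.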
UPDATE: If you are only concerned with one level then it can be simpler (NOT debugged):
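import java.util.ArrayList;
import java.util.List;

List<String> subTreeList = new ArrayList<String>();
String s = getMyString();
int level = 0;
int lastOpenBracket = -1;
for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    if (c == '(') {
        level++;
        // level == 1 selects the outermost groups; if the whole input is wrapped
        // in one outer pair, compare against 2 (here and below) to get its children
        if (level == 1) {
            // remember where the current group starts
            lastOpenBracket = i;
        }
    } else if (c == ')') {
        if (level == 1) {
            // substring's end index is exclusive, so i + 1 keeps the closing ')'
            subTreeList.add(s.substring(lastOpenBracket, i + 1));
        }
        level--;
    }
}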
I haven't checked that it works, and you should debug it. You should also add checks to make sure you don't have hanging brackets at the end or strange characters at level == 1.
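As a rough illustration of the checks mentioned above, here is a minimal sketch of the one-level split with validation added. The `GroupSplitter` class, the `groupsAtDepth` method, and its `depth` parameter are hypothetical names, and treating "strange characters" as any character outside all brackets is an assumption; adjust the rule to whatever your input format allows.

```java
import java.util.ArrayList;
import java.util.List;

class GroupSplitter {
    /**
     * Returns every bracketed group that opens at the given nesting depth
     * (1 = outermost), without expanding the groups nested inside it.
     */
    static List<String> groupsAtDepth(String s, int depth) {
        List<String> groups = new ArrayList<>();
        int level = 0;
        int start = -1;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                level++;
                if (level == depth) {
                    start = i;                           // a group of interest begins here
                }
            } else if (c == ')') {
                if (level == 0) {
                    throw new IllegalArgumentException("unmatched ')' at index " + i);
                }
                if (level == depth) {
                    groups.add(s.substring(start, i + 1)); // end index is exclusive
                }
                level--;
            } else if (level == 0) {
                // Assumption: nothing may appear outside the brackets.
                throw new IllegalArgumentException("character outside brackets at index " + i);
            }
        }
        if (level != 0) {
            throw new IllegalArgumentException(level + " unclosed '('");
        }
        return groups;
    }

    public static void main(String[] args) {
        String str = "(S(B1)(B2(B21)(B22)(B23))(B3)())";
        // Depth 2 because str is wrapped in one outer pair of brackets.
        System.out.println(groupsAtDepth(str, 2));
    }
}
```

For the question's string, depth 2 yields (B1), (B2(B21)(B22)(B23)), (B3) and (). Note that the first element is (B1) rather than the question's (S(B1)); attaching the root label S is left to the caller.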