How to split this "Tree-like" string in Java regex?

Question

Here is the string:

String str = "(S(B1)(B2(B21)(B22)(B23))(B3)())";

The content of a child () may be empty (""), just a value, or the same pattern again, recursively, so every nested () is a subtree.

Expected result:

str1 is "(S(B1))"
str2 is "(B2(B21)(B22)(B23))" //don't expand sons of a son
str3 is "(B3)"
str4 is "()"

str1 to str4 would be, for example, elements of an array.

How do I split the string?

I asked a similar question before: "How to split this string in Java regex?" But its answer is not good enough for this case.

Recommended answer

Regexes do not have sufficient power to parse balanced/nested brackets. This is essentially the same problem as parsing markup languages such as HTML where the consistent advice is to use special parsers, not regexes.

You should parse this as a tree. In overall terms:

  • Create a stack.
  • When you hit "(", push the next chunk onto the stack.
  • When you hit ")", pop the stack.
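The steps above can be sketched roughly as follows. This is a minimal illustration, not the answerer's code: the class `TreeParserSketch`, the `Node` type, and the `parse` method are hypothetical names introduced here, and the sketch assumes that each label appears before its children, as in the question's string.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TreeParserSketch {

    /** Hypothetical node type: a label plus its child subtrees. */
    static final class Node {
        final StringBuilder label = new StringBuilder();
        final List<Node> children = new ArrayList<>();

        /** Re-serializes the subtree in "(label(child)...)" form. */
        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node child : children) {
                sb.append(child);
            }
            return sb.append(')').toString();
        }
    }

    /** Parses one bracketed tree; throws if the input is not well-formed. */
    static Node parse(String s) {
        Deque<Node> stack = new ArrayDeque<>();
        Node root = null;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                if (stack.isEmpty() && root != null) {
                    throw new IllegalArgumentException("second root at index " + i);
                }
                Node node = new Node();
                if (!stack.isEmpty()) {
                    stack.peek().children.add(node); // attach to the enclosing node
                }
                stack.push(node);
            } else if (c == ')') {
                if (stack.isEmpty()) {
                    throw new IllegalArgumentException("unmatched ')' at index " + i);
                }
                Node closed = stack.pop();
                if (stack.isEmpty()) {
                    root = closed; // the outermost bracket pair just closed
                }
            } else {
                if (stack.isEmpty()) {
                    throw new IllegalArgumentException("character outside brackets at index " + i);
                }
                stack.peek().label.append(c); // label text of the innermost open node
            }
        }
        if (!stack.isEmpty()) {
            throw new IllegalArgumentException("unclosed '('");
        }
        if (root == null) {
            throw new IllegalArgumentException("no tree found");
        }
        return root;
    }

    public static void main(String[] args) {
        Node root = parse("(S(B1)(B2(B21)(B22)(B23))(B3)())");
        for (Node child : root.children) {
            System.out.println(child); // one unexpanded subtree per line
        }
    }
}
```

Because every child is kept as a `Node`, the caller can print a subtree without expanding it or walk into it later, whichever the task needs.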

This takes a few minutes to write and will check that your input is well-formed.

This will save you time almost immediately. Trying to manage regexes for this will become more and more complex and will almost inevitably break down.

UPDATE: If you are only concerned with one level then it can be simpler (NOT debugged):

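import java.util.ArrayList;
import java.util.List;

// Note: level 1 is the outermost bracket pair; compare against 2 instead
// to collect the children of the root.
List<String> subTreeList = new ArrayList<String>();
String s = getMyString();
int level = 0;
int lastOpenBracket = -1;
for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    if (c == '(') {
        level++;
        if (level == 1) {
            // remember where the current level-1 group starts
            lastOpenBracket = i;
        }
    } else if (c == ')') {
        if (level == 1) {
            // substring's end index is exclusive, so use i + 1 to keep the ')'
            subTreeList.add(s.substring(lastOpenBracket, i + 1));
        }
        level--;
    }
}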

I haven't checked that it works, so you should debug it. You should also add checks to make sure you don't have hanging brackets at the end or strange characters at level == 1.
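One way those checks might look is sketched below. The class `SubtreeSplitSketch` and the method `splitAtDepth` are hypothetical names, and the `depth` parameter is an addition for illustration: the question's string is wrapped in one outer pair of brackets, so the root's children sit at depth 2 rather than 1. The sketch treats any character outside all brackets as an error, which is an assumption about what counts as a "strange character".

```java
import java.util.ArrayList;
import java.util.List;

class SubtreeSplitSketch {

    /**
     * Returns every bracketed group that opens at the given nesting depth
     * (1 = outermost pair), without expanding deeper levels.
     * Throws IllegalArgumentException if the brackets are unbalanced.
     */
    static List<String> splitAtDepth(String s, int depth) {
        List<String> groups = new ArrayList<>();
        int level = 0;
        int start = -1;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                level++;
                if (level == depth) {
                    start = i; // a group at the requested depth begins here
                }
            } else if (c == ')') {
                if (level == 0) {
                    throw new IllegalArgumentException("unmatched ')' at index " + i);
                }
                if (level == depth) {
                    groups.add(s.substring(start, i + 1)); // end index is exclusive
                }
                level--;
            } else if (level == 0) {
                throw new IllegalArgumentException("character outside brackets at index " + i);
            }
        }
        if (level != 0) {
            throw new IllegalArgumentException("unclosed '(' (" + level + " still open)");
        }
        return groups;
    }

    public static void main(String[] args) {
        String str = "(S(B1)(B2(B21)(B22)(B23))(B3)())";
        for (String group : splitAtDepth(str, 2)) {
            System.out.println(group);
        }
    }
}
```

With depth 2 this yields (B1), (B2(B21)(B22)(B23)), (B3) and (). The question's expected str1 is "(S(B1))", which combines the root label with the first child; producing that exact form would need an extra step that this sketch does not attempt.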
