Parsing nested parentheses in Python, grab content by level

Problem description

Apparently this problem comes up fairly often. After reading

Regular expression to detect semi-colon terminated C++ for & while loops

and thinking about the problem for a while, I wrote a function to return the content contained inside an arbitrary number of nested ().

The function could easily be extended to any regular expression object. I'm posting it here for your thoughts and considerations.

Any refactoring advice would be appreciated.

(Note: I'm still new to Python and didn't feel like figuring out how to raise exceptions or whatever, so I just had the function return 'fail' if it couldn't figure out what was going on.)

Edited function to take into account comments:

import re

def ParseNestedParen(string, level):
    """
    Return string contained in nested (), indexing i = level
    """
    # Parentheses are regex metacharacters, so they must be escaped.
    CountLeft = len(re.findall(r"\(", string))
    CountRight = len(re.findall(r"\)", string))
    if CountLeft == CountRight:
        # Pair the i-th '(' from the left with the i-th ')' from the right.
        LeftRightIndex = [x for x in zip(
            [Left.start()+1 for Left in re.finditer(r'\(', string)],
            reversed([Right.start() for Right in re.finditer(r'\)', string)]))]

    elif CountLeft > CountRight:
        # Too few ')': append one and retry.
        return ParseNestedParen(string + ')', level)

    elif CountLeft < CountRight:
        # Too few '(': prepend one and retry.
        return ParseNestedParen('(' + string, level)

    return string[LeftRightIndex[level][0]:LeftRightIndex[level][1]]

Solution

You don't make it clear exactly what the specification of your function is, but this behaviour seems wrong to me:

>>> ParseNestedParen('(a)(b)(c)', 0)
['a)(b)(c']
>>> nested_paren.ParseNestedParen('(a)(b)(c)', 1)
['b']
>>> nested_paren.ParseNestedParen('(a)(b)(c)', 2)
['']

Other comments on your code:

  • Docstring says "generate", but function returns a list, not a generator.
  • Since only one string is ever returned, why return it in a list?
  • Under what circumstances can the function return the string 'fail'?
  • Repeatedly calling re.findall and then throwing away the result is wasteful.
  • You attempt to rebalance the parentheses in the string, but you do so only one parenthesis at a time:

>>> ParseNestedParen(')' * 1000, 1)
RuntimeError: maximum recursion depth exceeded while calling a Python object
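
A minimal sketch of the last two points, assuming the only goal is to pad the string until the counts of '(' and ')' match. It uses str.count instead of the regex calls and adds all missing parentheses in one step rather than one per recursive call. The name rebalance is hypothetical and does not appear in the original code; like the original, it only equalizes the counts and does not guarantee proper nesting.

```python
def rebalance(string):
    """Pad string so it contains as many '(' as ')' (hypothetical helper)."""
    left = string.count('(')
    right = string.count(')')
    if left > right:
        # Too few ')': append all the missing ones at once.
        return string + ')' * (left - right)
    # Too few '(' (or already equal): prepend the missing ones, possibly none.
    return '(' * (right - left) + string
```

With this approach an input such as ')' * 1000 is handled without any recursion.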

As Thomi said in the question you linked to, "regular expressions really are the wrong tool for the job!"


The usual way to parse nested expressions is to use a stack, along these lines:

def parenthetic_contents(string):
    """Generate parenthesized contents in string as pairs (level, contents)."""
    stack = []
    for i, c in enumerate(string):
        if c == '(':
            stack.append(i)
        elif c == ')' and stack:
            start = stack.pop()
            yield (len(stack), string[start + 1: i])

>>> list(parenthetic_contents('(a(b(c)(d)e)(f)g)'))
[(2, 'c'), (2, 'd'), (1, 'b(c)(d)e'), (1, 'f'), (0, 'a(b(c)(d)e)(f)g')]
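
To grab the content at one particular level, as the question asks, the generator's output can be filtered by its level field. The following is a sketch under that assumption; contents_at_level is a hypothetical name, and parenthetic_contents is repeated so the block runs on its own.

```python
def parenthetic_contents(string):
    """Generate parenthesized contents in string as pairs (level, contents)."""
    stack = []
    for i, c in enumerate(string):
        if c == '(':
            stack.append(i)
        elif c == ')' and stack:
            start = stack.pop()
            yield (len(stack), string[start + 1: i])


def contents_at_level(string, level):
    """Return every parenthesized substring at the given nesting level."""
    return [contents
            for lvl, contents in parenthetic_contents(string)
            if lvl == level]


print(contents_at_level('(a(b(c)(d)e)(f)g)', 2))  # ['c', 'd']
print(contents_at_level('(a)(b)(c)', 0))          # ['a', 'b', 'c']
```

Returning a list keeps sibling groups at the same level separate, so '(a)(b)(c)' at level 0 gives three strings rather than the single 'a)(b)(c' shown earlier.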
