Parsing nested parentheses in Python, grab content by level


Question

Apparently, after reading "Regular expression to detect semi-colon terminated C++ for & while loops" and thinking about the problem for a while, I wrote a function to return the content contained inside an arbitrary number of nested ().

The function could easily be extended to any regular expression object. I am posting it here for your thoughts and considerations.

Any refactoring advice would be appreciated.

(Note: I'm still new to Python and didn't feel like figuring out how to raise exceptions or whatever, so I just had the function return 'fail' if it couldn't figure out what was going on.)

Edited function to take the comments into account:

import re


def ParseNestedParen(string, level):
    """
    Return string contained in nested (), indexing i = level
    """
    # Count opening and closing parentheses to check for balance.
    CountLeft = len(re.findall(r"\(", string))
    CountRight = len(re.findall(r"\)", string))
    if CountLeft == CountRight:
        # Pair the n-th '(' from the left with the n-th ')' from the right.
        LeftRightIndex = [x for x in zip(
        [Left.start()+1 for Left in re.finditer(r'\(', string)],
        reversed([Right.start() for Right in re.finditer(r'\)', string)]))]

    elif CountLeft > CountRight:
        # Too few ')': append one and try again.
        return ParseNestedParen(string + ')', level)

    elif CountLeft < CountRight:
        # Too few '(': prepend one and try again.
        return ParseNestedParen('(' + string, level)

    return string[LeftRightIndex[level][0]:LeftRightIndex[level][1]]

Recommended answer

You don't make it clear exactly what the specification of your function is, but this behaviour seems wrong to me:

>>> ParseNestedParen('(a)(b)(c)', 0)
['a)(b)(c']
>>> ParseNestedParen('(a)(b)(c)', 1)
['b']
>>> ParseNestedParen('(a)(b)(c)', 2)
['']

Other comments on your code:

  • The docstring says "generate", but the function returns a list, not a generator.
  • Since only one string is ever returned, why return it in a list?
  • Under what circumstances can the function return the string 'fail'?
  • Repeatedly calling re.findall and then throwing away the result is wasteful.
  • You attempt to rebalance the parentheses in the string, but you do so only one parenthesis at a time:
>>> ParseNestedParen(')' * 1000, 1)
RuntimeError: maximum recursion depth exceeded while calling a Python object
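
One way the last two points might be addressed is sketched below. It assumes the same padding policy as the code in the question (append ')' or prepend '(' until the counts match), and `balance_parens` is a hypothetical helper name, not something from the original post. It counts with `str.count` instead of `re.findall` and pads in a single step, so no recursion is needed:

```python
def balance_parens(string):
    """Pad string so the counts of '(' and ')' match, in one step.

    Hypothetical helper: it keeps the question's padding policy and only
    removes the one-parenthesis-per-call recursion.
    """
    left = string.count('(')
    right = string.count(')')
    if left > right:
        # Too few ')': append all the missing ones at once.
        return string + ')' * (left - right)
    if right > left:
        # Too few '(': prepend all the missing ones at once.
        return '(' * (right - left) + string
    return string
```

Equal counts do not guarantee that the parentheses are correctly nested, so this only removes the recursion problem; it does not make the pairing logic correct.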

As Thomi said in the question you linked to, "regular expressions really are the wrong tool for the job!"

The usual way to parse nested expressions is to use a stack, along these lines:

def parenthetic_contents(string):
    """Generate parenthesized contents in string as pairs (level, contents)."""
    stack = []
    for i, c in enumerate(string):
        if c == '(':
            stack.append(i)
        elif c == ')' and stack:
            start = stack.pop()
            yield (len(stack), string[start + 1: i])

>>> list(parenthetic_contents('(a(b(c)(d)e)(f)g)'))
[(2, 'c'), (2, 'd'), (1, 'b(c)(d)e'), (1, 'f'), (0, 'a(b(c)(d)e)(f)g')]
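
To grab the contents at one particular level, as the question asks, the pairs could be filtered by their level. The following is a minimal sketch under that assumption; `contents_at_level` is a hypothetical helper name, and `parenthetic_contents` is repeated only so that the block runs on its own:

```python
def parenthetic_contents(string):
    """Generate parenthesized contents in string as pairs (level, contents)."""
    stack = []
    for i, c in enumerate(string):
        if c == '(':
            # Remember where this group opened.
            stack.append(i)
        elif c == ')' and stack:
            # Close the most recently opened group; its level is the
            # number of groups still open around it.
            start = stack.pop()
            yield (len(stack), string[start + 1: i])


def contents_at_level(string, level):
    """Return the contents of every group nested at the given level.

    Hypothetical helper, not part of the original answer.
    """
    return [contents for depth, contents in parenthetic_contents(string)
            if depth == level]


print(contents_at_level('(a(b(c)(d)e)(f)g)', 1))  # ['b(c)(d)e', 'f']
```

Because several groups can sit at the same level, the helper returns a list rather than a single string. Unmatched parentheses are silently skipped, as in the generator above.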
