Using PyParsing to parse a language with significant newlines (like Python)


Question

I am implementing a language in which newlines are sometimes significant, as in Python, with exactly the same rules.

For the purpose of my question, we can take the fragment of Python that deals with assignments, parentheses, and the treatment of newlines and semicolons.

For example, one can write:

a = 1 + 2 + 3    # ok
b = c

but not

a = 1 + 2 + 3     b = c   # incorrect

because a newline is needed to separate the two statements.

However, we can have

a = 1 + 2 + 3;     b = c   # ok

using a semicolon.

It is also not allowed to have

a = 1 + 2 +   # incorrect
3
b = c

because there cannot be line breaks inside a statement.

However, one can have

a = 1 + 2 + (     # ok
3)
b = c

a = 1 + 2 + \     # ok
3
b = c

I have been trying to implement the rules above, but I'm stuck.

First, I use

ParserElement.setDefaultWhitespaceChars(' \t')

so that \n is now significant.
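As a minimal sketch of what that call changes (the names number and two_numbers are only illustrative, not part of the grammar in question): once the default whitespace is limited to spaces and tabs, a newline between two tokens is no longer skipped silently. Note that the setting only affects elements created after the call.

```python
import pyparsing as pp

# Only spaces and tabs are skipped from now on; this applies to
# elements created after this call.
pp.ParserElement.setDefaultWhitespaceChars(" \t")

number = pp.Word(pp.nums)
two_numbers = number + number  # illustrative grammar: two integers in a row

# A space between the tokens is still skipped.
same_line = two_numbers.parseString("1 2").asList()

# A newline between the tokens is no longer treated as whitespace.
try:
    two_numbers.parseString("1\n2")
    newline_skipped = True
except pp.ParseException:
    newline_skipped = False

print(same_line, newline_skipped)
```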

I managed to handle multiple lines using

lines = ZeroOrMore(line + OneOrMore(LineEnd()))

A variation of this also allows ; as a separator. (I cannot quite deal with the continuation backslash \.)
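One way such a variation might look is sketched below. The line rule here is a hypothetical stand-in ("name = value") rather than the real statement grammar, and separator is an illustrative name; the sketch only assumes that a statement may end with one or more newlines or semicolons.

```python
import pyparsing as pp

pp.ParserElement.setDefaultWhitespaceChars(" \t")

var = pp.Word(pp.alphas)
number = pp.Word(pp.nums)

# Hypothetical stand-in for a real statement grammar: "name = value".
line = pp.Group(var + "=" + (number | var))

# A statement ends with one or more newlines and/or semicolons.
separator = pp.Suppress(pp.OneOrMore(pp.LineEnd() | ";"))
lines = pp.ZeroOrMore(line + separator)

# Statements separated by a semicolon or by a newline are accepted.
ok = lines.parseString("a = 1; b = c\nd = 2\n", parseAll=True).asList()

# Two statements on one line without a separator are rejected.
try:
    lines.parseString("a = 1 b = c\n", parseAll=True)
    rejected = False
except pp.ParseException:
    rejected = True

print(ok, rejected)
```

Passing parseAll=True matters here: without it, ZeroOrMore would simply match zero statements on the invalid input and report no error.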

I use infixNotation to define +, -, / and *.

The part I am stuck on is that newlines should be ignored inside parentheses, as in this case:

a = 1 + 2 + ( 
3 +
1)

I think something that could help here is calling setWhitespaceChars on the parenthesized expression (LPAR + term + RPAR) that infixNotation generates. However, that does not work, because the whitespace characters are not inherited by the nested expressions.

Does anyone have any hints?

My question can also be expressed as "how do I parse (a fragment of) Python with pyParsing?". I thought I could find an example project, but I didn't. Googling, I have seen people refer to the examples in the pyParsing repo; however, parsePythonValue.py is about parsing values (which I can already do), not about handling significant newlines, and pythonGrammarParser.py is about parsing the BNF grammar for Python, not parsing Python.

Recommended answer

NOTE: THIS IS NOT A WORKING SOLUTION (at least not yet). IT RELIES ON UNRELEASED CHANGES TO PYPARSING, WHICH DON'T EVEN PASS ALL UNIT TESTS YET. I AM POSTING IT JUST AS A WAY TO DESCRIBE A POSSIBLE APPROACH TO A SOLUTION.

Ooof! This was a lot more difficult than I thought it should be. To implement it, I used pyparsing's ignore mechanism, with parse actions attached to the lpar and rpar expressions, to ignore <NL>s inside parens but not outside. This also required adding the ability to clear the ignoreExprs list by calling expr.ignore(None). Here is how your code might look:

import pyparsing as pp

# works with and without packrat
pp.ParserElement.enablePackrat()

pp.ParserElement.setDefaultWhitespaceChars(' \t')

operand = pp.Word(pp.nums)
var = pp.Word(pp.alphas)

arith_expr = pp.Forward()
arith_expr.ignore(pp.pythonStyleComment)
lpar = pp.Suppress("(")
rpar = pp.Suppress(")")

# code to implement selective ignore of NL's inside ()'s
NL = pp.Suppress("\n")
base_ignore = arith_expr.ignoreExprs[:]
ignore_stack = base_ignore[:]
def lpar_pa():
    ignore_stack.append(NL)
    arith_expr.ignore(NL)
    #~ print('post-push', arith_expr.ignoreExprs)
def rpar_pa():
    ignore_stack.pop(-1)
    arith_expr.ignore(None)
    for e in ignore_stack:
        arith_expr.ignore(e)
    #~ print('post-pop', arith_expr.ignoreExprs)
def reset_stack(*args):
    arith_expr.ignore(None)
    for e in base_ignore:
        arith_expr.ignore(e)
    #~ print('post-reset', arith_expr.ignoreExprs)
lpar.addParseAction(lpar_pa)
rpar.addParseAction(rpar_pa)
arith_expr.setFailAction(reset_stack)
arith_expr.addParseAction(reset_stack)

# now define the infix notation as usual
arith_expr <<= pp.infixNotation(operand | var,
    [
    ("-", 1, pp.opAssoc.RIGHT),
    (pp.oneOf("* /"), 2, pp.opAssoc.LEFT),
    (pp.oneOf("- +"), 2, pp.opAssoc.LEFT),
    ],
    lpar=lpar, rpar=rpar
    )

assignment = var + '=' + arith_expr

# Try it out!
assignment.runTests([
"""a = 1 + 3""",
"""a = (1 + 3)""",
"""a = 1 + 2 + ( 
3 +
1)""",
"""a = 1 + 2 + (( 
3 +
1))""",
"""a = 1 + 2 +   
3""",
], fullDump=False)

Prints:

a = 1 + 3
['a', '=', ['1', '+', '3']]
a = (1 + 3)
['a', '=', ['1', '+', '3']]
a = 1 + 2 + ( 
3 +
1)
['a', '=', ['1', '+', '2', '+', ['3', '+', '1']]]
a = 1 + 2 + (( 
3 +
1))
['a', '=', ['1', '+', '2', '+', ['3', '+', '1']]]
a = 1 + 2 +   
3
a = 1 + 2 +   
          ^
FAIL: Expected end of text, found '+'  (at char 10), (line:1, col:11)

So it is not out of the realm of possibility, but it does take some heroic efforts.
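Since the approach above depends on unreleased pyparsing changes, a different technique that needs no library changes may be worth considering: join physical lines into logical lines in a pre-pass, then hand each logical line to an ordinary expression grammar. The sketch below is only an illustration of that idea, not part of the answer's code. The function name join_logical_lines is hypothetical, and the sketch assumes the input contains no comments or string literals that include parentheses or backslashes.

```python
def join_logical_lines(source):
    """Join physical lines into logical lines.

    A line break is dropped when it follows a trailing backslash or when
    it occurs inside unbalanced parentheses. Comments and string literals
    are not handled in this sketch.
    """
    logical = []
    current = []   # pieces of the logical line being assembled
    depth = 0      # current parenthesis nesting depth

    def flush():
        joined = " ".join(part for part in current if part)
        if joined:
            logical.append(joined)
        current.clear()

    for physical in source.split("\n"):
        text = physical.rstrip()
        continued = text.endswith("\\")
        if continued:
            text = text[:-1]          # drop the continuation backslash
        depth += text.count("(") - text.count(")")
        current.append(text.strip())
        if not continued and depth <= 0:
            depth = 0
            flush()

    flush()  # keep whatever remains of an unterminated continuation
    return logical


print(join_logical_lines("a = 1 + 2 + (\n3)\nb = c"))
print(join_logical_lines("a = 1 + 2 + \\\n3\nb = c"))
print(join_logical_lines("a = 1 + 2 +\n3"))
```

A dangling operator without parentheses or a backslash is deliberately left as two separate lines, so that the downstream expression parser still rejects it.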
