How do I parse indents and dedents with pyparsing?

Problem description

Here is a subset of the Python grammar:

single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE

stmt: simple_stmt | compound_stmt
simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE

small_stmt: pass_stmt
pass_stmt: 'pass'

compound_stmt: if_stmt
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]

suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT

(You can read the full grammar in the Python SVN repository: http://svn.python.org/.../Grammar)
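
To make the three layout tokens concrete, here is a small sketch that uses the standard-library tokenize module (not pyparsing, and not part of the original question) to show where CPython's own tokenizer places NEWLINE, INDENT and DEDENT around an if suite. The sample source string is an assumption chosen purely for illustration.

```python
import io
import tokenize

# Hypothetical sample input: an `if` whose suite is an indented block.
source = "if x:\n    pass\npass\n"

layout_kinds = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}

# Keep only the layout tokens that the grammar's `suite` rule refers to.
layout_tokens = [
    tokenize.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type in layout_kinds
]

print(layout_tokens)
```

The INDENT follows the NEWLINE that ends the `if x:` line, and the DEDENT is emitted just before the first token of the dedented line, which matches the `NEWLINE INDENT stmt+ DEDENT` shape of the suite rule.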

I am trying to use this grammar to generate a parser for Python, in Python. What I am having trouble with is how to express the INDENT and DEDENT tokens as pyparsing objects.

Here is how I have implemented the other terminals:

import pyparsing as p

string_start = (p.Literal('"""') | "'''" | '"' | "'")
string_token = ('\\' + p.CharsNotIn("",exact=1) | p.CharsNotIn('\\',exact=1))
string_end = p.matchPreviousExpr(string_start)

terminals = {
    'NEWLINE': p.Literal('\n').setWhitespaceChars(' \t')
        .setName('NEWLINE').setParseAction(terminal_action('NEWLINE')),
    'ENDMARKER': p.stringEnd.copy().setWhitespaceChars(' \t')
        .setName('ENDMARKER').setParseAction(terminal_action('ENDMARKER')),
    'NAME': (p.Word(p.alphas + "_", p.alphanums + "_", asKeyword=True))
        .setName('NAME').setParseAction(terminal_action('NAME')),
    'NUMBER': p.Combine(
            p.Word(p.nums) + p.CaselessLiteral("l") |
            (p.Word(p.nums) + p.Optional("." + p.Optional(p.Word(p.nums))) | "." + p.Word(p.nums)) +
                p.Optional(p.CaselessLiteral("e") + p.Optional(p.Literal("+") | "-") + p.Word(p.nums)) +
                p.Optional(p.CaselessLiteral("j"))
        ).setName('NUMBER').setParseAction(terminal_action('NUMBER')),
    'STRING': p.Combine(
            p.Optional(p.CaselessLiteral('u')) +
            p.Optional(p.CaselessLiteral('r')) +
            string_start + p.ZeroOrMore(~string_end + string_token) + string_end
        ).setName('STRING').setParseAction(terminal_action('STRING')),

    # I can't find a good way of parsing indents/dedents.
    # The Grammar just has the tokens NEWLINE, INDENT and DEDENT scattered across the rules.
    # A single NEWLINE would be translated to NEWLINE + PEER (from pyparsing.indentedBlock()), unless followed by INDENT or DEDENT.
    # That NEWLINE and IN/DEDENT could be split across rule boundaries (see the 'suite' rule).
    'INDENT': (p.LineStart() + p.Optional(p.Word(' '))).setName('INDENT'),
    'DEDENT': (p.LineStart() + p.Optional(p.Word(' '))).setName('DEDENT')
}

terminal_action is a function that returns the corresponding parsing action, depending on its arguments.
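
The question does not show terminal_action itself. As a purely hypothetical sketch of what such a factory might look like (the tuple format it returns is an assumption, not the asker's code), a closure can capture the terminal name and return a callable with the (source string, location, tokens) signature that pyparsing accepts for parse actions:

```python
def terminal_action(name):
    """Return a parse action that tags the matched text with `name` (hypothetical)."""
    def action(s, loc, toks):
        # s: the full input string, loc: match position, toks: matched tokens.
        # Replace the matched tokens with a single (terminal name, text) pair.
        return [(name, "".join(toks))]
    return action


# Calling the action directly, the way pyparsing would after a match.
tag_name = terminal_action("NAME")
print(tag_name("x = 1", 0, ["x"]))
```

Returning a list from a parse action replaces the matched tokens, so downstream rules would see the tagged pair instead of the raw string.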

I am aware of the pyparsing.indentedBlock helper function, but I can't figure out how to adapt it to a grammar without the PEER token.

(Look at the pyparsing source code to see what I am talking about.)
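
For readers who do not have the source at hand: indentedBlock keeps a stack of indentation columns and compares each statement's column against it to decide whether a line opens a block (INDENT), continues it (PEER), or closes it (a dedent). The plain-Python sketch below illustrates only that bookkeeping. It is not pyparsing's code, and classify_indentation and its sample input are hypothetical names made up for this illustration; it also assumes indentation with spaces only.

```python
def classify_indentation(lines):
    """Label each non-blank line as INDENT, PEER or DEDENT using an indent stack."""
    stack = [1]   # 1-based column of the enclosing block
    labels = []
    for line in lines:
        if not line.strip():
            continue  # blank lines do not affect indentation
        col = len(line) - len(line.lstrip(" ")) + 1
        if col > stack[-1]:
            stack.append(col)          # deeper than the current block
            labels.append("INDENT")
        elif col == stack[-1]:
            labels.append("PEER")      # same level as the previous statement
        else:
            # Close one block per level until a known column is reached.
            while col < stack[-1]:
                stack.pop()
                labels.append("DEDENT")
            if col != stack[-1]:
                raise IndentationError("dedent does not match any outer level")
    return labels


sample = ["if x:", "    pass", "    if y:", "        pass", "pass"]
print(classify_indentation(sample))
```

In this scheme a statement that follows a NEWLINE at the same column is a PEER, which is the extra case the Python grammar leaves implicit.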

You can see my full source code here: http://pastebin.ca/1609860

Recommended answer

There are a couple of examples on the pyparsing wiki Examples page that could give you some insights:

  • pythonGrammarParser.py
  • indentedGrammarExample.py

To use pyparsing's indentedBlock, I think you would define suite as:

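indentstack = [1]  # stack of indentation columns (1-based), shared by all nested blocks
suite = p.indentedBlock(stmt, indentstack, True)  # p is the question's "import pyparsing as p"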
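
A minimal runnable sketch of that suggestion, assuming a drastically reduced grammar: only pass and a single-branch if, with a bare name standing in for test. The names pass_stmt, if_stmt, program and the sample source are illustrative assumptions, and indentedBlock is the camelCase helper named in the answer (newer pyparsing releases also provide an IndentedBlock class).

```python
import pyparsing as p

indentstack = [1]      # indentation columns seen so far (1-based)
stmt = p.Forward()     # forward declaration: suite and stmt refer to each other

pass_stmt = p.Keyword("pass")

# NEWLINE INDENT stmt+ DEDENT is handled as a whole by indentedBlock.
suite = p.indentedBlock(stmt, indentstack, True)

# Simplified if_stmt: 'if' NAME ':' suite (no elif/else, no inline suite).
if_stmt = p.Group(p.Keyword("if") + p.Word(p.alphas) + p.Suppress(":") + suite)

stmt <<= if_stmt | pass_stmt
program = p.OneOrMore(stmt)

source = "if x:\n    pass\n    if y:\n        pass\npass\n"
result = program.parseString(source, parseAll=True)
print(result.asList())
```

Each indented suite appears as a nested group in the result, so the INDENT and DEDENT tokens never show up as separate terminals; the block structure carries that information instead.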

Note that indentedGrammarExample.py pre-dates the inclusion of indentedBlock in pyparsing, so it does its own implementation of indent parsing.
