我的语法或解析器生成工具中是否有错误? [英] Do I have a bug in my grammar, or the parser-generation tool?

查看:86
本文介绍了我的语法或解析器生成工具中是否有错误?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

以下是EBNF格式(主要-实际语法已记录在此处)语法,我正在尝试为以下语法生成解析器:

The following is an EBNF-format (mostly - the actual syntax is documented here) grammar that I am attempting to generate a parser for:

expr = lambda_expr_list $;

lambda_expr_list = [ lambda_expr_list "," ] lambda_expr;

lambda_expr = conditional_expr [ "->" lambda_expr ];

conditional_expr = boolean_or_expr [ "if" conditional_expr "else" conditional_expr ];

boolean_or_expr = [ boolean_or_expr "or" ] boolean_xor_expr;

boolean_xor_expr = [ boolean_xor_expr "xor" ] boolean_and_expr;

boolean_and_expr = [ boolean_and_expr "and" ] boolean_not_expr;

boolean_not_expr = [ "not" ] relation;

relation = [ relation ( "=="
                      | "!="
                      | ">"
                      | "<="
                      | "<"
                      | ">="
                      | [ "not" ] "in"
                      | "is" [ "not" ] ) ] bitwise_or_expr;

bitwise_or_expr = [ bitwise_or_expr "|" ] bitwise_xor_expr;

bitwise_xor_expr = [ bitwise_xor_expr "^" ] bitwise_and_expr;

bitwise_and_expr = [ bitwise_and_expr "&" ] bitwise_shift_expr;

bitwise_shift_expr = [ bitwise_shift_expr ( "<<"
                                          | ">>" ) ] subtraction_expr;

subtraction_expr = [ subtraction_expr "-" ] addition_expr;

addition_expr = [ addition_expr "+" ] division_expr;

division_expr = [ division_expr ( "/"
                                | "\\" ) ] multiplication_expr;

multiplication_expr = [ multiplication_expr ( "*"
                                            | "%" ) ] negative_expr;

negative_expr = [ "-" ] positive_expr;

positive_expr = [ "+" ] bitwise_not_expr;

bitwise_not_expr = [ "~" ] power_expr;

power_expr = slice_expr [ "**" power_expr ];

slice_expr = member_access_expr { subscript };

subscript = "[" slice_defn_list "]";

slice_defn_list = [ slice_defn_list "," ] slice_defn;

slice_defn = lambda_expr
           | [ lambda_expr ] ":" [ [ lambda_expr ] ":" [ lambda_expr ] ];

member_access_expr = [ member_access_expr "." ] function_call_expr;

function_call_expr = atom { parameter_list };

parameter_list = "(" [ lambda_expr_list ] ")";

atom = identifier
     | scalar_literal
     | nary_literal;

identifier = /[_A-Za-z][_A-Za-z0-9]*/;

scalar_literal = float_literal
               | integer_literal
               | boolean_literal;

float_literal = point_float_literal
              | exponent_float_literal;

point_float_literal = /[0-9]+?\.[0-9]+|[0-9]+\./;

exponent_float_literal = /([0-9]+|[0-9]+?\.[0-9]+|[0-9]+\.)[eE][+-]?[0-9]+/;

integer_literal = dec_integer_literal
                | oct_integer_literal
                | hex_integer_literal
                | bin_integer_literal;

dec_integer_literal = /[1-9][0-9]*|0+/;

oct_integer_literal = /0[oO][0-7]+/;

hex_integer_literal = /0[xX][0-9a-fA-F]+/;

bin_integer_literal = /0[bB][01]+/;

boolean_literal = "true"
                | "false";

nary_literal = tuple_literal
             | list_literal
             | dict_literal
             | string_literal
             | byte_string_literal;

tuple_literal = "(" [ lambda_expr_list ] ")";

list_literal = "[" [ ( lambda_expr_list
                     | list_comprehension ) ] "]";

list_comprehension = lambda_expr "for" lambda_expr_list "in" lambda_expr [ "if" lambda_expr ];

dict_literal = "{" [ ( dict_element_list
                     | dict_comprehension ) ] "}";

dict_element_list = [ dict_element_list "," ] dict_element;

dict_element = lambda_expr ":" lambda_expr;

dict_comprehension = dict_element "for" lambda_expr_list "in" lambda_expr [ "if" lambda_expr ];

string_literal = /[uU]?[rR]?(\u0027(\\.|[^\\\r\n\u0027])*\u0027|\u0022(\\.|[^\\\r\n\u0022])*\u0022)/;

byte_string_literal = /[bB][rR]?(\u0027(\\[\u0000-\u007F]|[\u0000-\u0009\u000B-\u000C\u000E-\u0026\u0028-\u005B\u005D-\u007F])*\u0027|\u0022(\\[\u0000-\u007F]|[\u0000-\u0009\u000B-\u000C\u000E-\u0021\u0023-\u005B\u005D-\u007F])*\u0022)/;

我用来生成解析器的工具是 Grako ,会生成一个经过修改的Packrat解析器,该解析器声称支持直接和间接的左递归.

The tool I'm using to generate the parser is Grako, which generates a modified Packrat parser that claims to support both direct and indirect left recursion.

当我在此字符串上运行生成的解析器时:

When I run the generated parser on this string:

input.filter(e -> e[0] in ['t', 'T']).map(e -> (e.len().str(), e)).map(e -> '(Line length: ' + e[0] + ') ' + e[1]).list()

我收到以下错误:

grako.exceptions.FailedParse: (1:13) Expecting end of text. :
input.filter(e -> e[0] in ['t', 'T']).map(e -> (e.len().str(), e)).map(e -> '(Line length: ' + e[0] + ') ' + e[1]).list()
            ^
expr

调试已表明,解析器似乎到达了第一个e[0]的末尾,然后再也没有回溯到/到达尝试与in令牌匹配的点.

Debugging has shown that the parser seems to get to the end of the first e[0], then never backtracks to/reaches a point where it will try to match the in token.

我的语法是否存在问题,以至于左递归支持的Packrat解析器将在此问题上失败?还是应该在Grako问题追踪器上提交问题?

Is there some issue with my grammar such that a left recursion-supporting Packrat parser would fail on it? Or should I file an issue on the Grako issue tracker?

推荐答案

这可能是语法错误,但错误消息并未告诉您它实际发生的位置.完成语法后,我总是做的是在整个元素中嵌入 cut (~)元素(在 if 等关键字,运算符,右括号之后,在所有看起来合理的地方) .

It may be a bug in the grammar, but the error message is not telling you where it actually happens. What I always do after finishing a grammar is to embed cut (~) elements throughout it (after keywords like if, operators, opening parenthesis, everywhere it seems reasonable).

cut 元素使Grako生成的解析器提交到解析树中最接近的选项中采用的选项.这样,它将使解析器在实际上无法解析的表达式上报告失败,而不是使解析器在 if 的开始处失败.

The cut element makes the Grako-generated parser commit to the option taken in the closest choice in the parse tree. That way, instead of having the parser fail at the start on an if, it will report failure at the expression it actually couldn't parse.

语法中的一些错误很难发现,为此,我只是通过解析跟踪来找出解析器在输入中走了多远,以及为什么解析器无法继续下去.

Some bugs in grammars are difficult to spot, and for that I just go through the parse trace to find out how far in the input the parser went, and why it decided it couldn't go further.

我不会在PEG解析器上使用左递归进行专业工作,尽管对于简单,学术性的工作可能会很好.

I will not use left-recursion on a PEG parser for professional work, though it may be fine for simpler, academic work.

boolean_or_expr = boolean_xor_expr {"or" boolean_xor_expr};

然后可以通过语义动作来处理关联性.

The associativity can then be handled in a semantic action.

另请参阅 issue 49 下的讨论格拉科.它说,用于支持左递归的算法不会总是在最终的AST中产生预期的关联性.

Also see the discussion under issue 49 against Grako. It says that the algorithm used to support left recursion will not always produce the expected associativity in the resulting AST.

这篇关于我的语法或解析器生成工具中是否有错误?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆