How to debug ANTLR4 grammar extraneous / mismatched input error


Question

I want to parse Rulebook "demo.rb" files like the one below:

rulebook Titanic-Normalization {
  version 1

  meta {
    description "Test"
    source "my-rules.xslx"
    user "joltie"
  }

  rule remove-first-line {
    description "Removes first line when offset is zero"
    when(present(offset) && offset == 0) then {
      filter-row-if-true true;
    }
  }
}

I wrote the ANTLR4 grammar file Rulebook.g4 shown below. For now, it parses the *.rb files reasonably well, but it throws unexpected errors when it encounters the "expression" / "statement" rules.

grammar Rulebook;

rulebookStatement
    :   KWRulebook
        (GeneralIdentifier | Identifier)
        '{'
        KWVersion
        VersionConstant
        metaStatement
        (ruleStatement)+
        '}'
    ;

metaStatement
    :   KWMeta
        '{'
        KWDescription
        StringLiteral
        KWSource
        StringLiteral
        KWUser
        StringLiteral
        '}'
    ;

ruleStatement
    :   KWRule
        (GeneralIdentifier | Identifier)
        '{'
        KWDescription
        StringLiteral
        whenThenStatement
        '}'
    ;

whenThenStatement
    :   KWWhen '(' expression ')'
        KWThen '{' statement '}'
    ;

primaryExpression
    :   GeneralIdentifier
    |   Identifier
    |   StringLiteral+
    |   '(' expression ')'
    ;

postfixExpression
    :   primaryExpression
    |   postfixExpression '[' expression ']'
    |   postfixExpression '(' argumentExpressionList? ')'
    |   postfixExpression '.' Identifier
    |   postfixExpression '->' Identifier
    |   postfixExpression '++'
    |   postfixExpression '--'
    ;

argumentExpressionList
    :   assignmentExpression
    |   argumentExpressionList ',' assignmentExpression
    ;

unaryExpression
    :   postfixExpression
    |   '++' unaryExpression
    |   '--' unaryExpression
    |   unaryOperator castExpression
    ;

unaryOperator
    :   '&' | '*' | '+' | '-' | '~' | '!'
    ;

castExpression
    :   unaryExpression
    |   DigitSequence // for
    ;

multiplicativeExpression
    :   castExpression
    |   multiplicativeExpression '*' castExpression
    |   multiplicativeExpression '/' castExpression
    |   multiplicativeExpression '%' castExpression
    ;

additiveExpression
    :   multiplicativeExpression
    |   additiveExpression '+' multiplicativeExpression
    |   additiveExpression '-' multiplicativeExpression
    ;

shiftExpression
    :   additiveExpression
    |   shiftExpression '<<' additiveExpression
    |   shiftExpression '>>' additiveExpression
    ;

relationalExpression
    :   shiftExpression
    |   relationalExpression '<' shiftExpression
    |   relationalExpression '>' shiftExpression
    |   relationalExpression '<=' shiftExpression
    |   relationalExpression '>=' shiftExpression
    ;

equalityExpression
    :   relationalExpression
    |   equalityExpression '==' relationalExpression
    |   equalityExpression '!=' relationalExpression
    ;

andExpression
    :   equalityExpression
    |   andExpression '&' equalityExpression
    ;

exclusiveOrExpression
    :   andExpression
    |   exclusiveOrExpression '^' andExpression
    ;

inclusiveOrExpression
    :   exclusiveOrExpression
    |   inclusiveOrExpression '|' exclusiveOrExpression
    ;

logicalAndExpression
    :   inclusiveOrExpression
    |   logicalAndExpression '&&' inclusiveOrExpression
    ;

logicalOrExpression
    :   logicalAndExpression
    |   logicalOrExpression '||' logicalAndExpression
    ;

conditionalExpression
    :   logicalOrExpression ('?' expression ':' conditionalExpression)?
    ;

assignmentExpression
    :   conditionalExpression
    |   unaryExpression assignmentOperator assignmentExpression
    |   DigitSequence // for
    ;

assignmentOperator
    :   '=' | '*=' | '/=' | '%=' | '+=' | '-=' | '<<=' | '>>=' | '&=' | '^=' | '|='
    ;

expression
    :   assignmentExpression
    |   expression ',' assignmentExpression
    ;

statement
    :   expressionStatement
    ;

expressionStatement
    :   expression+ ';'
    ;


KWRulebook: 'rulebook';
KWVersion: 'version';
KWMeta: 'meta';
KWDescription: 'description';
KWSource: 'source';
KWUser: 'user';
KWRule: 'rule';
KWWhen: 'when';
KWThen: 'then';
KWTrue: 'true';
KWFalse: 'false';

fragment
LeftParen : '(';

fragment
RightParen : ')';

fragment
LeftBracket : '[';

fragment
RightBracket : ']';

fragment
LeftBrace : '{';

fragment
RightBrace : '}';


Identifier
    :   IdentifierNondigit
        (   IdentifierNondigit
        |   Digit
        )*
    ;

GeneralIdentifier
    :   Identifier
        ('-' Identifier)+
    ;

fragment
IdentifierNondigit
    :   Nondigit
    //|   // other implementation-defined characters...
    ;

VersionConstant
    :   DigitSequence ('.' DigitSequence)*
    ;

DigitSequence
    :   Digit+
    ;

fragment
Nondigit
    :   [a-zA-Z_]
    ;

fragment
Digit
    :   [0-9]
    ;

StringLiteral
    :   '"' SCharSequence? '"'
    |   '\'' SCharSequence? '\''
    ;

fragment
SCharSequence
    :   SChar+
    ;

fragment
SChar
    :   ~["\\\r\n]
    |   '\\\n'   // Added line
    |   '\\\r\n' // Added line
    ;

Whitespace
    :   [ \t]+
        -> skip
    ;

Newline
    :   (   '\r' '\n'?
        |   '\n'
        )
        -> skip
    ;

BlockComment
    :   '/*' .*? '*/'
        -> skip
    ;

LineComment
    :   '//' ~[\r\n]*
        -> skip
    ;

I tested the Rulebook parser with a unit test like the one below:

    public void testScanRulebookFile() throws IOException {
        String fileName = "C:\\rulebooks\\demo.rb";
        FileInputStream fis = new FileInputStream(fileName);
        // create a CharStream that reads from the rulebook file
        CharStream input = CharStreams.fromStream(fis);

        // create a lexer that feeds off of input CharStream
        RulebookLexer lexer = new RulebookLexer(input);

        // create a buffer of tokens pulled from the lexer
        CommonTokenStream tokens = new CommonTokenStream(lexer);

        // create a parser that feeds off the tokens buffer
        RulebookParser parser = new RulebookParser(tokens);


        RulebookStatementContext context = parser.rulebookStatement();
//        WhenThenStatementContext context = parser.whenThenStatement();

        System.out.println(context.toStringTree(parser));

//      ParseTree tree = parser.getContext(); // begin parsing at init rule
//      System.out.println(tree.toStringTree(parser)); // print LISP-style tree
    }

For the "demo.rb" above, the parser reports the error below. I also printed the RulebookStatementContext using toStringTree.

line 12:25 mismatched input '&&' expecting ')'
(rulebookStatement rulebook Titanic-Normalization { version 1 (metaStatement meta { description "Test" source "my-rules.xslx" user "joltie" }) (ruleStatement rule remove-first-line { description "Removes first line when offset is zero" (whenThenStatement when ( (expression (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (postfixExpression (primaryExpression present)) ( (argumentExpressionList (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (primaryExpression offset))))))))))))))))) ))))))))))))))))) && offset == 0 ) then { filter-row-if-true true ;) }) })

I also wrote a unit test that parses a shorter input, "when (offset == 0) then {\n" + "filter-row-if-true true;\n" + "}\n", to debug the problem. But it still produced errors like these:

line 1:16 mismatched input '0' expecting {'(', '++', '--', '&&', '&', '*', '+', '-', '~', '!', Identifier, GeneralIdentifier, DigitSequence, StringLiteral}
line 2:19 extraneous input 'true' expecting {'(', '++', '--', '&&', '&', '*', '+', '-', '~', '!', ';', Identifier, GeneralIdentifier, DigitSequence, StringLiteral}

After two days of trying, I haven't made any progress. Sorry for the long question; could someone please give me some advice on how to debug ANTLR4 grammar extraneous / mismatched input errors?

Recommended answer

I don't know if there are any more sophisticated methods to debug a grammar/parser, but here's how I usually do it:

  1. Reduce the input that causes the problem to as few characters as possible.

  2. Reduce your grammar as far as possible so that it still produces the same error on the respective input. Most of the time that means writing a minimal grammar for the reduced input by recycling the rules of the original grammar, simplified as far as possible.

  3. Make sure the lexer segments the input properly (the feature in ANTLRWorks that shows you the lexer output is great for that).

  4. Have a look at the ParseTree. ANTLR's TestRig has a feature that displays the ParseTree graphically (you can access this functionality either through ANTLRWorks or through ANTLR's TreeViewer), so you can see where the parser's interpretation differs from the one you have in mind.

  5. Do the parsing "by hand". That means you take your grammar and go through the input yourself, step by step, trying not to apply any logic, assumptions or prior knowledge during that process. Just follow your own grammar as a computer would. Question every step you take (is there another way to match the input?) and always try to match the input in a way other than the one you actually want it to be parsed.

  6. Try to fix the error in the minimal grammar, and migrate the solution to your real grammar afterwards.
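
As an illustration of the lexer-segmentation check and the "by hand" approach above, here is a minimal sketch in plain Java that does not use ANTLR at all. It only imitates the tie-breaking rule an ANTLR lexer follows: it prefers the longest match, and when several rules match the same length, the rule defined first wins. The class name LexerTieBreakDemo, the method tokenize, the regular expressions and the catch-all "Punctuation" rule are assumptions made for this example. They approximate a subset of the lexer rules from the question, kept in the same relative order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR: a hand-rolled imitation of
// "longest match wins, ties go to the rule defined first".
public class LexerTieBreakDemo {

    // Insertion order mirrors the rule order in the question's Rulebook.g4 (subset).
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        // Stand-in for the literal tokens used in parser rules ('(', '==', ...).
        RULES.put("Punctuation", Pattern.compile("==|&&|[(){};]"));
        RULES.put("KWWhen", Pattern.compile("when"));
        RULES.put("KWThen", Pattern.compile("then"));
        RULES.put("KWTrue", Pattern.compile("true"));
        RULES.put("Identifier", Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*"));
        RULES.put("GeneralIdentifier",
                Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*(-[a-zA-Z_][a-zA-Z_0-9]*)+"));
        RULES.put("VersionConstant", Pattern.compile("[0-9]+(\\.[0-9]+)*"));
        RULES.put("DigitSequence", Pattern.compile("[0-9]+"));
    }

    // Splits the input into "RuleName:text" entries, skipping whitespace.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;
                continue;
            }
            String bestRule = null;
            int bestLength = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt()) {
                    int length = m.end() - pos;
                    // Strictly longer replaces; equal length keeps the earlier rule.
                    if (length > bestLength) {
                        bestLength = length;
                        bestRule = rule.getKey();
                    }
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("No rule matches at index " + pos);
            }
            tokens.add(bestRule + ":" + input.substring(pos, pos + bestLength));
            pos += bestLength;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String shortInput = "when (offset == 0) then { filter-row-if-true true; }";
        for (String token : tokenize(shortInput)) {
            System.out.println(token);
        }
    }
}
```

On the short input from the question, this sketch labels 0 as VersionConstant rather than DigitSequence, and true as KWTrue rather than Identifier. Both results are consistent with the two error messages quoted in the question, where the parser expects DigitSequence or Identifier at exactly those positions. It is only a simulation, so the actual token stream should still be checked with the real generated lexer.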
