How to debug ANTLR4 grammar extraneous / mismatched input errors

Problem description

I want to parse Rulebook files such as the following "demo.rb":

rulebook Titanic-Normalization {
  version 1

  meta {
    description "Test"
    source "my-rules.xslx"
    user "joltie"
  }

  rule remove-first-line {
    description "Removes first line when offset is zero"
    when(present(offset) && offset == 0) then {
      filter-row-if-true true;
    }
  }
}

I wrote the ANTLR4 grammar file Rulebook.g4 below. For now it parses *.rb files reasonably well, but it throws unexpected errors when it reaches the "expression" / "statement" rules.

grammar Rulebook;

rulebookStatement
    :   KWRulebook
        (GeneralIdentifier | Identifier)
        '{'
        KWVersion
        VersionConstant
        metaStatement
        (ruleStatement)+
        '}'
    ;

metaStatement
    :   KWMeta
        '{'
        KWDescription
        StringLiteral
        KWSource
        StringLiteral
        KWUser
        StringLiteral
        '}'
    ;

ruleStatement
    :   KWRule
        (GeneralIdentifier | Identifier)
        '{'
        KWDescription
        StringLiteral
        whenThenStatement
        '}'
    ;

whenThenStatement
    :   KWWhen '(' expression ')'
        KWThen '{' statement '}'
    ;

primaryExpression
    :   GeneralIdentifier
    |   Identifier
    |   StringLiteral+
    |   '(' expression ')'
    ;

postfixExpression
    :   primaryExpression
    |   postfixExpression '[' expression ']'
    |   postfixExpression '(' argumentExpressionList? ')'
    |   postfixExpression '.' Identifier
    |   postfixExpression '->' Identifier
    |   postfixExpression '++'
    |   postfixExpression '--'
    ;

argumentExpressionList
    :   assignmentExpression
    |   argumentExpressionList ',' assignmentExpression
    ;

unaryExpression
    :   postfixExpression
    |   '++' unaryExpression
    |   '--' unaryExpression
    |   unaryOperator castExpression
    ;

unaryOperator
    :   '&' | '*' | '+' | '-' | '~' | '!'
    ;

castExpression
    :   unaryExpression
    |   DigitSequence // for
    ;

multiplicativeExpression
    :   castExpression
    |   multiplicativeExpression '*' castExpression
    |   multiplicativeExpression '/' castExpression
    |   multiplicativeExpression '%' castExpression
    ;

additiveExpression
    :   multiplicativeExpression
    |   additiveExpression '+' multiplicativeExpression
    |   additiveExpression '-' multiplicativeExpression
    ;

shiftExpression
    :   additiveExpression
    |   shiftExpression '<<' additiveExpression
    |   shiftExpression '>>' additiveExpression
    ;

relationalExpression
    :   shiftExpression
    |   relationalExpression '<' shiftExpression
    |   relationalExpression '>' shiftExpression
    |   relationalExpression '<=' shiftExpression
    |   relationalExpression '>=' shiftExpression
    ;

equalityExpression
    :   relationalExpression
    |   equalityExpression '==' relationalExpression
    |   equalityExpression '!=' relationalExpression
    ;

andExpression
    :   equalityExpression
    |   andExpression '&' equalityExpression
    ;

exclusiveOrExpression
    :   andExpression
    |   exclusiveOrExpression '^' andExpression
    ;

inclusiveOrExpression
    :   exclusiveOrExpression
    |   inclusiveOrExpression '|' exclusiveOrExpression
    ;

logicalAndExpression
    :   inclusiveOrExpression
    |   logicalAndExpression '&&' inclusiveOrExpression
    ;

logicalOrExpression
    :   logicalAndExpression
    |   logicalOrExpression '||' logicalAndExpression
    ;

conditionalExpression
    :   logicalOrExpression ('?' expression ':' conditionalExpression)?
    ;

assignmentExpression
    :   conditionalExpression
    |   unaryExpression assignmentOperator assignmentExpression
    |   DigitSequence // for
    ;

assignmentOperator
    :   '=' | '*=' | '/=' | '%=' | '+=' | '-=' | '<<=' | '>>=' | '&=' | '^=' | '|='
    ;

expression
    :   assignmentExpression
    |   expression ',' assignmentExpression
    ;

statement
    :   expressionStatement
    ;

expressionStatement
    :   expression+ ';'
    ;


KWRulebook: 'rulebook';
KWVersion: 'version';
KWMeta: 'meta';
KWDescription: 'description';
KWSource: 'source';
KWUser: 'user';
KWRule: 'rule';
KWWhen: 'when';
KWThen: 'then';
KWTrue: 'true';
KWFalse: 'false';

fragment
LeftParen : '(';

fragment
RightParen : ')';

fragment
LeftBracket : '[';

fragment
RightBracket : ']';

fragment
LeftBrace : '{';

fragment
RightBrace : '}';


Identifier
    :   IdentifierNondigit
        (   IdentifierNondigit
        |   Digit
        )*
    ;

GeneralIdentifier
    :   Identifier
        ('-' Identifier)+
    ;

fragment
IdentifierNondigit
    :   Nondigit
    //|   // other implementation-defined characters...
    ;

VersionConstant
    :   DigitSequence ('.' DigitSequence)*
    ;

DigitSequence
    :   Digit+
    ;

fragment
Nondigit
    :   [a-zA-Z_]
    ;

fragment
Digit
    :   [0-9]
    ;

StringLiteral
    :   '"' SCharSequence? '"'
    |   '\'' SCharSequence? '\''
    ;

fragment
SCharSequence
    :   SChar+
    ;

fragment
SChar
    :   ~["\\\r\n]
    |   '\\\n'   // Added line
    |   '\\\r\n' // Added line
    ;

Whitespace
    :   [ \t]+
        -> skip
    ;

Newline
    :   (   '\r' '\n'?
        |   '\n'
        )
        -> skip
    ;

BlockComment
    :   '/*' .*? '*/'
        -> skip
    ;

LineComment
    :   '//' ~[\r\n]*
        -> skip
    ;

I tested the Rulebook parser with a unit test like this:

    import java.io.IOException;

    import org.antlr.v4.runtime.CharStream;
    import org.antlr.v4.runtime.CharStreams;
    import org.antlr.v4.runtime.CommonTokenStream;

    public void testScanRulebookFile() throws IOException {
        String fileName = "C:\\rulebooks\\demo.rb";
        // create a CharStream that reads from the file (no stream left open)
        CharStream input = CharStreams.fromFileName(fileName);

        // create a lexer that feeds off of input CharStream
        RulebookLexer lexer = new RulebookLexer(input);

        // create a buffer of tokens pulled from the lexer
        CommonTokenStream tokens = new CommonTokenStream(lexer);

        // create a parser that feeds off the tokens buffer
        RulebookParser parser = new RulebookParser(tokens);

        // begin parsing at the start rule
        RulebookParser.RulebookStatementContext context = parser.rulebookStatement();
        // for the short test input, start at whenThenStatement() instead:
        // RulebookParser.WhenThenStatementContext context = parser.whenThenStatement();

        // print the LISP-style parse tree
        System.out.println(context.toStringTree(parser));
    }

For the "demo.rb" above, the parser reports the error below. I also printed the RulebookStatementContext with toStringTree:

line 12:25 mismatched input '&&' expecting ')'
(rulebookStatement rulebook Titanic-Normalization { version 1 (metaStatement meta { description "Test" source "my-rules.xslx" user "joltie" }) (ruleStatement rule remove-first-line { description "Removes first line when offset is zero" (whenThenStatement when ( (expression (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (postfixExpression (primaryExpression present)) ( (argumentExpressionList (assignmentExpression (conditionalExpression (logicalOrExpression (logicalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (castExpression (unaryExpression (postfixExpression (primaryExpression offset))))))))))))))))) ))))))))))))))))) && offset == 0 ) then { filter-row-if-true true ;) }) })

I also wrote a unit test for a short input such as "when (offset == 0) then {\n" + "filter-row-if-true true;\n" + "}\n" to narrow down the problem, but it still produces these errors:

line 1:16 mismatched input '0' expecting {'(', '++', '--', '&&', '&', '*', '+', '-', '~', '!', Identifier, GeneralIdentifier, DigitSequence, StringLiteral}
line 2:19 extraneous input 'true' expecting {'(', '++', '--', '&&', '&', '*', '+', '-', '~', '!', ';', Identifier, GeneralIdentifier, DigitSequence, StringLiteral}
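The positions in these messages read as line:column, where ANTLR counts lines from 1 and the column (charPositionInLine) from 0, so it can help to print the offending spot directly. Below is a minimal stdlib-only Java sketch; the class and method names are made up for illustration, and a tab counts as one column, so the caret may drift on tab-indented lines.

```java
class ErrorPointer {
    /**
     * Returns the given source line followed by a caret under the given column.
     * line is 1-based and column is 0-based, matching ANTLR's "line L:C" messages.
     */
    static String pointAt(String source, int line, int column) {
        String[] lines = source.split("\r?\n", -1);
        if (line < 1 || line > lines.length) {
            throw new IllegalArgumentException("no such line: " + line);
        }
        return lines[line - 1] + "\n" + " ".repeat(column) + "^";
    }

    public static void main(String[] args) {
        String input = "when (offset == 0) then {\n" + "filter-row-if-true true;\n" + "}\n";
        System.out.println(pointAt(input, 1, 16)); // caret under '0'
        System.out.println(pointAt(input, 2, 19)); // caret under the second 'true'
    }
}
```

On the short input the carets land on `0` and on the second `true`, i.e. the exact tokens the parser rejected.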

After two days of trying I haven't made any progress. Sorry for the long question; could someone give me some advice on how to debug ANTLR4 "extraneous input" / "mismatched input" errors?

Recommended answer

I don't know if there are any more sophisticated methods to debug a grammar/parser, but here's how I usually do it:

  1. Reduce the input that causes the problem to as few characters as possible.

  2. Reduce your grammar as far as possible so that it still produces the same error on the reduced input. Most of the time that means writing a minimal grammar for the reduced input by recycling (and simplifying as far as possible) the rules of the original grammar.

  3. Make sure the lexer segments the input properly (the ANTLRWorks feature that shows you the lexer output is great for this).

  4. Have a look at the parse tree. ANTLR's TestRig can display the parse tree graphically (you can access this either through ANTLRWorks or through ANTLR's TreeViewer), so you can see where the parser's interpretation differs from yours.

  5. Do the parsing "by hand". Take your grammar and go through the input yourself, step by step, without applying any logic, assumptions, or knowledge of your own; just follow your grammar the way a computer would. Question every step you take (is there another way to match the input?) and always try to match the input in a way other than the one you actually want it to be parsed.

  6. Try to fix the error in the minimal grammar, and migrate the solution to your real grammar afterwards.
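Step 1 can be partly automated. The sketch below is a simple greedy reducer, a simplified cousin of delta debugging rather than the full ddmin algorithm: it deletes ever smaller chunks of the input and keeps a deletion whenever a caller-supplied check still reports the failure. That check is hypothetical here; in real use it would run the generated parser (for example as in the unit test above) and test whether the same error message still appears. In the example a toy `contains` check stands in for it.

```java
import java.util.function.Predicate;

class InputReducer {
    /** Shrinks a failing input for as long as the check keeps reporting failure. */
    static String reduce(String input, Predicate<String> stillFails) {
        if (!stillFails.test(input)) {
            throw new IllegalArgumentException("the original input must fail");
        }
        String current = input;
        int chunk = Math.max(1, current.length() / 2);
        while (chunk >= 1) {
            boolean removedSomething = false;
            int start = 0;
            while (start < current.length()) {
                int end = Math.min(current.length(), start + chunk);
                String candidate = current.substring(0, start) + current.substring(end);
                if (stillFails.test(candidate)) {
                    current = candidate;     // keep the smaller failing input
                    removedSomething = true; // retry at the same position
                } else {
                    start = end;             // this chunk is needed; move on
                }
            }
            if (!removedSomething) {
                chunk /= 2;                  // nothing removable at this size; try smaller chunks
            }
        }
        return current;
    }

    public static void main(String[] args) {
        String input = "when (offset == 0) then {\n" + "filter-row-if-true true;\n" + "}\n";
        // Toy stand-in for "the parser still reports mismatched input '0'".
        Predicate<String> stillFails = s -> s.contains("== 0");
        System.out.println(reduce(input, stillFails)); // prints "== 0"
    }
}
```

Because the last pass tries every single-character deletion without success, the result cannot be shrunk further by removing one character, which is usually small enough to start building the minimal grammar of step 2.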
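For step 3, ANTLR's TestRig can print the real token stream with its `-tokens` option. It also helps to know the rule the lexer follows: it takes the longest match, and when several rules match the same length, the rule defined first in the grammar wins. The sketch below is a simplified model of that behavior (one regex per rule, not how ANTLR is actually implemented), using a hand-copied subset of the Rulebook.g4 token rules in grammar order; the punctuation entries stand in for the implicit literal tokens, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TieBreakDemo {
    // Rule name -> pattern, in priority order (an earlier rule wins a tie).
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("'=='", Pattern.compile("=="));
        RULES.put("'('", Pattern.compile("\\("));
        RULES.put("')'", Pattern.compile("\\)"));
        RULES.put("'{'", Pattern.compile("\\{"));
        RULES.put("'}'", Pattern.compile("\\}"));
        RULES.put("';'", Pattern.compile(";"));
        RULES.put("KWWhen", Pattern.compile("when"));
        RULES.put("KWThen", Pattern.compile("then"));
        RULES.put("KWTrue", Pattern.compile("true"));
        RULES.put("Identifier", Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*"));
        RULES.put("GeneralIdentifier", Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*(-[a-zA-Z_][a-zA-Z_0-9]*)+"));
        RULES.put("VersionConstant", Pattern.compile("[0-9]+(\\.[0-9]+)*"));
        RULES.put("DigitSequence", Pattern.compile("[0-9]+"));
        RULES.put("WS", Pattern.compile("[ \\t\\r\\n]+")); // skipped, like Whitespace/Newline
    }

    /** Returns "Type:text" entries: longest match wins, ties go to the earlier rule. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                // strictly longer only: an equal-length match never displaces an earlier rule
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) {
                throw new IllegalStateException("no rule matches at offset " + pos);
            }
            if (!bestRule.equals("WS")) {
                tokens.add(bestRule + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "when (offset == 0) then {\n" + "filter-row-if-true true;\n" + "}\n";
        tokenize(input).forEach(System.out::println);
    }
}
```

In this model `0` comes out as VersionConstant, because VersionConstant is listed before DigitSequence and both match one character, and `true` comes out as KWTrue rather than as an identifier. Neither token type is accepted where the expression rules expect an operand, which would be consistent with the "mismatched input '0'" and "extraneous input 'true'" errors in the question.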
