How to force ANTLR to parse all input CharStream


Problem description

I'm using ANTLR4 to parse a syntax file. When I use a BaseErrorListener to detect errors, I run into a problem: when given an illegal input string, ANTLR matches the appropriate branch and then ignores the rest of the character stream, even if it contains errors. I want to detect that error. Here are my g4 files and Java files.
TransitionLexer is my lexer file and TransitionCondition is my parser file. ErrorDialogListener.java is my error listener and Test.java is the main Java file.

TransitionLexer.g4

lexer grammar TransitionLexer;

BOOLEAN: 'true' | 'false';
IF: 'if';
THEN: 'then';
ELSE: 'else';

NAME: (ALPHA | CHINESE | '_')(ALPHA | CHINESE | '_'|DIGIT)*;

ALPHA: [a-zA-Z];
CHINESE: [\u4e00-\u9fa5];

NUMBER: INT | REAL;
INT: DIGIT+
    |'(-'DIGIT+')';
REAL: DIGIT+ ('.' DIGIT+)?
    | '(-' DIGIT+ ('.' DIGIT+)? ')';
fragment DIGIT: [0-9];

OPCOMPARE: '='|'>='|'<='|'>'|'<';
WS: [ \t\n\r]+ ->skip;
SL_COMMENT:  '/*' .*? '*/' ->skip;

TransitionCondition.g4

grammar TransitionCondition;
import TransitionLexer;

condition : stat+;
stat : expr;
expr: expr (('and' | 'or') expr)+
    | '(' expr ')'
    | '(' var OPCOMPARE value ')'
    | booleanExpr
    | BOOLEAN
    ;

var: localStates
     | globalStates
     | connector
     ;
localStates: NAME;
globalStates: 'Top' ('.' brick)+ '.' NAME;
connector: brick '.' NAME;

value: userdefinedValue | basicValue;
userdefinedValue: NAME;
basicValue: basicValue op=('*'|'/') basicValue
                    | basicValue op=('+' | '-') basicValue
                    | basicValue ('and' | 'or') basicValue
                    | NUMBER | BOOLEAN
                    | '(' basicValue ')'
                    ;

booleanExpr: booleanExpr OPCOMPARE booleanExpr
           | '(' booleanExpr ')'
           | NUMBER (OPCOMPARE|'*'| '/'|'+'|'-') NUMBER
           ;
brick: NAME;

ErrorDialogListener.java

package errorprocess;

import java.awt.Color;
import java.awt.Container;
import java.util.Collections;
import java.util.List;

import javax.swing.JDialog;
import javax.swing.JFrame;
import javax.swing.JLabel;

import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;
import org.antlr.v4.runtime.atn.ATNConfigSet;
import org.antlr.v4.runtime.dfa.DFA;

public class ErrorDialogListener extends BaseErrorListener {


    @Override
    public void reportContextSensitivity(Parser recognizer, DFA dfa, int startIndex, int stopIndex, int prediction,
            ATNConfigSet configs) {
        System.out.println(dfa.toLexerString());
        System.out.println(dfa.getStates());        
        super.reportContextSensitivity(recognizer, dfa, startIndex, stopIndex, prediction, configs);
    }

    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine,
            String msg, RecognitionException e) {
        List<String> stack = ((Parser)recognizer).getRuleInvocationStack();
        Collections.reverse(stack);
        StringBuilder buf = new StringBuilder();
        buf.append("rule stack: "+stack+" ");
        buf.append("line "+line+":"+charPositionInLine+" at "+
                   offendingSymbol+": "+msg);

        JDialog dialog = new JDialog();
        Container contentPane = dialog.getContentPane();
        contentPane.add(new JLabel(buf.toString()));
        contentPane.setBackground(Color.white);
        dialog.setTitle("Syntax error");
        dialog.pack();
        dialog.setLocationRelativeTo(null);
        dialog.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
        dialog.setVisible(true);
    }

}

Test.java

package errorprocess;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.atn.PredictionMode;

import antlr4.my.transition.TransitionConditionLexer;
import antlr4.my.transition.TransitionConditionParser;

public class Test {

    public static void main(String[] args) throws IOException {
        InputStream in = new FileInputStream("G:\\AltaRica\\ANTLR4\\test\\condition\\t.expr");
        ANTLRInputStream input = new ANTLRInputStream(in);
        TransitionConditionLexer lexer = new TransitionConditionLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TransitionConditionParser parser = new TransitionConditionParser(tokens);
        parser.removeErrorListeners();
        parser.addErrorListener(new ErrorDialogListener());
//        parser.addErrorListener(new DiagnosticErrorListener());
//        parser.getInterpreter().setPredictionMode(PredictionMode.LL_EXACT_AMBIG_DETECTION);
//        parser.getInterpreter().setPredictionMode(PredictionMode.LL);
        parser.condition();
    }

}

The main problem

When my input is (Top.b2.states = nominal) and (b1.i1 = wrong) and (states >= 5.5), the parser works fine.
But when my input is (Top.b2.states = nominal) aaa (b1.i1 = wrong) and (states >= 5.5), the parser only parses (Top.b2.states = nominal) and ignores everything from aaa onward, even though that input does not conform to the grammar.
I guess the reason is that the parser follows the second branch of the expr rule in TransitionCondition.g4, which is expr: '(' expr ')', and simply ignores the rest. So how can I force ANTLR to recognize all of the input, or to choose only the first branch (expr: expr (('and' | 'or') expr)+) in this situation?

I tried using DiagnosticErrorListener and overriding reportContextSensitivity(), but neither seems to work.

Recommended answer

Your main rule needs to end with the EOF token, a special token provided by ANTLR that matches the end of input.

If the token isn't there, ANTLR will just parse whatever it can match and then stop. By putting EOF at the end of your entry rule, you tell ANTLR that whatever it parses must end at the end of the input.
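
Applied to the grammar in the question, this would presumably mean changing the entry rule in TransitionCondition.g4. The following is a minimal sketch, assuming condition is the rule used as the entry point (Test.java calls parser.condition()) and that every other rule stays as posted:

```antlr
// TransitionCondition.g4: only the entry rule changes.
// EOF is ANTLR's built-in end-of-input token, so the rule can no longer
// succeed after matching just a prefix of the token stream.
condition : stat+ EOF ;
```

With this change, an input such as (Top.b2.states = nominal) aaa (b1.i1 = wrong) and (states >= 5.5) should no longer stop silently after the first parenthesized expression. The parser cannot match aaa as the start of another stat or as EOF, so it should report a syntax error to the registered ErrorDialogListener.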
