如何强制 ANTLR 解析所有输入 CharStream [英] How to force ANTLR to parse all input CharStream

查看:19
本文介绍了如何强制 ANTLR 解析所有输入 CharStream的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用 ANTLR4 来解析语法文件.当我使用 BaseErrorListener 检测错误时,我遇到了问题.当遇到非法输入字符串时,ANTLR 会自动匹配相应的分支,然后忽略后续的字符流,即使它包含错误.我想检测那个错误.这是我的 g4 文件和 java 文件.
TransitionLexer 是我的词法分析器文件,TransitionCondition 是我的解析器文件.ErrorDialogListener.java 是我的 errorListener 和 Test.java id 主 java 文件.

I'm using ANTLR4 to parse a syntax file. When I use BaseErrorListener to detect errors, I got a problem. When faced with an illegal input string, ANTLR automatically matches the appropriate branch and then ignores the subsequent stream of characters even if it contains errors. And I want to detect that error. Here are my g4 file and java file.
TransitionLexer is my lexer file and TransitionCondition is my parser file. ErrorDialogListener.java is my errorListener and Test.java id main java file.

TransitionLexer.g4

lexer grammar TransitionLexer;

BOOLEAN: 'true' | 'false';
IF: 'if';
THEN: 'then';
ELSE: 'else';

NAME: (ALPHA | CHINESE | '_')(ALPHA | CHINESE | '_'|DIGIT)*;

ALPHA: [a-zA-Z];
CHINESE: [\u4e00-\u9fa5];

NUMBER: INT | REAL;
INT: DIGIT+
    |'(-'DIGIT+')';
REAL: DIGIT+ ('.' DIGIT+)?
    | '(-' DIGIT+ ('.' DIGIT+)? ')';
fragment DIGIT: [0-9];

OPCOMPARE: '='|'>='|'<='|'>'|'<';
WS: [ \t\n\r]+ ->skip;
SL_COMMENT:  '/*' .*? '*/' ->skip;

TransitionCondition.g4

grammar TransitionCondition;
import TransitionLexer;

condition : stat+;
stat : expr;
expr: expr (('and' | 'or') expr)+
    | '(' expr ')'
    | '(' var OPCOMPARE value ')'
    | booleanExpr
    | BOOLEAN
    ;

var: localStates
     | globalStates
     | connector
     ;
localStates: NAME;
globalStates: 'Top' ('.' brick)+ '.' NAME;
connector: brick '.' NAME;

value: userdefinedValue | basicValue;
userdefinedValue: NAME;
basicValue: basicValue op=('*'|'/') basicValue
                    | basicValue op=('+' | '-') basicValue
                    | basicValue ('and' | 'or') basicValue
                    | NUMBER | BOOLEAN
                    | '(' basicValue ')'
                    ;

booleanExpr: booleanExpr OPCOMPARE booleanExpr
           | '(' booleanExpr ')'
           | NUMBER (OPCOMPARE|'*'| '/'|'+'|'-') NUMBER
           ;
brick: NAME;

ErrorDialogListener.java

package errorprocess;

import java.awt.Color;
import java.awt.Container;
import java.util.Collections;
import java.util.List;

import javax.swing.JDialog;
import javax.swing.JFrame;
import javax.swing.JLabel;

import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;
import org.antlr.v4.runtime.atn.ATNConfigSet;
import org.antlr.v4.runtime.dfa.DFA;

public class ErrorDialogListener extends BaseErrorListener {


    @Override
    public void reportContextSensitivity(Parser recognizer, DFA dfa, int startIndex, int stopIndex, int prediction,
            ATNConfigSet configs) {
        System.out.println(dfa.toLexerString());
        System.out.println(dfa.getStates());        
        super.reportContextSensitivity(recognizer, dfa, startIndex, stopIndex, prediction, configs);
    }

    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine,
            String msg, RecognitionException e) {
        List<String> stack = ((Parser)recognizer).getRuleInvocationStack();
        Collections.reverse(stack);
        StringBuilder buf = new StringBuilder();
        buf.append("rule stack: "+stack+" ");
        buf.append("line "+line+":"+charPositionInLine+" at "+
                   offendingSymbol+": "+msg);

        JDialog dialog = new JDialog();
        Container contentPane = dialog.getContentPane();
        contentPane.add(new JLabel(buf.toString()));
        contentPane.setBackground(Color.white);
        dialog.setTitle("Syntax error");
        dialog.pack();
        dialog.setLocationRelativeTo(null);
        dialog.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
        dialog.setVisible(true);
    }

}

Test.java

package errorprocess;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.atn.PredictionMode;

import antlr4.my.transition.TransitionConditionLexer;
import antlr4.my.transition.TransitionConditionParser;

public class Test {

    public static void main(String[] args) throws IOException {
        InputStream in = new FileInputStream("G:\\AltaRica\\ANTLR4\\test\\condition\\t.expr");
        ANTLRInputStream input = new ANTLRInputStream(in);
        TransitionConditionLexer lexer = new TransitionConditionLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TransitionConditionParser parser = new TransitionConditionParser(tokens);
        parser.removeErrorListeners();
        parser.addErrorListener(new ErrorDialogListener());
//        parser.addErrorListener(new DiagnosticErrorListener());
//        parser.getInterpreter().setPredictionMode(PredictionMode.LL_EXACT_AMBIG_DETECTION);
//        parser.getInterpreter().setPredictionMode(PredictionMode.LL);
        parser.condition();
    }

}

主要问题

当我的输入是(Top.b2.states = 名义)和(b1.i1 = 错误)和(状态 >= 5.5),解析器工作正常.
但是当我的输入是 (Top.b2.states =nomal) aaa (b1.i1 = wrong) and (states >= 5.5) 时,解析器只解析 (Top.b2.states=nominal) 并忽略 aaa 之后的单词,这与语法文件不正确.
我想原因是解析器遵循我在 TransitionCondition.g4 中的第一条规则的第二个分支,即 expr: '(' expr ')',并且只是忽略其他分支.那么在这种情况下如何强制ANTLR识别所有输入或者如何强制ANTLR只选择第一个分支(expr: expr (('and' | 'or') expr)+)?

The main problem

When my input is (Top.b2.states = nominal) and (b1.i1 = wrong) and (states >= 5.5), the parser works fine.
But when my input is (Top.b2.states = nominal) aaa (b1.i1 = wrong) and (states >= 5.5), the parser only parse (Top.b2.states = nominal) and ignores words after aaa which is not right with syntax file.
I guess the reason is that the parser follows the second branch of my first rule in TransitionCondition.g4, which is expr: '(' expr ')', and simply ignores others. So How to force ANTLR recognize all input or how to force ANTLR only choose the first branch(expr: expr (('and' | 'or') expr)+) in this situation?

我尝试使用 DiagnosticErrorListener 或覆盖 reportContextSensitivity() 但它似乎不起作用.

I tried to use DiagnosticErrorListener or override reportContextSensitivity() but it seems not worked.

推荐答案

您的主要规则需要以 EOF 标记结束 - ANTLR 提供的特殊标记,与输入结束匹配.

Your main rule needs to end with the EOF token - an ANTLR-provided special token that matches end of input.

如果令牌不存在,ANTLR 将只解析它可以匹配的任何内容然后停止.通过将 EOF 放在输入规则的末尾,您告诉 ANTLR 解析的任何内容都必须在输入的末尾结束.

If the token's not there, ANTLR will just parse whatever it can match and then stop. By putting the EOF at the end of your entry rule, you tell ANTLR that whatever it parses must end at the end of input.

这篇关于如何强制 ANTLR 解析所有输入 CharStream的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆