Using ANTLR Parser and Lexer Separately


Question

I am using ANTLR version 4 to create a compiler. The first phase was the lexer. I created a "CompilerLexer.g4" file and put the lexer rules in it. It works fine.

CompilerLexer.g4:


lexer grammar CompilerLexer;

INT         :   'int'   ;   //1
FLOAT       :   'float' ;   //2
BEGIN       :   'begin' ;   //3
END         :   'end'   ;   //4
To          :   'to'    ;   //5
NEXT        :   'next'  ;   //6
REAL        :   'real'  ;   //7
BOOLEAN     :   'bool'  ;   //8
.
.
.
NOTEQUAL    :   '!='    ;   //46
AND         :   '&&'    ;   //47
OR          :   '||'    ;   //48
POW         :   '^'     ;   //49
ID          : [a-zA-Z]+ ;   //50

WS          :   ' ' -> channel(HIDDEN) ;   //50


Now it is time for phase 2, the parser. I created a "CompilerParser.g4" file and put the grammar rules in it, but I get dozens of warnings and errors.

CompilerParser.g4:


parser grammar CompilerParser;

options {   tokenVocab = CompilerLexer; }

STATEMENT   :   EXPRESSION SEMIC
        |   IFSTMT
        |   WHILESTMT
        |   FORSTMT
        |   READSTMT SEMIC
        |   WRITESTMT SEMIC
        |   VARDEF SEMIC
        |   BLOCK
        ;

BLOCK       : BEGIN STATEMENTS END
        ;

STATEMENTS  : STATEMENT STATEMENTS*
        ;

EXPRESSION  : ID ASSIGN EXPRESSION
        | BOOLEXP
        ;

RELEXP      : MODEXP (GT | LT | EQUAL | NOTEQUAL | LE | GE | AND | OR) RELEXP
        | MODEXP
        ;

.
.
.

VARDEF      : (ID COMA)* ID COLON VARTYPE
        ;

VARTYPE     : INT
        | FLOAT
        | CHAR
        | STRING
        ;
compileUnit
:   EOF
;


Warnings and errors:

  • implicit definition of token 'BLOCK' in parser
  • implicit definition of token 'BOOLEXP' in parser
  • implicit definition of token 'EXP' in parser
  • implicit definition of token 'EXPLIST' in parser
  • lexer rule 'BLOCK' not allowed in parser
  • lexer rule 'EXP' not allowed in parser
  • lexer rule 'EXPLIST' not allowed in parser
  • lexer rule 'EXPRESSION' not allowed in parser

There are dozens of these warnings and errors. What is the cause?

General questions: What is the difference between using a combined grammar and using a separate lexer and parser? How should separate parser and lexer grammar files be joined?

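Solution

Lexer rules start with an uppercase letter, and parser rules start with a lowercase letter. In a parser grammar, you can't define tokens. Since ANTLR treats all your upper-cased rules as lexer rules, it produces these errors and warnings. Renaming the parser rules so that they start with a lowercase letter (for example, statement instead of STATEMENT) resolves them.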
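As a rough sketch of that fix, the rules from the question could be renamed so that every parser rule starts with a lowercase letter. Only rules whose bodies appear in full in the question are shown. The token names SEMIC, COMA, COLON, CHAR and STRING are assumed to be defined in the elided part of CompilerLexer.g4. Writing statement+ in place of the recursive STATEMENTS rule, and referencing statements from compileUnit, are assumptions and not part of the original grammar.

```antlr
parser grammar CompilerParser;

options { tokenVocab = CompilerLexer; }

// Entry rule. The original matched only EOF; the optional statements part is an assumption.
compileUnit : statements? EOF ;

// One or more statements (simplified from the original STATEMENT STATEMENTS*).
statements  : statement+ ;

// Reduced to the alternatives whose rules are fully shown in the question.
statement   : varDef SEMIC
            | block
            ;

block       : BEGIN statements END ;

// Lowercase names are parser rules; uppercase names are tokens from CompilerLexer.
varDef      : (ID COMA)* ID COLON varType ;

varType     : INT
            | FLOAT
            | CHAR
            | STRING
            ;
```

Any uppercase name used here that is missing from the lexer grammar would again be reported as an implicit token definition, so each one needs a matching lexer rule.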

EDIT

user2998131 wrote:

General questions: What is the difference between using a combined grammar and using a separate lexer and parser?

Separating the lexer and parser rules keeps things organized. Also, when you create separate lexer and parser grammars, you can't (accidentally) introduce new literal tokens inside your parser grammar; every token has to be defined in your lexer grammar. This makes it apparent which lexer rules get matched before others, and it prevents typos inside recurring literal tokens. In a combined grammar, a typo like the following silently creates a second token:

grammar P;

r1 : 'foo' r2;

r2 : r3 'foo '; // added an accidental space after 'foo'

But in a parser grammar, that mistake does not slip through: 'foo ' matches no token defined in the lexer grammar, so ANTLR reports an error. You refer to the lexer rule that matches 'foo' instead:

parser grammar P;

options { tokenVocab=L; }

r1 : FOO r2;

r2 : r3 FOO;


lexer grammar L;

FOO : 'foo';

user2998131 wrote:

How should separate parser and lexer grammar files be joined?

Just like you do in your parser grammar: you point to the proper tokenVocab inside the options { ... } block.
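To make the link concrete, here is a minimal sketch of the two files side by side, cut down to a few tokens from the question. The reduced rule set, the block rule body and the widened WS rule are assumptions for illustration only. The point it shows is that tokenVocab names the lexer grammar, and that ANTLR reads the token types from the .tokens file it writes when it processes that lexer grammar, so the lexer grammar has to be processed before the parser grammar.

```antlr
// File: CompilerLexer.g4
// Process this file first; ANTLR writes CompilerLexer.tokens alongside the generated lexer.
lexer grammar CompilerLexer;

BEGIN : 'begin' ;
END   : 'end' ;
ID    : [a-zA-Z]+ ;                       // listed after the keywords so 'begin' and 'end' win
WS    : [ \t\r\n]+ -> channel(HIDDEN) ;   // assumption: widened from ' ' to cover tabs and newlines

// File: CompilerParser.g4
// tokenVocab names the lexer grammar; token types are loaded from CompilerLexer.tokens.
parser grammar CompilerParser;

options { tokenVocab = CompilerLexer; }

compileUnit : block EOF ;
block       : BEGIN ID* END ;             // illustrative body, not the rule from the question
```

The two grammars live in separate .g4 files; they are shown in one listing only to keep the example short.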

Note that you can also import grammars, which is something different: https://theantlrguy.atlassian.net/wiki/display/ANTLR4/Grammar+Structure#GrammarStructure-GrammarImports
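For contrast, a small sketch of an import follows. The grammar names CommonLexerRules and Expr and the prog rule are hypothetical. An import pulls the rules of the imported grammar into the importing grammar, and rules in the importing grammar take precedence over imported rules of the same name. tokenVocab, in comparison, only shares the token types.

```antlr
// File: CommonLexerRules.g4 (hypothetical name)
lexer grammar CommonLexerRules;

ID : [a-zA-Z]+ ;
WS : [ \t\r\n]+ -> skip ;

// File: Expr.g4 (hypothetical name)
// A combined grammar that reuses the lexer rules above through import.
grammar Expr;

import CommonLexerRules;

prog : ID+ EOF ;
```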
