Lexer to handle lines with line number prefix

Views: 88 Published: 2020/9/2 23:58:39 Tags: antlr antlr4


Problem description

I'm writing a parser for a language that looks like the following:

L00<<identifier>>
L10<<keyword>>
L250<<identifier>>
<<identifier>>

That is, each line may or may not start with a line number of the form Lxxx.. ('L' followed by one or more digits), followed by an identifier or a keyword. Identifiers are the standard [a-zA-Z_][a-zA-Z0-9_]*, and the number of digits following the L is not fixed. Spaces between the line number and the following identifier/keyword are optional (and not present in most cases).

My current grammar looks like:

// Parser rules
commands      : command*;
command       : LINE_NUM? keyword NEWLINE
              | LINE_NUM? IDENTIFIER NEWLINE;
keyword       : KEYWORD_A | KEYWORD_B | ... ;

// Lexer rules
fragment INT  : [0-9]+;
LINE_NUM      : 'L' INT;
KEYWORD_A     : 'someKeyword';
KEYWORD_B     : 'reservedWord';
...
IDENTIFIER    : [a-zA-Z_][a-zA-Z0-9_]*;

However, this results in all lines that begin with a line number being tokenized as IDENTIFIERs.
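The behavior follows from longest-match lexing: ANTLR picks the rule that consumes the most characters and falls back to rule order only on a tie. The sketch below imitates that selection in plain Java with `java.util.regex`; it is not ANTLR-generated code, and the class and method names (`LongestMatchDemo`, `firstToken`) are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that imitates how a longest-match lexer picks a rule.
class LongestMatchDemo {
    // Rules in the same order as the lexer rules in the question.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("LINE_NUM", Pattern.compile("L[0-9]+"));
        RULES.put("KEYWORD_A", Pattern.compile("someKeyword"));
        RULES.put("KEYWORD_B", Pattern.compile("reservedWord"));
        RULES.put("IDENTIFIER", Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*"));
    }

    // Returns "RULE:text" for the rule with the longest match at the start
    // of the input; on a tie, the rule listed first wins.
    static String firstToken(String input) {
        String bestRule = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            if (m.lookingAt() && m.end() > bestLen) {
                bestRule = rule.getKey();
                bestLen = m.end();
            }
        }
        return bestRule == null ? null : bestRule + ":" + input.substring(0, bestLen);
    }

    public static void main(String[] args) {
        System.out.println(firstToken("L10someKeyword"));
        System.out.println(firstToken("L10"));
    }
}
```

For `L10someKeyword`, LINE_NUM can cover only `L10`, while IDENTIFIER covers the whole string, so IDENTIFIER wins. For a bare `L10` both rules match three characters and LINE_NUM wins because it is listed first.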

Is there a way to properly tokenize this input using an ANTLR grammar?

Recommended answer

You need to add a semantic predicate to IDENTIFIER (written here for the Java target):

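IDENTIFIER
  : // allow IDENTIFIER unless a line number ('L' + digit) starts at column 0;
    // getCharPositionInLine() is a method of the lexer, not of the _input stream
    {getCharPositionInLine() != 0
      || _input.LA(1) != 'L'
      || !Character.isDigit(_input.LA(2))}?
    [a-zA-Z_] [a-zA-Z0-9_]*
  ;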
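To see what the predicate decides, here is a minimal hand-written scanner in plain Java that applies the same condition before it accepts an identifier. It is only a sketch under assumptions: `PredicateDemo`, `identifierAllowed` and `tokenize` are hypothetical names, keywords are left out for brevity, and skipping spaces and tabs is an assumption based on the question's remark that spaces are optional.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone scanner; it mimics the predicate, it is not ANTLR output.
class PredicateDemo {
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    static boolean isIdStart(char c) {
        return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
    static boolean isIdPart(char c) { return isIdStart(c) || isDigit(c); }

    // Same condition as the predicate: IDENTIFIER is allowed unless we are at
    // column 0 and the next two characters are 'L' followed by a digit.
    static boolean identifierAllowed(String line, int pos) {
        return pos != 0
            || line.charAt(pos) != 'L'
            || pos + 1 >= line.length()
            || !isDigit(line.charAt(pos + 1));
    }

    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < line.length()) {
            char c = line.charAt(pos);
            int start = pos;
            if (c == ' ' || c == '\t') { pos++; continue; } // assumed: whitespace is skipped
            if (isIdStart(c) && identifierAllowed(line, pos)) {
                while (pos < line.length() && isIdPart(line.charAt(pos))) pos++;
                tokens.add("IDENTIFIER:" + line.substring(start, pos));
            } else if (c == 'L' && pos + 1 < line.length() && isDigit(line.charAt(pos + 1))) {
                pos++; // consume 'L', then the digits of the line number
                while (pos < line.length() && isDigit(line.charAt(pos))) pos++;
                tokens.add("LINE_NUM:" + line.substring(start, pos));
            } else {
                throw new IllegalArgumentException("unexpected character at offset " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("L250foo"));
        System.out.println(tokenize("Label1"));
    }
}
```

`L250foo` is split into a line number and an identifier, whereas `Label1` stays a single identifier because no digit follows the leading `L`. An `L10` that appears later in the line is still an identifier, since the column check only fires at position 0.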

You could also avoid semantic predicates by using lexer modes. Note that modes are only allowed in a lexer grammar, not in a combined grammar.

//
// Default mode is active at the beginning of a line
//

LINE_NUM
  : 'L' [0-9]+ -> pushMode(NotBeginningOfLine)
  ;

KEYWORD_A : 'someKeyword' -> pushMode(NotBeginningOfLine);
KEYWORD_B : 'reservedWord' -> pushMode(NotBeginningOfLine);
IDENTIFIER
  : ( 'L'
    | 'L' [a-zA-Z_] [a-zA-Z0-9_]*
    | [a-zA-KM-Z_] [a-zA-Z0-9_]*
    )
    -> pushMode(NotBeginningOfLine)
  ;
NL : ('\r' '\n'? | '\n');

mode NotBeginningOfLine;

  NotBeginningOfLine_NL : ('\r' '\n'? | '\n') -> type(NL), popMode;
  NotBeginningOfLine_KEYWORD_A : KEYWORD_A -> type(KEYWORD_A);
  NotBeginningOfLine_KEYWORD_B : KEYWORD_B -> type(KEYWORD_B);
  NotBeginningOfLine_IDENTIFIER
    : [a-zA-Z_] [a-zA-Z0-9_]* -> type(IDENTIFIER)
    ;
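The mode switching above can be traced with a small simulation in plain Java: one rule table per mode, a mode stack, and longest-match selection within the current mode. This is a sketch rather than the lexer ANTLR would generate; `ModeDemo`, `LINE_START` and `NOT_LINE_START` are hypothetical names, and the alternatives of the line-start IDENTIFIER pattern are reordered because `java.util.regex` takes the first matching alternative instead of the longest.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical simulation of the two lexer modes; not ANTLR-generated code.
class ModeDemo {
    static final String NL = "\\r\\n?|\\n";

    // Each rule is {token type, regex, mode action}; rule order breaks ties.
    static final String[][] LINE_START = {
        {"LINE_NUM", "L[0-9]+", "push"},
        {"KEYWORD_A", "someKeyword", "push"},
        {"KEYWORD_B", "reservedWord", "push"},
        // Identifier that cannot start with 'L' followed by a digit.
        {"IDENTIFIER", "L[a-zA-Z_][a-zA-Z0-9_]*|[a-zA-KM-Z_][a-zA-Z0-9_]*|L", "push"},
        {"NL", NL, "none"},
    };
    static final String[][] NOT_LINE_START = {
        {"NL", NL, "pop"},
        {"KEYWORD_A", "someKeyword", "none"},
        {"KEYWORD_B", "reservedWord", "none"},
        {"IDENTIFIER", "[a-zA-Z_][a-zA-Z0-9_]*", "none"},
    };

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Deque<String[][]> modes = new ArrayDeque<>();
        modes.push(LINE_START); // the default mode is active at the start of a line
        int pos = 0;
        while (pos < input.length()) {
            String[] best = null;
            int bestEnd = pos;
            for (String[] rule : modes.peek()) {
                Matcher m = Pattern.compile(rule[1]).matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) { // keep the longest match
                    best = rule;
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            tokens.add(best[0].equals("NL") ? "NL" : best[0] + ":" + input.substring(pos, bestEnd));
            if (best[2].equals("push")) modes.push(NOT_LINE_START);
            if (best[2].equals("pop")) modes.pop();
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("L10someKeyword\nfoo\nL250bar\n"));
    }
}
```

The first token on a line pushes the second mode, where the unrestricted identifier pattern applies, and the newline pops back to the line-start mode. The line-start IDENTIFIER pattern can match at most the single `L` of `L10someKeyword`, so LINE_NUM wins there by length.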
