Lexer to handle lines with line number prefix

Problem description

I'm writing a parser for a language that looks like the following:

L00<<identifier>>
L10<<keyword>>
L250<<identifier>>
<<identifier>>

That is, each line may start with a line number of the form Lxxx ('L' followed by one or more digits), followed by an identifier or a keyword. Identifiers are the standard [a-zA-Z_][a-zA-Z0-9_]*, and the number of digits following the L is not fixed. Spaces between the line number and the following identifier/keyword are optional (and not present in most cases).

My current lexer looks like:

// Parser rules
commands      : command*;
command       : LINE_NUM? keyword NEWLINE
              | LINE_NUM? IDENTIFIER NEWLINE;
keyword       : KEYWORD_A | KEYWORD_B | ... ;

// Lexer rules
fragment INT  : [0-9]+;
LINE_NUM      : 'L' INT;
KEYWORD_A     : 'someKeyword';
KEYWORD_B     : 'reservedWord';
...
IDENTIFIER    : [a-zA-Z_][a-zA-Z0-9_]*;

However, this results in all lines beginning with a LINE_NUM token being tokenized as IDENTIFIERs.
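The behaviour follows from how an ANTLR lexer chooses between rules: it takes the rule that matches the longest run of characters, and rule order only breaks ties. A minimal sketch in plain Java (no ANTLR runtime involved; the class name `LongestMatchDemo` and the helper `matchLength` are made up for illustration) that approximates the two competing rules with `java.util.regex`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestMatchDemo {
    // Regex approximations of the two lexer rules from the question.
    static final Pattern LINE_NUM = Pattern.compile("L[0-9]+");
    static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");

    // Length of the match anchored at the start of the input, or 0 if there is none.
    static int matchLength(Pattern rule, String input) {
        Matcher m = rule.matcher(input);
        return m.lookingAt() ? m.end() : 0;
    }

    public static void main(String[] args) {
        String line = "L10someKeyword";
        // LINE_NUM stops after the digits; IDENTIFIER keeps going to the end of the word.
        System.out.println("LINE_NUM matches   " + matchLength(LINE_NUM, line) + " chars");
        System.out.println("IDENTIFIER matches " + matchLength(IDENTIFIER, line) + " chars");
    }
}
```

For `L10someKeyword`, the LINE_NUM approximation covers 3 characters and the IDENTIFIER approximation covers all 14, so a longest-match lexer emits one IDENTIFIER for the whole line.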

Is there a way to properly tokenize this input using an ANTLR grammar?

Recommended answer

You need to add a semantic predicate to IDENTIFIER:

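// Java target: reject IDENTIFIER at column 0 when the input starts with 'L' followed by a digit.
// getCharPositionInLine() is called on the lexer itself; _input is the character stream.
IDENTIFIER
  : {getCharPositionInLine() != 0
      || _input.LA(1) != 'L'
      || !Character.isDigit(_input.LA(2))}?
    [a-zA-Z_] [a-zA-Z0-9_]*
  ;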
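The predicate's condition can be read as "IDENTIFIER is allowed unless we are at the start of a line and the next two characters are 'L' and a digit". A small sketch of that decision in plain Java, assuming the current line is available as a string (the class `IdentifierPredicateDemo` and the method `identifierAllowed` are hypothetical names, not part of ANTLR):

```java
class IdentifierPredicateDemo {
    // Mirrors the predicate: true when IDENTIFIER may match at `column` of `line`.
    static boolean identifierAllowed(String line, int column) {
        if (column != 0) {
            return true; // not at the beginning of the line
        }
        if (line.isEmpty() || line.charAt(0) != 'L') {
            return true; // does not start with 'L'
        }
        // Starts with 'L': allowed only if the next character is not a digit.
        return line.length() < 2 || !Character.isDigit(line.charAt(1));
    }

    public static void main(String[] args) {
        System.out.println(identifierAllowed("L10foo", 0)); // line number comes first
        System.out.println(identifierAllowed("L10foo", 3)); // past the line number
        System.out.println(identifierAllowed("Lfoo", 0));   // ordinary identifier
    }
}
```

With IDENTIFIER ruled out at column 0 for `L10foo`, only LINE_NUM can match there; the rest of the line is then lexed normally.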

You could also avoid semantic predicates by using lexer modes.

//
// Default mode is active at the beginning of a line
//

LINE_NUM
  : 'L' [0-9]+ -> pushMode(NotBeginningOfLine)
  ;

KEYWORD_A : 'someKeyword' -> pushMode(NotBeginningOfLine);
KEYWORD_B : 'reservedWord' -> pushMode(NotBeginningOfLine);
IDENTIFIER
  : ( 'L'
    | 'L' [a-zA-Z_] [a-zA-Z0-9_]*
    | [a-zA-KM-Z_] [a-zA-Z0-9_]*
    )
    -> pushMode(NotBeginningOfLine)
  ;
NL : ('\r' '\n'? | '\n');

mode NotBeginningOfLine;

  NotBeginningOfLine_NL : ('\r' '\n'? | '\n') -> type(NL), popMode;
  NotBeginningOfLine_KEYWORD_A : KEYWORD_A -> type(KEYWORD_A);
  NotBeginningOfLine_KEYWORD_B : KEYWORD_B -> type(KEYWORD_B);
  NotBeginningOfLine_IDENTIFIER
    : [a-zA-Z_] [a-zA-Z0-9_]* -> type(IDENTIFIER)
    ;
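To see what the two modes accomplish without running ANTLR, the same idea can be sketched as a tiny hand-written tokenizer in plain Java. This is only an approximation under stated assumptions: the class `TwoModeLexerSketch`, the helper `matchAt`, and the token strings are invented for illustration, a boolean flag stands in for the mode stack, and whitespace is not handled (as in the grammar above).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoModeLexerSketch {
    static final Pattern LINE_NUM = Pattern.compile("L[0-9]+");
    static final Pattern WORD = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");
    static final Pattern NL = Pattern.compile("\\r\\n?|\\n");
    static final List<String> KEYWORDS = Arrays.asList("someKeyword", "reservedWord");

    // Returns a matcher for a match of `rule` starting exactly at `pos`, or null.
    static Matcher matchAt(Pattern rule, String input, int pos) {
        Matcher m = rule.matcher(input).region(pos, input.length());
        return m.lookingAt() ? m : null;
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        boolean atLineStart = true; // true plays the role of the default mode
        int pos = 0;
        while (pos < input.length()) {
            Matcher m;
            if ((m = matchAt(NL, input, pos)) != null) {
                tokens.add("NL");
                atLineStart = true;   // comparable to popMode
            } else if (atLineStart && (m = matchAt(LINE_NUM, input, pos)) != null) {
                tokens.add("LINE_NUM(" + m.group() + ")");
                atLineStart = false;  // comparable to pushMode(NotBeginningOfLine)
            } else if ((m = matchAt(WORD, input, pos)) != null) {
                // A word is a keyword only if the whole word equals a reserved name.
                String kind = KEYWORDS.contains(m.group()) ? "KEYWORD" : "IDENTIFIER";
                tokens.add(kind + "(" + m.group() + ")");
                atLineStart = false;
            } else {
                throw new IllegalArgumentException("Unexpected character at offset " + pos);
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("L10someKeyword\nfoo\nL250bar\n"));
    }
}
```

Because LINE_NUM is only tried at the start of a line, an input such as `L10L20` yields a line number followed by the identifier `L20`, which is the distinction the mode switch encodes in the grammar.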
