Scanner (Lexing keywords with ANTLR)

Views: 51 | Published: 2021/4/7 20:26:26 | Tags: compiler-construction, antlr, antlr3, lexer


Question

I have been working on a scanner for my program, and most of the tutorials online include a parser along with the scanner. It doesn't seem possible to write a lexer without writing a parser at the same time. I am only trying to generate tokens, not interpret them. I want to recognize INT tokens, float tokens, and some keyword tokens such as "begin" and "end".


I am confused about how to match keywords. I unsuccessfully tried the following:

KEYWORD : KEY1 | KEY2;

KEY1 : {input.LT(1).getText().equals("BEGIN")}? LETTER+ ;
KEY2 : {input.LT(1).getText().equals("END")}? LETTER+ ;

FLOATLITERAL_INTLITERAL
  : DIGIT+ 
  ( 
    { input.LA(2) != '.' }? => '.' DIGIT* { $type = FLOATLITERAL; }
    | { $type = INTLITERAL; }
  )
  | '.'  DIGIT+ {$type = FLOATLITERAL}
;

fragment LETTER : ('a'..'z' | 'A'..'Z');
fragment DIGIT  : ('0'..'9');

IDENTIFIER 
 : LETTER 
   | LETTER DIGIT (LETTER|DIGIT)+ 
   | LETTER LETTER (LETTER|DIGIT)*
 ;

WS  //Whitespace
  : (' ' | '\t' | '\n' | '\r' | '\f')+  {$channel = HIDDEN;}
;

Answer


If you only want a lexer, start your grammar with:

lexer grammar FooLexer; // creates: FooLexer.java

LT(int): Token (the variant that returns a Token) can only be used inside parser rules, where it reads from a TokenStream. Inside lexer rules you only have LA(int): int, which returns the next int (character) from the IntStream. But there is no need for all that manual lookahead. Just do something like this:

lexer grammar FooLexer;

BEGIN
  :  'BEGIN'
  ;

END
  :  'END'
  ;

FLOAT
  :  DIGIT+ '.' DIGIT+
  ;

INT
  :  DIGIT+
  ;

IDENTIFIER 
  :  LETTER (LETTER | DIGIT)*
  ;

WS
  :  (' ' | '\t' | '\n' | '\r' | '\f')+  {$channel = HIDDEN;}
  ; 

fragment LETTER : ('a'..'z' | 'A'..'Z');
fragment DIGIT  : ('0'..'9');
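To see what a lexer like this does without running the ANTLR tool, here is a minimal hand-written Java sketch that approximates the FooLexer grammar above. It is not ANTLR-generated code, and ANTLR's real prediction logic is more involved. The class SketchLexer, its Kind enum, Tok record, matchLength and describe methods are hypothetical names chosen for illustration. The LA(int) helper only mirrors the idea of lexer lookahead described above: it returns the i-th upcoming character as an int, or -1 at end of input. At each position every rule is tried, the longest match wins, and on a tie the rule listed first wins. This is why BEGIN and END are listed before IDENTIFIER.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written approximation of FooLexer; not ANTLR-generated code.
public class SketchLexer {
    // Token kinds in the same order as the rules in FooLexer.g;
    // the enum order is used to break ties between equally long matches.
    enum Kind { BEGIN, END, FLOAT, INT, IDENTIFIER, WS }

    record Tok(Kind kind, String text) { }

    private final String src;
    private int pos = 0;

    SketchLexer(String src) { this.src = src; }

    // Character lookahead in the spirit of a lexer's LA(int): LA(1) is the
    // current character, LA(2) the one after it; -1 means end of input.
    int LA(int i) {
        int idx = pos + i - 1;
        return idx < src.length() ? src.charAt(idx) : -1;
    }

    static boolean isLetter(int c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
    static boolean isDigit(int c)  { return c >= '0' && c <= '9'; }
    static boolean isWs(int c)     { return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f'; }

    // Number of characters rule k would match at the current position (0 = no match).
    int matchLength(Kind k) {
        int n = 0;
        switch (k) {
            case BEGIN:
                return src.startsWith("BEGIN", pos) ? 5 : 0;
            case END:
                return src.startsWith("END", pos) ? 3 : 0;
            case FLOAT: { // DIGIT+ '.' DIGIT+
                while (isDigit(LA(n + 1))) n++;
                if (n == 0 || LA(n + 1) != '.') return 0;
                int frac = 0;
                while (isDigit(LA(n + 2 + frac))) frac++;
                return frac == 0 ? 0 : n + 1 + frac;
            }
            case INT: // DIGIT+
                while (isDigit(LA(n + 1))) n++;
                return n;
            case IDENTIFIER: // LETTER (LETTER | DIGIT)*
                if (!isLetter(LA(1))) return 0;
                n = 1;
                while (isLetter(LA(n + 1)) || isDigit(LA(n + 1))) n++;
                return n;
            default: // WS: (' ' | '\t' | '\n' | '\r' | '\f')+
                while (isWs(LA(n + 1))) n++;
                return n;
        }
    }

    // Tries every rule at the current position. Only a strictly longer match
    // replaces the current best, so the rule listed first wins a tie
    // ("BEGIN" lexes as BEGIN, not IDENTIFIER). Returns null at end of input.
    Tok nextToken() {
        if (LA(1) == -1) return null;
        Kind best = null;
        int bestLen = 0;
        for (Kind k : Kind.values()) {
            int len = matchLength(k);
            if (len > bestLen) { best = k; bestLen = len; }
        }
        if (best == null) throw new IllegalStateException("no rule matches at index " + pos);
        Tok t = new Tok(best, src.substring(pos, pos + bestLen));
        pos += bestLen;
        return t;
    }

    // Returns "KIND :: text" for every non-WS token. WS tokens are filtered out
    // here for brevity; the generated FooLexer does return them (on the HIDDEN channel).
    static List<String> describe(String src) {
        SketchLexer lexer = new SketchLexer(src);
        List<String> out = new ArrayList<>();
        for (Tok t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            if (t.kind() != Kind.WS) out.add(t.kind() + " :: " + t.text());
        }
        return out;
    }

    public static void main(String[] args) {
        describe("BEGIN END 3.14159 42 FOO").forEach(System.out::println);
    }
}
```

Running main prints BEGIN :: BEGIN, END :: END, FLOAT :: 3.14159, INT :: 42 and IDENTIFIER :: FOO, one per line. These are the same non-whitespace tokens as in the FooLexer output, shown with names instead of type numbers. An input such as BEGINNER comes out as a single IDENTIFIER because the longer match wins over the BEGIN prefix.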


I don't see the need to create a token called KEYWORD that matches all keywords: you'll want to make a distinction between a BEGIN and END token, right? But if you really want this, simply do:

KEYWORD
  :  'BEGIN'
  |  'END'
  ;

Then remove the BEGIN and END rules. Just make sure KEYWORD is defined before IDENTIFIER: when two lexer rules can match the same input, ANTLR resolves the ambiguity in favor of the rule defined first.
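The rule-order advice can be illustrated with a small, self-contained Java sketch. This is a simplified model, not how ANTLR works internally. Each rule is a java.util.regex pattern, the longest match at the start of the input wins, and a tie goes to the rule defined first. The class RuleOrderDemo and its winner and rules methods are hypothetical names. The sketch compares KEYWORD defined before IDENTIFIER with the reverse order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleOrderDemo {
    // Returns the name of the rule that wins at the start of 'input':
    // the longest match wins; on equal length, the rule inserted first wins.
    static String winner(Map<String, Pattern> rules, String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> e : rules.entrySet()) {
            Matcher m = e.getValue().matcher(input);
            // '>' (not '>=') keeps the earlier rule when lengths are equal
            if (m.lookingAt() && m.end() > bestLen) {
                best = e.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    // Builds the two rules in the requested definition order.
    static Map<String, Pattern> rules(boolean keywordFirst) {
        Pattern keyword = Pattern.compile("BEGIN|END");
        Pattern ident = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");
        Map<String, Pattern> r = new LinkedHashMap<>(); // preserves insertion (definition) order
        if (keywordFirst) {
            r.put("KEYWORD", keyword);
            r.put("IDENTIFIER", ident);
        } else {
            r.put("IDENTIFIER", ident);
            r.put("KEYWORD", keyword);
        }
        return r;
    }

    public static void main(String[] args) {
        for (String in : new String[] {"BEGIN", "END", "BEGINNING", "FOO"}) {
            System.out.println(in + ": keyword-first=" + winner(rules(true), in)
                    + ", identifier-first=" + winner(rules(false), in));
        }
    }
}
```

With KEYWORD first, BEGIN and END become KEYWORD. With IDENTIFIER first, they become IDENTIFIER, and the KEYWORD rule can never win. In both orders BEGINNING stays an IDENTIFIER, because a longer match beats a keyword prefix.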


Test the lexer with the following class:

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String src = "BEGIN END 3.14159 42 FOO";
    FooLexer lexer = new FooLexer(new ANTLRStringStream(src));
    while(true) {
      Token token = lexer.nextToken();
      if(token.getType() == FooLexer.EOF) {
        break;
      }
      System.out.println(token.getType() + " :: " + token.getText());
    }
  }
}

Generate the lexer, compile the .java source files, and run the Main class like this:

java -cp antlr-3.3.jar org.antlr.Tool FooLexer.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main

The following output will be printed to the console. The 11 :: lines are the WS tokens: they are on the HIDDEN channel, but nextToken() still returns them.

4 :: BEGIN
11 ::  
5 :: END
11 ::  
7 :: 3.14159
11 ::  
8 :: 42
11 ::  
10 :: FOO
