Scanner (Lexing keywords with ANTLR)


Question

I have been writing a scanner for my program, and most of the tutorials online pair the scanner with a parser. It doesn't seem possible to write a lexer without writing a parser at the same time. I only want to generate tokens, not interpret them. I want to recognize INT tokens, FLOAT tokens, and a few keyword tokens such as "begin" and "end".

I am confused about how to match keywords. I tried the following, without success:

KEYWORD : KEY1 | KEY2;

KEY1 : {input.LT(1).getText().equals("BEGIN")}? LETTER+ ;
KEY2 : {input.LT(1).getText().equals("END")}? LETTER+ ;

FLOATLITERAL_INTLITERAL
  : DIGIT+ 
  ( 
    { input.LA(2) != '.' }? => '.' DIGIT* { $type = FLOATLITERAL; }
    | { $type = INTLITERAL; }
  )
  | '.'  DIGIT+ {$type = FLOATLITERAL}
;

fragment LETTER : ('a'..'z' | 'A'..'Z');
fragment DIGIT  : ('0'..'9');

IDENTIFIER 
 : LETTER 
   | LETTER DIGIT (LETTER|DIGIT)+ 
   | LETTER LETTER (LETTER|DIGIT)*
 ;

WS  //Whitespace
  : (' ' | '\t' | '\n' | '\r' | '\f')+  {$channel = HIDDEN;}
;  

Recommended answer

If you only want a lexer, start your grammar with:

lexer grammar FooLexer; // creates: FooLexer.java

LT(int): Token can only be used inside parser rules (it operates on a TokenStream). Inside lexer rules you can only use LA(int): int, which returns the next int (character) from the IntStream. But there is no need for all the manual lookahead. Just do something like this:

lexer grammar FooLexer;

BEGIN
  :  'BEGIN'
  ;

END
  :  'END'
  ;

FLOAT
  :  DIGIT+ '.' DIGIT+
  ;

INT
  :  DIGIT+
  ;

IDENTIFIER 
  :  LETTER (LETTER | DIGIT)*
  ;

WS
  :  (' ' | '\t' | '\n' | '\r' | '\f')+  {$channel = HIDDEN;}
  ; 

fragment LETTER : ('a'..'z' | 'A'..'Z');
fragment DIGIT  : ('0'..'9');

I don't see the need for a single KEYWORD token that matches all keywords: you'll want to distinguish between a BEGIN token and an END token, right? But if you really want this, simply do:

KEYWORD
  :  'BEGIN'
  |  'END'
  ;

and remove the BEGIN and END rules. Just make sure KEYWORD is defined before IDENTIFIER, because IDENTIFIER also matches the text "BEGIN" and "END".
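To see why the rule order matters, here is a minimal, self-contained Java sketch that models the behavior with plain regular expressions. It is not how ANTLR works internally (the generated lexer uses its own prediction logic); the class `RuleOrderSketch` and its methods are hypothetical names made up for this illustration. The model assumed here is: the longest match wins, and when two rules match the same text, the rule defined first wins.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: a regex-based model of rule ordering, not ANTLR code.
class RuleOrderSketch {

    // Builds the rule table in definition order (LinkedHashMap keeps insertion order).
    static Map<String, Pattern> rules(boolean keywordFirst) {
        Map<String, Pattern> r = new LinkedHashMap<>();
        if (keywordFirst) {
            r.put("KEYWORD", Pattern.compile("BEGIN|END"));
        }
        r.put("IDENTIFIER", Pattern.compile("[a-zA-Z][a-zA-Z0-9]*"));
        if (!keywordFirst) {
            r.put("KEYWORD", Pattern.compile("BEGIN|END"));
        }
        r.put("WS", Pattern.compile("[ \\t\\n\\r\\f]+"));
        return r;
    }

    // Longest match wins; on a tie, the rule that appears first in the table wins.
    static List<String> tokenize(String src, Map<String, Pattern> rules) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < src.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(src);
                m.region(pos, src.length());
                // Only a strictly longer match replaces an earlier rule's match.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("No rule matches at index " + pos);
            }
            if (!bestName.equals("WS")) { // skip whitespace in the result
                out.add(bestName + ":" + src.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [KEYWORD:BEGIN, KEYWORD:END, IDENTIFIER:FOO]
        System.out.println(tokenize("BEGIN END FOO", rules(true)));
        // prints [IDENTIFIER:BEGIN, IDENTIFIER:END, IDENTIFIER:FOO]
        System.out.println(tokenize("BEGIN END FOO", rules(false)));
    }
}
```

With KEYWORD listed second, every keyword is swallowed by IDENTIFIER in this model. Longer input such as "BEGINX" still comes out as an identifier in both orderings, because the identifier match is longer.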

Test the lexer with the following class:

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String src = "BEGIN END 3.14159 42 FOO";
    FooLexer lexer = new FooLexer(new ANTLRStringStream(src));
    while(true) {
      Token token = lexer.nextToken();
      if(token.getType() == FooLexer.EOF) {
        break;
      }
      System.out.println(token.getType() + " :: " + token.getText());
    }
  }
}

Generate the lexer, compile the .java source files, and run the Main class like this (on Windows, use ; instead of : as the classpath separator):

java -cp antlr-3.3.jar org.antlr.Tool FooLexer.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main

The following output will be printed to the console:

4 :: BEGIN
11 ::  
5 :: END
11 ::  
7 :: 3.14159
11 ::  
8 :: 42
11 ::  
10 :: FOO
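The numbers in the first column are the integer token types. The test class above already relies on one such constant, FooLexer.EOF. If you would rather print names than numbers, one option is to build a lookup table from the lexer's integer constants by reflection. The sketch below is an assumption-laden illustration, not part of the original answer: `TokenNameSketch` and `StandInLexer` are hypothetical, and `StandInLexer` only mirrors the type numbers seen in the output above so the example can run without ANTLR on the classpath. With the real generated class you would pass FooLexer.class instead, assuming it declares its token types as public static final int fields.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: maps token type numbers back to their constant names.
class TokenNameSketch {

    // Stand-in for the generated lexer; the values are copied from the output above.
    static class StandInLexer {
        public static final int EOF = -1;
        public static final int BEGIN = 4;
        public static final int END = 5;
        public static final int FLOAT = 7;
        public static final int INT = 8;
        public static final int IDENTIFIER = 10;
        public static final int WS = 11;
    }

    // Collects every public static final int field declared directly on the class.
    static Map<Integer, String> tokenNames(Class<?> lexerClass) {
        Map<Integer, String> names = new TreeMap<>();
        for (Field f : lexerClass.getDeclaredFields()) {
            int mods = f.getModifiers();
            boolean isConstant = Modifier.isPublic(mods)
                    && Modifier.isStatic(mods)
                    && Modifier.isFinal(mods);
            if (isConstant && f.getType() == int.class) {
                try {
                    names.put(f.getInt(null), f.getName());
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot read " + f.getName(), e);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Map<Integer, String> names = tokenNames(StandInLexer.class);
        // prints FLOAT :: 3.14159
        System.out.println(names.get(7) + " :: 3.14159");
    }
}
```

In the test class, the print statement could then use the looked-up name in place of token.getType().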
