Scanner (Lexing keywords with ANTLR)
Question
I have been writing a scanner for my program, and most of the tutorials online include a parser along with the scanner. It doesn't seem possible to write a lexer without writing a parser at the same time. I am only trying to generate tokens, not interpret them. I want to recognize INT tokens, float tokens, and some keyword tokens like "begin" and "end".
I am confused about how to match keywords. I tried the following, without success:
KEYWORD : KEY1 | KEY2;
KEY1 : {input.LT(1).getText().equals("BEGIN")}? LETTER+ ;
KEY2 : {input.LT(1).getText().equals("END")}? LETTER+ ;
FLOATLITERAL_INTLITERAL
: DIGIT+
(
{ input.LA(2) != '.' }? => '.' DIGIT* { $type = FLOATLITERAL; }
| { $type = INTLITERAL; }
)
| '.' DIGIT+ {$type = FLOATLITERAL}
;
fragment LETTER : ('a'..'z' | 'A'..'Z');
fragment DIGIT : ('0'..'9');
IDENTIFIER
: LETTER
| LETTER DIGIT (LETTER|DIGIT)+
| LETTER LETTER (LETTER|DIGIT)*
;
WS //Whitespace
: (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;}
;
Recommended answer
If you only want a lexer, start your grammar with:
lexer grammar FooLexer; // creates: FooLexer.java
LT(int): Token can only be used inside parser rules (on a TokenStream). Inside lexer rules, you can only use LA(int): int, which returns a lookahead character (as an int) from the IntStream. But there is no need for all the manual lookahead. Just do something like this:
lexer grammar FooLexer;
BEGIN
: 'BEGIN'
;
END
: 'END'
;
FLOAT
: DIGIT+ '.' DIGIT+
;
INT
: DIGIT+
;
IDENTIFIER
: LETTER (LETTER | DIGIT)*
;
WS
: (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z');
fragment DIGIT : ('0'..'9');
I don't see the need to create a token called KEYWORD that matches all keywords: you'll want to make a distinction between a BEGIN and an END token, right? But if you really want this, simply do:
KEYWORD
: 'BEGIN'
| 'END'
;
and remove the BEGIN and END rules. Just make sure KEYWORD is defined before IDENTIFIER: both rules can match the text BEGIN, and the rule defined first takes precedence.
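The effect of rule order can be modelled in plain Java. The sketch below does not use ANTLR and is not how ANTLR's generated code reaches its decisions; it assumes a simplified policy in which the longest match wins and a tie goes to the rule listed first. The class and method names (RuleOrderSketch, winningRule, keywordFirst, identifierFirst) are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleOrderSketch {

    // Returns the name of the rule that wins at the start of the input.
    // Assumed policy: longest match wins; on a tie, the rule listed first wins.
    static String winningRule(Map<String, Pattern> rules, String input) {
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // lookingAt() anchors the match at the start of the input.
            // The strict '>' keeps the earlier rule when lengths are equal.
            if (m.lookingAt() && m.end() > bestLength) {
                best = rule.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }

    // KEYWORD listed before IDENTIFIER, as the answer recommends.
    static Map<String, Pattern> keywordFirst() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("KEYWORD", Pattern.compile("BEGIN|END"));
        rules.put("IDENTIFIER", Pattern.compile("[a-zA-Z][a-zA-Z0-9]*"));
        return rules;
    }

    // The same two rules in the opposite order.
    static Map<String, Pattern> identifierFirst() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("IDENTIFIER", Pattern.compile("[a-zA-Z][a-zA-Z0-9]*"));
        rules.put("KEYWORD", Pattern.compile("BEGIN|END"));
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(winningRule(keywordFirst(), "BEGIN"));    // KEYWORD
        System.out.println(winningRule(identifierFirst(), "BEGIN")); // IDENTIFIER
        System.out.println(winningRule(keywordFirst(), "BEGINNER")); // IDENTIFIER
    }
}
```

With IDENTIFIER listed first, the keyword rule never gets a chance on the input BEGIN, which is why the order of the two rules matters.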
Test the lexer with the following class:
import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String src = "BEGIN END 3.14159 42 FOO";
        FooLexer lexer = new FooLexer(new ANTLRStringStream(src));
        // Pull tokens one at a time until the lexer reports end of input
        while (true) {
            Token token = lexer.nextToken();
            if (token.getType() == FooLexer.EOF) {
                break;
            }
            // Print the numeric token type followed by the matched text
            System.out.println(token.getType() + " :: " + token.getText());
        }
    }
}
If you generate a lexer, compile the .java source files and run the Main class like this (on Windows, use ; instead of : as the classpath separator):
java -cp antlr-3.3.jar org.antlr.Tool FooLexer.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
the following output will be printed to the console:
4 :: BEGIN
11 ::
5 :: END
11 ::
7 :: 3.14159
11 ::
8 :: 42
11 ::
10 :: FOO
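The numbers on the left are token type constants (11 is the WS token, whose text is the whitespace between words). To make such output easier to read, the constants can be mapped back to their names. The following is a minimal sketch using reflection; FooTokens is a hypothetical stand-in whose values are copied from the output above, and TokenNames and namesOf are likewise illustrative names. It assumes the generated FooLexer declares its token types as public static final int fields in the same way, in which case FooLexer.class could be passed instead.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

public class TokenNames {

    // Hypothetical stand-in for the generated lexer's constants;
    // the values are taken from the console output above.
    static class FooTokens {
        public static final int BEGIN = 4;
        public static final int END = 5;
        public static final int FLOAT = 7;
        public static final int INT = 8;
        public static final int IDENTIFIER = 10;
        public static final int WS = 11;
    }

    // Collects every public static final int declared directly on the
    // given class into a map from value to field name.
    static Map<Integer, String> namesOf(Class<?> holder) {
        Map<Integer, String> names = new TreeMap<>();
        for (Field f : holder.getDeclaredFields()) {
            int mods = f.getModifiers();
            boolean isConstant = Modifier.isPublic(mods)
                    && Modifier.isStatic(mods)
                    && Modifier.isFinal(mods)
                    && f.getType() == int.class;
            if (!isConstant) {
                continue;
            }
            try {
                names.put(f.getInt(null), f.getName());
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read " + f.getName(), e);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Map<Integer, String> names = namesOf(FooTokens.class);
        System.out.println(names.get(4) + " :: BEGIN");  // BEGIN :: BEGIN
        System.out.println(names.get(10) + " :: FOO");   // IDENTIFIER :: FOO
    }
}
```

In the test class, names.get(token.getType()) could then be printed in place of the raw number.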