Scanner (Lexing keywords with ANTLR)
Question
I have been writing a scanner for my program, and most of the tutorials online include a parser along with the scanner. It doesn't seem possible to write a lexer without writing a parser at the same time. I am only trying to generate tokens, not interpret them. I want to recognize INT tokens, FLOAT tokens, and some keyword tokens like "begin" and "end".
I am confused about how to match keywords. I tried the following, without success:
KEYWORD : KEY1 | KEY2;
KEY1 : {input.LT(1).getText().equals("BEGIN")}? LETTER+ ;
KEY2 : {input.LT(1).getText().equals("END")}? LETTER+ ;
FLOATLITERAL_INTLITERAL
: DIGIT+
(
{ input.LA(2) != '.' }? => '.' DIGIT* { $type = FLOATLITERAL; }
| { $type = INTLITERAL; }
)
| '.' DIGIT+ {$type = FLOATLITERAL}
;
fragment LETTER : ('a'..'z' | 'A'..'Z');
fragment DIGIT : ('0'..'9');
IDENTIFIER
: LETTER
| LETTER DIGIT (LETTER|DIGIT)+
| LETTER LETTER (LETTER|DIGIT)*
;
WS //Whitespace
: (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;}
;
Recommended answer
If you only want a lexer, start your grammar with:
lexer grammar FooLexer; // creates: FooLexer.java
LT(int): Token can only be used inside parser rules (on a TokenStream). Inside lexer rules, you can only use LA(int): int, which gets the next int (character) from the IntStream. But there is no need for all the manual lookahead. Just do something like this:
lexer grammar FooLexer;
BEGIN
: 'BEGIN'
;
END
: 'END'
;
FLOAT
: DIGIT+ '.' DIGIT+
;
INT
: DIGIT+
;
IDENTIFIER
: LETTER (LETTER | DIGIT)*
;
WS
: (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z');
fragment DIGIT : ('0'..'9');
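To make concrete what these rules spare you from writing, here is a rough hand-written equivalent in plain Java with no ANTLR dependency. It is only a sketch: the class name HandLexerSketch, the tokenize method, and the "KIND:text" output format are made up for illustration, and whitespace is simply dropped rather than sent to a hidden channel. It shows the kind of manual character lookahead (is the '.' followed by a digit?) that the FLOAT and INT rules above express declaratively.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner mirroring the FooLexer rules (illustration only).
class HandLexerSketch {

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns tokens as "KIND:text" strings; whitespace is dropped.
    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            int start = i;
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f') {
                i++; // WS: skipped here instead of being put on a hidden channel
            } else if (isDigit(c)) {
                while (i < src.length() && isDigit(src.charAt(i))) i++;
                // Manual lookahead: FLOAT only if '.' is followed by at least one digit.
                boolean isFloat = i + 1 < src.length()
                        && src.charAt(i) == '.'
                        && isDigit(src.charAt(i + 1));
                if (isFloat) {
                    i++; // consume '.'
                    while (i < src.length() && isDigit(src.charAt(i))) i++;
                }
                out.add((isFloat ? "FLOAT:" : "INT:") + src.substring(start, i));
            } else if (isLetter(c)) {
                while (i < src.length()
                        && (isLetter(src.charAt(i)) || isDigit(src.charAt(i)))) i++;
                String text = src.substring(start, i);
                // Keywords are recognized only after the whole word has been read.
                String kind = text.equals("BEGIN") ? "BEGIN"
                        : text.equals("END") ? "END" : "IDENTIFIER";
                out.add(kind + ":" + text);
            } else {
                throw new IllegalArgumentException(
                        "Unexpected character '" + c + "' at index " + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : tokenize("BEGIN END 3.14159 42 FOO")) {
            System.out.println(token);
        }
    }
}
```

Every branch of that if/else chain corresponds to one lexer rule, which is why the grammar version is easier to extend: adding a keyword there is one new rule, not another special case inside the loop.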
I don't see the need to create a token called KEYWORD that matches all keywords: you'll want to make a distinction between a BEGIN and an END token, right? But if you really want this, simply do:
KEYWORD
: 'BEGIN'
| 'END'
;
and remove the BEGIN and END rules. Just make sure KEYWORD is defined before IDENTIFIER.
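Why the order matters can be illustrated with a deliberately simplified model in plain Java: the longest match wins, and when two rules match the same text, the rule listed first wins. This is an assumption made for the sake of the example, not a description of how ANTLR actually builds its lexer, and the names RuleOrderSketch, classify, and rules are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of rule priority: longest match wins, ties go to the earlier rule.
class RuleOrderSketch {

    // Returns the name of the winning rule, or null if no rule matches at the start.
    static String classify(Map<String, Pattern> rules, String input) {
        String best = null;
        int bestLen = -1;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // lookingAt() anchors the match at the beginning of the input.
            // The strict '>' keeps the earlier rule when two matches have equal length.
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    // Builds the two rules in either order; LinkedHashMap preserves insertion order.
    static Map<String, Pattern> rules(boolean keywordFirst) {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        Pattern keyword = Pattern.compile("BEGIN|END");
        Pattern identifier = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");
        if (keywordFirst) {
            rules.put("KEYWORD", keyword);
            rules.put("IDENTIFIER", identifier);
        } else {
            rules.put("IDENTIFIER", identifier);
            rules.put("KEYWORD", keyword);
        }
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(classify(rules(true), "BEGIN"));    // KEYWORD
        System.out.println(classify(rules(true), "BEGINNER")); // IDENTIFIER
        System.out.println(classify(rules(false), "BEGIN"));   // IDENTIFIER
    }
}
```

With IDENTIFIER listed first, the word BEGIN is never reported as a keyword in this model, because both rules match the same five characters and the earlier rule takes precedence.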
Test the lexer with the following class:
import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String src = "BEGIN END 3.14159 42 FOO";
        FooLexer lexer = new FooLexer(new ANTLRStringStream(src));
        while (true) {
            Token token = lexer.nextToken();
            if (token.getType() == FooLexer.EOF) {
                break;
            }
            System.out.println(token.getType() + " :: " + token.getText());
        }
    }
}
If you generate the lexer, compile the .java source files, and run the Main class like this:
java -cp antlr-3.3.jar org.antlr.Tool FooLexer.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
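the following output will be printed to the console (the type 11 lines are the WS tokens, since calling nextToken() directly also returns tokens on the HIDDEN channel):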
4 :: BEGIN
11 ::
5 :: END
11 ::
7 :: 3.14159
11 ::
8 :: 42
11 ::
10 :: FOO