Antlr lexer semantic predicate on an alternative
Problem Description
Given the grammar:
grammar Test;
words: (WORD|SPACE|DOT)+;
WORD : (
LD
|DOT {_input.LA(1)!='.'}?
) + ;
DOT: '.';
SPACE: ' ';
fragment LD: ~[.\n\r ];
With the Antlr4-generated lexer, for the input:
test. test.test test..test
the token sequence is:
[@0,0:4='test.',<1>,1:0]
[@1,5:5=' ',<3>,1:5]
[@2,6:14='test.test',<1>,1:6]
[@3,15:15=' ',<3>,1:15]
[@4,16:19='test',<1>,1:16]
[@5,20:20='.',<2>,1:20]
[@6,21:25='.test',<1>,1:21]
[@7,26:25='<EOF>',<-1>,1:26]
What puzzles me is why the last piece of text, test..test, is tokenized into 'test', '.' and '.test', while I expected to see 'test.' and '.test'.
What puzzles me even more is that for the input:
test..test test. test.test
the token sequence is:
[@0,0:3='test',<1>,1:0]
[@1,4:4='.',<2>,1:4]
[@2,5:9='.test',<1>,1:5]
[@3,10:10=' ',<3>,1:10]
[@4,11:14='test',<1>,1:11]
[@5,15:15='.',<1>,1:15]
[@6,16:16=' ',<3>,1:16]
[@7,17:20='test',<1>,1:17]
[@8,21:25='.test',<1>,1:21]
[@9,26:25='<EOF>',<-1>,1:26]
Here 'test.test' is split into two tokens, while above it is a single token. Does calling _input.LA(1) have some side effect that causes this? Can someone explain?
I'm using Antlr4.
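To see which tokens the original rules themselves describe, here is a hand-written Java model of them. It is not ANTLR-generated code and not how the ANTLR runtime works internally; it is a sketch under two assumptions: the predicate {_input.LA(1)!='.'}? sees the character right after the dot it guards, and the lexer takes the longest match, with WORD winning a tie against DOT because it is listed first. The class OriginalWordModel and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written model of the question's lexer rules (NOT ANTLR-generated code):
//   WORD  : ( LD | DOT {_input.LA(1)!='.'}? )+ ;
//   DOT   : '.' ;
//   SPACE : ' ' ;
//   fragment LD : ~[.\n\r ] ;
// Assumption: the predicate sees the character right after the dot it guards.
class OriginalWordModel {

    // fragment LD: any character except '.', '\n', '\r' and ' '
    static boolean isLD(char c) {
        return c != '.' && c != '\n' && c != '\r' && c != ' ';
    }

    // Length of the longest WORD starting at `start`; 0 means WORD cannot match.
    static int wordLength(String s, int start) {
        int i = start;
        while (i < s.length()) {
            char c = s.charAt(i);
            boolean nextIsDot = i + 1 < s.length() && s.charAt(i + 1) == '.';
            if (isLD(c) || (c == '.' && !nextIsDot)) {
                i++; // an LD, or a dot whose LA(1) is not another dot
            } else {
                break;
            }
        }
        return i - start;
    }

    static String tok(String text, String type) {
        return "'" + text + "'<" + type + ">";
    }

    // A WORD match is at least as long as a DOT or SPACE match (length 1),
    // and WORD wins ties because it is listed first in the grammar.
    public static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            int len = wordLength(s, pos);
            char c = s.charAt(pos);
            if (len > 0) {
                tokens.add(tok(s.substring(pos, pos + len), "WORD"));
                pos += len;
            } else if (c == '.') {
                tokens.add(tok(".", "DOT"));
                pos++;
            } else if (c == ' ') {
                tokens.add(tok(" ", "SPACE"));
                pos++;
            } else {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("test. test.test test..test"));
        System.out.println(tokenize("test..test test. test.test"));
    }
}
```

Under this model the first input yields test., space, test.test, space, test, ., .test, which matches the first token listing: a dot followed by another dot is rejected by the predicate, so test..test cannot start with a 'test.' token. The second input yields test, ., .test, space, test., space, test.test, so the splitting of 'test.' and 'test.test' in the second listing does not follow from the rule as written.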
Recommended Answer
A quick fix is to check that the previous character, LA(-1), is not equal to '.', and to add a leading optional DOT.
The resulting grammar is:
grammar Test;
words: (WORD|SPACE|DOT)+;
WORD : DOT? (
LD
|{_input.LA(-1)!='.'}? DOT
) + ;
DOT: '.';
SPACE: ' ';
fragment LD: ~[.\n\r ];
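A matching hand-written Java model of this WORD rule (again not ANTLR-generated code) can be used to check what the fix describes. It assumes that LA(-1) is the character just before the dot being tested, and that at the very start of the input there is no such character, so the predicate passes. The only branch point is the optional leading DOT; the (...)+ loop is greedy and must match at least once. The class FixedWordModel and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written model of the answer's rule (NOT ANTLR-generated code):
//   WORD : DOT? ( LD | {_input.LA(-1)!='.'}? DOT )+ ;
// DOT, SPACE and fragment LD are unchanged from the question.
// Assumption: LA(-1) is the character just before the dot being tested;
// at the very start of the input there is no such character.
class FixedWordModel {

    // fragment LD: any character except '.', '\n', '\r' and ' '
    static boolean isLD(char c) {
        return c != '.' && c != '\n' && c != '\r' && c != ' ';
    }

    // Greedy run of ( LD | {LA(-1)!='.'}? DOT ) starting at `from`; returns the end index.
    static int loopEnd(String s, int from) {
        int i = from;
        while (i < s.length()) {
            char c = s.charAt(i);
            boolean prevIsDot = i > 0 && s.charAt(i - 1) == '.';
            if (isLD(c) || (c == '.' && !prevIsDot)) {
                i++; // an LD, or a dot whose LA(-1) is not another dot
            } else {
                break;
            }
        }
        return i;
    }

    // Longest WORD at `start`: try the branch without and with the leading DOT?;
    // in both branches the (...)+ loop must match at least one element.
    static int wordLength(String s, int start) {
        int best = 0;
        int end = loopEnd(s, start);
        if (end > start) {
            best = end - start;
        }
        if (start < s.length() && s.charAt(start) == '.') {
            int endAfterDot = loopEnd(s, start + 1);
            if (endAfterDot > start + 1) {
                best = Math.max(best, endAfterDot - start);
            }
        }
        return best;
    }

    static String tok(String text, String type) {
        return "'" + text + "'<" + type + ">";
    }

    // Longest match wins; WORD is listed first, so it wins ties against DOT.
    public static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            int len = wordLength(s, pos);
            char c = s.charAt(pos);
            if (len > 0) {
                tokens.add(tok(s.substring(pos, pos + len), "WORD"));
                pos += len;
            } else if (c == '.') {
                tokens.add(tok(".", "DOT"));
                pos++;
            } else if (c == ' ') {
                tokens.add(tok(" ", "SPACE"));
                pos++;
            } else {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("test. test.test test..test"));
        System.out.println(tokenize("test..test test. test.test"));
    }
}
```

Under this model both inputs tokenize test..test as 'test.' followed by '.test', and 'test.' and 'test.test' each stay a single WORD token, which is the result the question was after.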
Have fun and enjoy ANTLR; it is a nice tool.