Antlr lexer semantic predicate on an alternative


Question

Given the grammar:

grammar Test;
words: (WORD|SPACE|DOT)+;
WORD : (
       LD
       |DOT       {_input.LA(1)!='.'}?
       ) +        ;
DOT: '.';
SPACE: ' ';
fragment LD: ~[.\n\r ];

With the lexer generated by ANTLR 4, for the input:

test. test.test test..test

the token sequence is:

[@0,0:4='test.',<1>,1:0]
[@1,5:5=' ',<3>,1:5]
[@2,6:14='test.test',<1>,1:6]
[@3,15:15=' ',<3>,1:15]
[@4,16:19='test',<1>,1:16]
[@5,20:20='.',<2>,1:20]
[@6,21:25='.test',<1>,1:21]
[@7,26:25='<EOF>',<-1>,1:26]

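I expected test..test to be tokenized as test. followed by .test. Why is it split into test, . and .test instead?

What puzzles me even more is the input: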

test..test test. test.test

for which the token sequence is:

[@0,0:3='test',<1>,1:0]
[@1,4:4='.',<2>,1:4]
[@2,5:9='.test',<1>,1:5]
[@3,10:10=' ',<3>,1:10]
[@4,11:14='test',<1>,1:11]
[@5,15:15='.',<1>,1:15]
[@6,16:16=' ',<3>,1:16]
[@7,17:20='test',<1>,1:17]
[@8,21:25='.test',<1>,1:21]
[@9,26:25='<EOF>',<-1>,1:26]

Here test.test is split into two tokens, while in the first run it was a single token. Does calling _input.LA(1) have a side effect that causes this? Can someone explain?

I'm using ANTLR 4.

Recommended answer

A quick fix is to check the previous character with LA(-1), requiring that it is not '.', and to add an optional leading DOT.

The resulting grammar is:

grammar Test;
words: (WORD|SPACE|DOT)+;
WORD : DOT? (
       LD
       |{_input.LA(-1)!='.'}? DOT       
       ) +        ;
DOT: '.';
SPACE: ' ';
fragment LD: ~[.\n\r ];
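
The difference between the two predicates can be illustrated with a small hand-written simulation in Python. This is only a sketch, not how ANTLR matches tokens internally: body_end and tokenize are hypothetical helpers written for this illustration, and the sketch assumes each predicate is evaluated at the input position where it appears in the rule. It does not attempt to reproduce the order-dependent second result from the question.

```python
# Characters excluded by the LD fragment: ~[.\n\r ]
NON_LD = set(".\n\r ")


def body_end(text, start, fixed):
    """Return the index where the (LD | DOT)+ loop stops.

    The result equals `start` when the loop matches nothing.
    fixed=False imitates  DOT {_input.LA(1)!='.'}?   (look at the next char)
    fixed=True  imitates  {_input.LA(-1)!='.'}? DOT  (look at the previous char)
    """
    j = start
    while j < len(text):
        c = text[j]
        if c not in NON_LD:
            j += 1                      # LD alternative
        elif c == ".":
            if fixed:
                allowed = j == 0 or text[j - 1] != "."
            else:
                allowed = j + 1 >= len(text) or text[j + 1] != "."
            if not allowed:
                break                   # predicate fails, the loop ends here
            j += 1                      # DOT alternative
        else:
            break                       # space or line break ends the word
    return j


def tokenize(text, fixed):
    """Split text into (type, text) pairs, preferring the longest WORD."""
    tokens = []
    i = 0
    while i < len(text):
        end = body_end(text, i, fixed)
        if fixed and text[i] == ".":
            # Optional DOT? prefix: the loop must still match at least once.
            after_prefix = body_end(text, i + 1, fixed)
            if after_prefix > i + 1:
                end = max(end, after_prefix)
        if end > i:
            tokens.append(("WORD", text[i:end]))
            i = end
        elif text[i] == ".":
            tokens.append(("DOT", "."))
            i += 1
        elif text[i] == " ":
            tokens.append(("SPACE", " "))
            i += 1
        else:
            raise ValueError(f"no rule matches {text[i]!r} at index {i}")
    return tokens


sample = "test. test.test test..test"
print([t for _, t in tokenize(sample, fixed=False)])
# ['test.', ' ', 'test.test', ' ', 'test', '.', '.test']
print([t for _, t in tokenize(sample, fixed=True)])
# ['test.', ' ', 'test.test', ' ', 'test.', '.test']
```

Under these assumptions, the original predicate rejects the first dot of test..test because the character after it is another dot, which leaves test, . and .test. The LA(-1) version accepts that first dot and rejects only the second one, so the second dot starts the next WORD through the optional DOT prefix.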

Have fun and enjoy ANTLR; it is a nice tool.
