ANTLR behaviour with conflicting tokens
Question
How is ANTLR lexer behavior defined in the case of conflicting tokens? Let me explain what I mean by "conflicting" tokens. For example, assume that the following is defined:
INT_STAGE : '1'..'6';
INT : '0'..'9'+;
There is a conflict here, because after reading a sequence of digits, the lexer would not know whether there is one INT or many INT_STAGE tokens (or different combinations of both). From a quick test, it looks like when INT is defined after INT_STAGE, the lexer prefers INT_STAGE, but does that mean INT is then never matched? If the preference went the other way, no INT_STAGE would ever be found.
Another example is:
FOOL: 'fool';
FOO: 'foo';
ID : ('a'..'z'|'A'..'Z'|'_'|'%') ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'%')*;
I was told that this is the "right" order to recognize all the tokens: while reading "fool" the lexer will find one FOOL token and not FOO ID or something else.
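Recommended answer
The following logic applies:
- the lexer matches as many characters as possible;
- if, after applying rule 1, two or more rules match the same number of characters, the rule defined first wins.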
Taking this into account, the input "1", "2", ..., "6" is tokenized as an INT_STAGE: both INT_STAGE and INT match the same number of characters, but INT_STAGE is defined first.
The input "12" is tokenized as an INT, since INT matches the most characters.
I was told that this is the "right" order to recognize all the tokens: while reading "fool" the lexer will find one FOOL token and not FOO ID or something else.
Yes. Both FOOL and ID match all four characters of "fool" (FOO matches only three), and FOOL is defined before ID, so FOOL wins.
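The two rules above can be tried out without ANTLR. What follows is only a rough Python simulation of them, not ANTLR's actual implementation: the tokenize function and the rule tables are hypothetical names made up for this sketch, each lexer rule is approximated by a regular expression, and FOOL is assumed to be the literal fool with no leading space.

```python
import re

def tokenize(text, rules):
    """Apply two rules: the longest match wins; ties go to the rule listed first."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:  # rules are given in definition order
            m = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Regex approximations of the grammar rules from the question (an assumption of this sketch).
DIGIT_RULES = [
    ("INT_STAGE", r"[1-6]"),
    ("INT", r"[0-9]+"),
]

WORD_RULES = [
    ("FOOL", r"fool"),
    ("FOO", r"foo"),
    ("ID", r"[a-zA-Z_%][a-zA-Z0-9_%]*"),
]

print(tokenize("1", DIGIT_RULES))        # [('INT_STAGE', '1')]
print(tokenize("12", DIGIT_RULES))       # [('INT', '12')]
print(tokenize("fool", WORD_RULES))      # [('FOOL', 'fool')]
print(tokenize("foolish", WORD_RULES))   # [('ID', 'foolish')]
```

Swapping the order of INT_STAGE and INT in DIGIT_RULES makes "1" come out as INT, which is why INT_STAGE would never be produced if INT were defined first.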