ANTLR behaviour with conflicting tokens
Question
How is ANTLR lexer behavior defined in the case of conflicting tokens? Let me explain what I mean by "conflicting" tokens. For example, assume that the following is defined:
INT_STAGE : '1'..'6';
INT : '0'..'9'+;
There is a conflict here: after reading a sequence of digits, the lexer cannot know whether it has seen one INT or several INT_STAGE tokens (or some combination of both). From a test, it looks like the lexer prefers INT_STAGE when INT is defined after INT_STAGE. But does that mean INT is then never found? If INT were preferred instead, no INT_STAGE would ever be found.
Another example:
FOOL : 'fool';
FOO  : 'foo';
ID : ('a'..'z'|'A'..'Z'|'_'|'%') ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'%')*;
I was told that this is the "right" order to recognize all the tokens: while reading "fool" the lexer will find one FOOL token and not FOO ID or something else.
Answer
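The following logic applies:
- the lexer matches as many characters as possible;
- if, after applying the first rule, two or more rules match the same number of characters, the rule defined first "wins".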
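The two rules above can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not how ANTLR works internally (ANTLR generates a lexer from the grammar). The regular expressions are a hand translation of the question's INT_STAGE and INT rules, and `RULES` and `tokenize` are hypothetical names used only for illustration.

```python
import re

# Rules in definition order: (token name, regex), translated by hand from the
# question's grammar. For these simple patterns, Python's greedy match is also
# the longest possible match.
RULES = [
    ("INT_STAGE", r"[1-6]"),
    ("INT", r"[0-9]+"),
]


def tokenize(rules, text):
    """Split text into (name, lexeme) pairs: longest match, first-defined rule on ties."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Longest match wins. Using '>' rather than '>=' means that on a tie
            # the rule seen first, i.e. defined first, keeps the win.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize(RULES, "3"))   # [('INT_STAGE', '3')]
print(tokenize(RULES, "12"))  # [('INT', '12')]
print(tokenize(RULES, "7"))   # [('INT', '7')]
```

The strict `>` comparison is what encodes the tie-break: a later rule only takes over when it matches strictly more characters.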
Taking this into account, the input "1", "2", ..., "6" is tokenized as an INT_STAGE: both INT_STAGE and INT match the same number of characters, but INT_STAGE is defined first.
The input "12" is tokenized as an INT, since INT matches the most characters: two, against only one for INT_STAGE.
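The same longest-match, first-defined-wins model also shows why rule order matters for the question's grammar. In this sketch, `first_token`, `STAGE_FIRST` and `INT_FIRST` are hypothetical names, and the regexes are a hand translation of the ANTLR rules. With INT defined first, INT_STAGE can never win, because every single digit 1 to 6 that INT_STAGE matches is also matched by INT with the same length.

```python
import re


def first_token(rules, text):
    """Token a longest-match lexer emits at position 0; the earlier rule wins ties.

    `rules` is a list of (name, regex) pairs in definition order.
    """
    best = None
    for name, pattern in rules:
        m = re.match(pattern, text)
        # Strict '>' means a later rule must be strictly longer to take over.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


STAGE_FIRST = [("INT_STAGE", r"[1-6]"), ("INT", r"[0-9]+")]
INT_FIRST = [("INT", r"[0-9]+"), ("INT_STAGE", r"[1-6]")]

print(first_token(STAGE_FIRST, "4"))   # ('INT_STAGE', '4')
print(first_token(INT_FIRST, "4"))     # ('INT', '4')
print(first_token(STAGE_FIRST, "12"))  # ('INT', '12')
```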
I was told that this is the "right" order to recognize all the tokens: while reading "fool" the lexer will find one FOOL token and not FOO ID or something else.
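Correct. For the input "fool", both FOOL and ID match all four characters while FOO matches only three, so the tie between FOOL and ID is resolved in favour of FOOL, which is defined first.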
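The FOOL example can be checked with a similar sketch. The regexes are a hand translation of the question's FOOL, FOO and ID rules, and `RULES` and `first_token` are hypothetical names. The helper only inspects the first token, so whitespace handling is left out.

```python
import re

# Rules in definition order, translated by hand from the grammar in the question.
RULES = [
    ("FOOL", r"fool"),
    ("FOO", r"foo"),
    ("ID", r"[a-zA-Z_%][a-zA-Z0-9_%]*"),
]


def first_token(rules, text):
    """Return (name, lexeme) for the longest match at position 0; ties go to the earlier rule."""
    best = None
    for name, pattern in rules:
        m = re.match(pattern, text)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


print(first_token(RULES, "fool"))   # ('FOOL', 'fool'): FOOL and ID tie at 4, FOOL is first
print(first_token(RULES, "foo"))    # ('FOO', 'foo'): FOO and ID tie at 3, FOO is first
print(first_token(RULES, "fools"))  # ('ID', 'fools'): ID is longest at 5 characters
```

The last line shows that keyword-like rules only win ties: when ID can match a longer input, as with "fools", the whole word becomes a single ID.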