ANTLR behaviour with conflicting tokens

Question

How is ANTLR lexer behavior defined in the case of conflicting tokens? Let me explain what I mean by "conflicting" tokens. For example, assume that the following is defined:

INT_STAGE       :   '1'..'6';
INT             :   '0'..'9'+;

There is a conflict here: after reading a sequence of digits, the lexer would not know whether it has one INT or several INT_STAGE tokens (or some combination of both). After some testing, it looks like that when INT is defined after INT_STAGE, the lexer prefers INT_STAGE, but then perhaps never produces INT? With the opposite order, no INT_STAGE would ever be found.

Another example:

FOOL            :   'fool';
FOO             :   'foo';
ID              :   ('a'..'z'|'A'..'Z'|'_'|'%') ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'%')*;

I was told that this is the "right" order to recognize all the tokens: while reading "fool" the lexer will find one FOOL token and not FOO ID or something else.

Answer

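The following logic applies:

  1. The lexer matches as many characters as possible.
  2. If, after applying rule 1, two or more rules match the same number of characters, the rule defined first "wins".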

Taking this into account, the input "1", "2", ..., "6" is tokenized as an INT_STAGE: both INT_STAGE and INT match the same number of characters, but INT_STAGE is defined first.

The input "12" is tokenized as an INT since that rule matches the most characters.
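
To make the two rules concrete, here is a minimal Python sketch that models them outside ANTLR. Each lexer rule is approximated by a regular expression, and the names `RULES` and `tokenize` are hypothetical; this is an illustration of the stated logic under those assumptions, not ANTLR's generated lexer. At every position it keeps the longest match and, on a tie, the rule listed first.

```python
import re

# Regex stand-ins for the question's rules, in definition order (assumption:
# this models the answer's two rules; it is not ANTLR's generated lexer).
RULES = [
    ("INT_STAGE", re.compile(r"[1-6]")),
    ("INT", re.compile(r"[0-9]+")),
]


def tokenize(text):
    """Return (rule_name, lexeme) pairs for a string of digits."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            length = m.end() - pos if m else 0
            # Rule 1: keep the longest match.
            # Rule 2: '>' (not '>=') leaves a tie with the earlier rule.
            if length > best_len:
                best_name, best_len = name, length
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("1"))   # [('INT_STAGE', '1')]
print(tokenize("7"))   # [('INT', '7')]
print(tokenize("12"))  # [('INT', '12')]
```

Because the comparison is strictly greater-than, a rule listed later can only take over by matching more characters, which is why "12" becomes a single INT while "1" stays an INT_STAGE.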

I was told that this is the "right" order to recognize all the tokens: while reading "fool" the lexer will find one FOOL token and not FOO ID or something else.

Yes.
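
The same model applied to the FOOL/FOO/ID grammar, again as a hypothetical Python sketch with regular expressions standing in for the rules. Skipping spaces is an added assumption, comparable to a whitespace rule that is skipped. It shows "fool" lexing as FOOL and "foo" as FOO, and that longest match turns "foolish" into a single ID rather than FOOL followed by an ID.

```python
import re

# Regex stand-ins for FOOL, FOO and ID, in definition order (assumption:
# a model of the answer's two rules, not ANTLR's generated lexer).
RULES = [
    ("FOOL", re.compile(r"fool")),
    ("FOO", re.compile(r"foo")),
    ("ID", re.compile(r"[a-zA-Z_%][a-zA-Z0-9_%]*")),
]


def tokenize(text):
    """Return (rule_name, lexeme) pairs; spaces are skipped (assumed)."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] == " ":
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            length = m.end() - pos if m else 0
            # Longest match wins; on a tie the earlier rule is kept.
            if length > best_len:
                best_name, best_len = name, length
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("fool foo foolish"))
# [('FOOL', 'fool'), ('FOO', 'foo'), ('ID', 'foolish')]
```

In this model, moving ID above FOO in `RULES` would hand the 3-character tie on "foo" to ID, so FOO would never be produced; that is why the keyword-like rules are defined before ID.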
