How does ANTLR decide which lexer rule to apply? Does the longest matching lexer rule win?


Question

The input:

#   abc

The grammar:

grammar test;

p : EOF;

Char : [a-z];

fragment Tab : '\t';
fragment Space : ' ';
T1 : (Tab|Space)+ ->skip;

T2 : '#' T1+ Char+;

The match result looks like this:

[@0,0:6='#   abc',<T2>,1:0]    <<<<<<<< PLACE 1
[@1,7:6='<EOF>',<EOF>,1:7]
line 1:0 extraneous input '#   abc' expecting <EOF>

Please ignore the error in the last line. I am wondering why the token matched at PLACE 1 is T2.

In the grammar file, the T2 lexer rule comes after the T1 lexer rule, so I expected the T1 rule to be applied first. Why, then, are the spaces in "#   abc" not skipped?

Does ANTLR use some greedy strategy to match the current character stream against the longest lexer rule?

Recommended answer

Three rules apply, in this order:

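  1. The longest match wins first.
  2. Next, a rule matching an implicit token (like # in a grammar) wins.
  3. Finally, in case of a tie (by match length), the earliest listed of the matching rules wins.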
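
The first and third rules can be sketched with a small Python simulation of the question's lexer. This is not ANTLR itself: `RULES`, `next_token`, and `tokenize` are hypothetical names, and the regular expressions are hand translations of the grammar's lexer rules. The sketch assumes that when T2 references T1, the characters are consumed as part of the T2 token and the skip command only takes effect when T1 is itself the matched token, which is consistent with the token dump in the question. The implicit-token rule is left out because the grammar has no literals in parser rules.

```python
import re

# Hand-translated approximations of the question's lexer rules, in grammar order.
# T1 is inlined into T2; the fragments Tab and Space become the class [\t ].
RULES = [
    ("Char", re.compile(r"[a-z]"), False),
    ("T1", re.compile(r"[\t ]+"), True),            # True: a top-level match is skipped
    ("T2", re.compile(r"#(?:[\t ]+)+[a-z]+"), False),
]


def next_token(text, pos):
    """Return (rule_name, lexeme, skipped) for the longest match at pos, or None."""
    best = None
    for name, pattern, skip in RULES:
        m = pattern.match(text, pos)
        # A strictly longer match wins; on a tie the earlier rule is kept.
        if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group(), skip)
    return best


def tokenize(text):
    """Split text into (rule_name, lexeme) pairs, dropping skipped tokens."""
    pos, tokens = 0, []
    while pos < len(text):
        tok = next_token(text, pos)
        if tok is None:
            raise ValueError(f"no rule matches at offset {pos}")
        name, lexeme, skip = tok
        if not skip:
            tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens


print(tokenize("#   abc"))   # [('T2', '#   abc')]
print(tokenize("   abc"))    # [('Char', 'a'), ('Char', 'b'), ('Char', 'c')]
```

At offset 0 of "#   abc" only T2 can match at all, so the position of T1 in the grammar never comes into play. The spaces are skipped only in the second input, where T1 is matched as a token in its own right.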

After much wee-hours searching, I found most of this material again in one lengthy quote from Sam Harwell, in which he also expounds on the impact of greedy operators. I remember seeing it the first time and sketching the notes in my copy of TDAR, but without the reference.

ANTLR 4 lexers normally operate with longest-match-wins behavior, without any regard for the order in which alternatives appear in the grammar. If two lexer rules match the same longest input sequence, only then is the relative order of those rules compared to determine how the token type is assigned.
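
A common place to see this is a keyword rule next to an identifier rule. The following is a minimal sketch under assumptions, not ANTLR code: the `IF` and `ID` rules and the `pick_rule` helper are hypothetical and only mimic the selection policy described in the quote.

```python
import re

# Hypothetical two-rule lexer, listed in grammar order: IF before ID.
RULES = [
    ("IF", re.compile(r"if")),
    ("ID", re.compile(r"[a-z]+")),
]


def pick_rule(text, rules=RULES):
    """Name of the rule with the longest match at offset 0; ties go to the earliest rule."""
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = pattern.match(text)
        if m and m.end() > best_len:   # strict '>' keeps the earlier rule on a tie
            best_name, best_len = name, m.end()
    return best_name


print(pick_rule("if"))     # IF  (both rules match 2 characters; IF is listed first)
print(pick_rule("iffy"))   # ID  (4 characters beats 2, regardless of order)
print(pick_rule("if", rules=list(reversed(RULES))))  # ID  (the tie now goes to ID)
```

Rule order only matters in the first and last calls, where both rules match exactly the same two characters.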

The behavior within a rule changes as soon as the lexer reaches a non-greedy optional or closure. From that moment forward to the end of the rule, all alternatives within that rule will be treated as ordered, and the path with the lowest alternative wins. This seemingly strange behavior is actually responsible for the non-greedy handling due to the way we order alternatives in the underlying ATN representation. When the lexer is in this mode and reaches the block (ESC|.), the ordering constraint requires it use ESC if possible.
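
The effect of ordered alternatives inside a non-greedy loop can be illustrated with Python's `re` module, which always tries alternatives in the order written. This is only a rough analogy and not a model of ANTLR's ATN: the two patterns below are assumed stand-ins for a string rule built around a non-greedy (ESC|.) loop, with the alternatives in either order.

```python
import re

text = r'"a\"b" tail'

# ESC alternative listed first: an escaped quote is consumed as a unit.
esc_first = re.compile(r'"(?:\\.|.)*?"')
# Wildcard listed first: the backslash is consumed alone, so the next quote closes the string.
dot_first = re.compile(r'"(?:.|\\.)*?"')

esc_first_lexeme = esc_first.match(text).group()
dot_first_lexeme = dot_first.match(text).group()

print(esc_first_lexeme)   # "a\"b"
print(dot_first_lexeme)   # "a\"
```

With ESC tried first, the escaped quote cannot terminate the string early, which is the outcome the ordering constraint in the quote is meant to guarantee.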

The "implicit token" rule comes from here.
