How does ANTLR decide which lexer rule to apply? Does the longest matching lexer rule win?

Problem description

Input:

#   abc

Grammar:

grammar test;

p : EOF;

Char : [a-z];

fragment Tab : '\t';
fragment Space : ' ';
T1 : (Tab|Space)+ ->skip;

T2 : '#' T1+ Char+;

The resulting tokens are:

[@0,0:6='#   abc',<T2>,1:0]    <<<<<<<< PLACE 1
[@1,7:6='<EOF>',<EOF>,1:7]
line 1:0 extraneous input '#   abc' expecting <EOF>

Please ignore the error in the last line. I am wondering why the token matched at PLACE 1 is T2.

In the grammar file, the T2 lexer rule comes after the T1 lexer rule, so I expected the T1 rule to be applied first. Why are the spaces in '#   abc' not skipped?

Does ANTLR use some greedy strategy that matches the current character stream against whichever lexer rule gives the longest match?

Recommended answer

The following three rules apply:

  1. The longest match wins.
  2. Next, a rule that matches an implicit token (like '#' in your grammar) wins.
  3. Finally, in case of a tie (by match length), the rule listed earliest among the matching rules wins.
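The following sketch makes these rules concrete. It is a toy Python model of the question's lexer, not ANTLR's actual implementation. The regexes are hand translations of the grammar, and the tokenize helper is hypothetical. At offset 0 only T2 can match '#', so T1 never competes there. Once T2 is chosen, the T1 it references is simply part of T2's body, and T1's -> skip action is not applied to those spaces. This agrees with the '#   abc' T2 token in the question's output.

```python
import re
from typing import List, Optional, Tuple

# Toy model of the question's grammar (hand-translated, not generated by ANTLR).
# Each entry is (rule name, regex, skip?); list order mirrors the grammar and only
# matters for equal-length matches.
WS = r"[\t ]+"  # T1 : (Tab|Space)+ ;
RULES: List[Tuple[str, str, bool]] = [
    ("Char", r"[a-z]", False),                  # Char : [a-z];
    ("T1", WS, True),                           # T1 : (Tab|Space)+ -> skip;
    ("T2", r"#(?:" + WS + r")+[a-z]+", False),  # T2 : '#' T1+ Char+;
]


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: longest match wins, earliest rule breaks ties."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        best: Optional[Tuple[int, str, bool]] = None
        for name, pattern, skip in RULES:
            m = re.compile(pattern).match(text, pos)
            # Strictly longer replaces the current best, so on a tie the
            # earlier rule in RULES is kept.
            if m and m.end() > pos and (best is None or m.end() > best[0]):
                best = (m.end(), name, skip)
        if best is None:
            raise ValueError(f"token recognition error at offset {pos}")
        end, name, skip = best
        if not skip:  # skipped rules consume input but emit no token
            tokens.append((name, text[pos:end]))
        pos = end
    return tokens


print(tokenize("#   abc"))  # [('T2', '#   abc')]
print(tokenize("   abc"))   # [('Char', 'a'), ('Char', 'b'), ('Char', 'c')]
```

Without the leading '#', the whitespace is matched by T1 on its own and skipped as expected.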

After much late-night searching, I found most of this material again in a lengthy quote from Sam Harwell, in which he also explains the impact of non-greedy operators. I remember seeing it the first time and sketching notes in my copy of TDAR (The Definitive ANTLR 4 Reference), but I did not record the reference.

ANTLR 4 lexers normally operate with longest-match-wins behavior, without any regard for the order in which alternatives appear in the grammar. If two lexer rules match the same longest input sequence, only then is the relative order of those rules compared to determine how the token type is assigned.
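The tie-break half of this quote can be illustrated with a hypothetical keyword/identifier pair, IF : 'if'; and ID : [a-z]+;. The sketch models them in plain Python rather than with the ANTLR runtime. The pick_rule function and both rule lists are illustrative assumptions. The sketch shows that rule order only decides the outcome when two rules match the same length.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rules, listed in "grammar order".
Rule = Tuple[str, re.Pattern]
KEYWORD_FIRST: List[Rule] = [("IF", re.compile(r"if")), ("ID", re.compile(r"[a-z]+"))]
ID_FIRST: List[Rule] = list(reversed(KEYWORD_FIRST))


def pick_rule(rules: List[Rule], text: str, pos: int = 0) -> Tuple[Optional[str], str]:
    """Return (rule name, lexeme) for the token starting at pos."""
    best_name: Optional[str] = None
    best_end = pos
    for name, pattern in rules:
        m = pattern.match(text, pos)
        # '>' rather than '>=' keeps the earlier rule when lengths tie.
        if m and m.end() > best_end:
            best_name, best_end = name, m.end()
    return best_name, text[pos:best_end]


print(pick_rule(KEYWORD_FIRST, "if"))    # ('IF', 'if')    tie, IF is listed first
print(pick_rule(KEYWORD_FIRST, "iffy"))  # ('ID', 'iffy')  longer match beats order
print(pick_rule(ID_FIRST, "if"))         # ('ID', 'if')    tie, ID is listed first
```

Because order only matters for equal-length matches, keywords are conventionally declared before a general identifier rule.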

The behavior within a rule changes as soon as the lexer reaches a non-greedy optional or closure. From that moment forward to the end of the rule, all alternatives within that rule will be treated as ordered, and the path with the lowest alternative wins. This seemingly strange behavior is actually responsible for the non-greedy handling due to the way we order alternatives in the underlying ATN representation. When the lexer is in this mode and reaches the block (ESC|.), the ordering constraint requires it use ESC if possible.
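Python's re module is also a backtracking engine that tries alternatives left to right. It can therefore serve as a rough analogy for the ordered behavior Harwell describes once a non-greedy *? is reached, although it does not simulate ANTLR's ATN. The STRING and ESC rules being imitated are hypothetical. When the escape alternative is listed first, the escaped quote is consumed as ESC. When it is listed second, '.' grabs the backslash and the string ends too early.

```python
import re

# Analogy only: Python's backtracking regex engine, not ANTLR's lexer.
# Hypothetical ANTLR rule being imitated:
#   STRING : '"' (ESC | .)*? '"' ;
#   fragment ESC : '\\"' ;
ESC_FIRST = re.compile(r'"(?:\\"|.)*?"')  # (ESC | .)*?  escape alternative tried first
DOT_FIRST = re.compile(r'"(?:.|\\")*?"')  # (. | ESC)*?  wildcard tried first

text = r'"a\"b" rest'
print(ESC_FIRST.match(text).group())  # "a\"b"  the escaped quote stays inside
print(DOT_FIRST.match(text).group())  # "a\"    '.' took the backslash, string ends early
```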

The "implicit token" rule (rule 2) comes from here.
