ANTLRv4: non-greedy rules


Question

I'm reading The Definitive ANTLR 4 Reference and have a question about one of the examples (p. 76):

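// Double-quoted string; the non-greedy loop stops at the first unescaped quote.
STRING: '"' (ESC|.)*? '"' ;
// Escape sequences: \" and \\
fragment
ESC: '\\"' | '\\\\' ;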

The rule matches a typical C++ string: a character sequence enclosed in double quotes, which can also contain \".

I expected the rule STRING to match the shortest possible string because of the non-greedy construct. So on seeing \" it would map \ to . and " to the closing " at the end of the rule, since this would give the shortest possible string. Instead, \" is mapped to ESC. This is not what I expected, so I seem to be misunderstanding something.

What exactly happens here? Is it that a separate DFA matches (ESC|.) first, and another DFA then matches STRING using the string already matched by the (ESC|.) construct? I have to admit I haven't read the book to the end.

Recommended answer

ANTLR 4 lexers normally operate with longest-match-wins behavior, without any regard for the order in which alternatives appear in the grammar. If two lexer rules match the same longest input sequence, only then is the relative order of those rules compared to determine how the token type is assigned.
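To make the longest-match rule concrete, here is a minimal Python sketch. It is not ANTLR's implementation; the rule table `RULES` and the function `next_token` are hypothetical names invented for this illustration. It only models the policy described above: the longest lexeme wins, and rule order is consulted only when two rules match the same length.

```python
import re

# Hypothetical rule table for illustration; order matters only for ties.
RULES = [
    ("IF", re.compile(r"if")),
    ("ID", re.compile(r"[a-z]+")),
]

def next_token(text, pos=0):
    """Return (rule_name, lexeme) for the longest match at pos.

    A later rule replaces the current best only if its match is strictly
    longer, so the earliest rule wins when lengths are equal.
    """
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(next_token("iffy"))  # ('ID', 'iffy')  longest match wins
print(next_token("if"))    # ('IF', 'if')    tie broken by rule order
```

Swapping the two entries in `RULES` would change only the second result, since that is the only case where the match lengths tie.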

The behavior within a rule changes as soon as the lexer reaches a non-greedy optional or closure. From that moment forward to the end of the rule, all alternatives within that rule will be treated as ordered, and the path with the lowest-numbered alternative wins. This seemingly strange behavior is actually responsible for the non-greedy handling, due to the way we order alternatives in the underlying ATN representation. When the lexer is in this mode and reaches the block (ESC|.), the ordering constraint requires it to use ESC if possible.
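As a rough analogy only, Python's `re` module shows a similar effect of ordered alternatives inside a non-greedy loop. Python's engine is a backtracking matcher, not ANTLR's ATN simulation, so this sketch says nothing about ANTLR's internals; the pattern names `ESC_FIRST` and `DOT_FIRST` are made up for the example. The first pattern mirrors `(ESC|.)*?` from the question, the second lists `.` before the escape alternatives.

```python
import re

# ESC alternatives listed before '.', mirroring (ESC|.)*? in the grammar.
ESC_FIRST = re.compile(r'"(?:\\"|\\\\|.)*?"')
# Same alternatives with '.' listed first (hypothetical reordering).
DOT_FIRST = re.compile(r'"(?:.|\\"|\\\\)*?"')

# Input characters: " a \ " b " followed by trailing text.
source = r'"a\"b" rest'

print(ESC_FIRST.match(source).group())  # "a\"b"
print(DOT_FIRST.match(source).group())  # "a\"
```

With the escape alternatives first, the backslash and the quote after it are consumed together, so the loop cannot stop at the escaped quote. With `.` first, the backslash is consumed on its own and the next quote closes the string, which is the result the question expected from a purely shortest-match reading.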
