ANTLR rule priorities
Question
First, I know this grammar doesn't make sense; it was created only to test ANTLR's rule priority behaviour.
grammar test;

options {
  output=AST;
  backtrack=true;
  memoize=true;
}

rule_list_in_order
  : ( first_rule
    | second_rule
    | any_left_over_tokens
    )+
  ;

first_rule
  : FIRST_TOKEN
  ;

second_rule
  : FIRST_TOKEN NEW_LINE SECOND_TOKEN NEW_LINE
  ;

any_left_over_tokens
  : NEW_LINE
  | FIRST_TOKEN
  | SECOND_TOKEN
  ;

FIRST_TOKEN  : 'First token here';
SECOND_TOKEN : 'Second token here';
NEW_LINE     : ('\r'? '\n');
WS           : (' ' | '\t' | '\u000C') {$channel=HIDDEN;};
When I give this grammar the input 'First token here\nSecond token here', it matches second_rule.
I would have expected it to match first_rule and then any_left_over_tokens, because first_rule appears before second_rule in rule_list_in_order, which is the start rule. Can anyone explain why this happens?
Cheers
Recommended answer
First of all, ANTLR's lexer considers its rules from top to bottom, so tokens defined first take precedence over the ones defined below them. When several lexer rules can match at the same position, the rule that matches the most characters takes precedence (greedy match).
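The precedence described above can be sketched in plain Java. This is only a toy model of the stated principle (longest match wins, ties go to the rule defined first), not ANTLR's generated lexer; the class name LexerPrecedenceSketch, its methods, and the use of literal strings instead of real lexer rules are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT how ANTLR generates lexers. It mimics the principle
// "longest match wins; on a tie, the rule defined first wins".
final class LexerPrecedenceSketch {

    // Hypothetical rule table mirroring the tokens in the question.
    // A LinkedHashMap keeps insertion order, standing in for definition order.
    static Map<String, String> questionRules() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("FIRST_TOKEN", "First token here");
        rules.put("SECOND_TOKEN", "Second token here");
        rules.put("NEW_LINE", "\n");
        return rules;
    }

    // Returns the names of the rules that match the input, left to right.
    static List<String> tokenize(Map<String, String> rules, String input) {
        List<String> names = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestLength = 0;
            for (Map.Entry<String, String> rule : rules.entrySet()) {
                String literal = rule.getValue();
                // Strictly longer matches replace the current best, so on a
                // tie the rule that was defined earlier is kept.
                if (input.startsWith(literal, pos) && literal.length() > bestLength) {
                    bestName = rule.getKey();
                    bestLength = literal.length();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("No rule matches at index " + pos);
            }
            names.add(bestName);
            pos += bestLength;
        }
        return names;
    }

    public static void main(String[] args) {
        String source = "First token here\nSecond token here";
        // Prints [FIRST_TOKEN, NEW_LINE, SECOND_TOKEN]
        System.out.println(tokenize(questionRules(), source));
    }
}
```

In this model the question's input always becomes FIRST_TOKEN, NEW_LINE, SECOND_TOKEN, so the behaviour being asked about has to come from the parser rules rather than from tokenization.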
The same principle holds within parser rules: alternatives listed first are tried first. For example, in rule foo, sub-rule a is tried before b:
foo
: a
| b
;
Note that in your case second_rule isn't actually matched: the parser tries to match it and fails because there is no trailing line break, producing the error:
line 0:-1 mismatched input '<EOF>' expecting NEW_LINE
So nothing is matched at all. That is odd: because you've set backtrack=true, the parser should at least backtrack and match:
first_rule ("First token here")
any_left_over_tokens ("line-break")
any_left_over_tokens ("Second token here")
if it doesn't simply match first_rule in the first place, without even trying second_rule.
A quick demo that applies the syntactic predicates manually (with backtrack disabled in the options { ... } section) would look like:
grammar T;
options {
output=AST;
//backtrack=true;
memoize=true;
}
rule_list_in_order
: ( (first_rule)=> first_rule {System.out.println("first_rule=[" + $first_rule.text + "]");}
| (second_rule)=> second_rule {System.out.println("second_rule=[" + $second_rule.text + "]");}
| any_left_over_tokens {System.out.println("any_left_over_tokens=[" + $any_left_over_tokens.text + "]");}
)+
;
first_rule
: FIRST_TOKEN
;
second_rule
: FIRST_TOKEN NEW_LINE SECOND_TOKEN NEW_LINE
;
any_left_over_tokens
: NEW_LINE
| FIRST_TOKEN
| SECOND_TOKEN
;
FIRST_TOKEN : 'First token here';
SECOND_TOKEN : 'Second token here';
NEW_LINE : ('\r'?'\n');
WS : (' '|'\t'|'\u000C') {$channel=HIDDEN;};
which can be tested with the class:
import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String source = "First token here\nSecond token here";
        ANTLRStringStream in = new ANTLRStringStream(source);
        TLexer lexer = new TLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TParser parser = new TParser(tokens);
        parser.rule_list_in_order();
    }
}
which produces the expected output:
first_rule=[First token here]
any_left_over_tokens=[
]
any_left_over_tokens=[Second token here]
Note that it doesn't matter if you use:
rule_list_in_order
: ( (first_rule)=> first_rule
| (second_rule)=> second_rule
| any_left_over_tokens
)+
;
or
rule_list_in_order
: ( (second_rule)=> second_rule // <--+--- swapped
| (first_rule)=> first_rule // <-/
| any_left_over_tokens
)+
;
both will produce the expected output.
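To see why the order of the two predicated alternatives makes no difference for this particular input, here is a minimal sketch of ordered choice with full-alternative lookahead, written in plain Java. It is an illustrative model of what a predicate such as (second_rule)=> checks, not the code ANTLR generates; the class OrderedChoiceSketch, its methods, and the array encoding of the rules are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only: each alternative is accepted only if ALL of its
// tokens are present ahead, which is roughly what "(rule)=> rule" checks.
final class OrderedChoiceSketch {

    // Each alternative is {ruleName, token1, token2, ...}. These arrays are a
    // hypothetical stand-in for the parser rules of the grammar.
    static final String[] FIRST_RULE = {"first_rule", "FIRST_TOKEN"};
    static final String[] SECOND_RULE =
        {"second_rule", "FIRST_TOKEN", "NEW_LINE", "SECOND_TOKEN", "NEW_LINE"};
    static final String[][] LEFT_OVERS = {
        {"any_left_over_tokens", "NEW_LINE"},
        {"any_left_over_tokens", "FIRST_TOKEN"},
        {"any_left_over_tokens", "SECOND_TOKEN"},
    };

    // Builds the alternative list in either of the two orders discussed.
    static List<String[]> alternatives(boolean secondRuleFirst) {
        List<String[]> alts = new ArrayList<>();
        alts.add(secondRuleFirst ? SECOND_RULE : FIRST_RULE);
        alts.add(secondRuleFirst ? FIRST_RULE : SECOND_RULE);
        alts.addAll(Arrays.asList(LEFT_OVERS));
        return alts;
    }

    // True if every token of the alternative is present starting at pos.
    static boolean matchesAt(List<String> tokens, int pos, String[] alt) {
        int length = alt.length - 1; // alt[0] is the rule name
        if (pos + length > tokens.size()) {
            return false; // ran into the end of input, like the <EOF> mismatch
        }
        for (int i = 0; i < length; i++) {
            if (!tokens.get(pos + i).equals(alt[i + 1])) {
                return false;
            }
        }
        return true;
    }

    // Repeatedly takes the first alternative that fully matches, like ( ... )+.
    static List<String> parse(List<String[]> alts, List<String> tokens) {
        List<String> matched = new ArrayList<>();
        int pos = 0;
        while (pos < tokens.size()) {
            String[] chosen = null;
            for (String[] alt : alts) {
                if (matchesAt(tokens, pos, alt)) {
                    chosen = alt;
                    break;
                }
            }
            if (chosen == null) {
                throw new IllegalStateException("No alternative matches at token " + pos);
            }
            matched.add(chosen[0]);
            pos += chosen.length - 1;
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("FIRST_TOKEN", "NEW_LINE", "SECOND_TOKEN");
        // Both lines print [first_rule, any_left_over_tokens, any_left_over_tokens]
        System.out.println(parse(alternatives(false), tokens));
        System.out.println(parse(alternatives(true), tokens));
    }
}
```

In this model, second_rule can never succeed on the question's input because the trailing NEW_LINE is missing, so first_rule is chosen regardless of which of the two is listed first. If the input did end with a line break, the sketch would pick second_rule when it is listed first, so the order only stops mattering when one of the alternatives cannot match.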
So, my guess is that you may have found a bug.
You could try the ANTLR mailing list if you want a definitive answer (Terence Parr frequents it more often than he does this site).
Good luck!
PS. I tested this with ANTLR v3.2.