ANTLR 4.5 - Mismatched Input 'x' expecting 'x'
Question
I have started using ANTLR and have noticed that it is pretty fickle with its lexer rules. An extremely frustrating example is the following:
grammar output;
test: FILEPATH NEWLINE TITLE ;
FILEPATH: ('A'..'Z'|'a'..'z'|'0'..'9'|':'|'\\'|'/'|' '|'-'|'_'|'.')+ ;
NEWLINE: '\r'? '\n' ;
TITLE: ('A'..'Z'|'a'..'z'|' ')+ ;
This grammar will not match input such as:
c:\test.txt
x
Oddly, if I change TITLE to be TITLE: 'x' ; it still fails, this time giving an error message saying "mismatched input 'x' expecting 'x'", which is highly confusing. Even more oddly, if I replace the usage of TITLE in test with FILEPATH, the whole thing works (although FILEPATH will match more than I am looking to match, so in general it isn't a valid solution for me).
I am highly confused as to why ANTLR is giving such extremely strange errors and then suddenly working for no apparent reason when shuffling things around.
Recommended answer
This seems to be a common misunderstanding of ANTLR:
Language Processing in ANTLR
Language processing is done in two strictly separated phases:
- Lexing, i.e. partitioning the text into tokens
- Parsing, i.e. building a parse tree from the tokens
Since lexing must precede parsing, there is a consequence: the lexer is independent of the parser, and the parser cannot influence lexing.
Lexing
Lexing in ANTLR works as follows:
- all rules whose names start with an uppercase letter are lexer rules
- the lexer starts at the beginning of the input and tries to find the rule that best matches the current input
- the best match is the longest one, i.e. appending the next input character to it yields a string that no lexer rule matches
- tokens are generated from matches:
  - if exactly one rule produces the longest match, the corresponding token is pushed into the token stream
  - if multiple rules produce the same longest match, the token of the rule defined first in the grammar is pushed into the token stream
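The rules above can be sketched as a small simulation. This is a minimal Python sketch, not ANTLR's actual implementation: the `tokenize` helper is hypothetical, and the regular expressions are assumed stand-ins for the three lexer rules from the question.

```python
import re

# Regex stand-ins for the question's lexer rules, in grammar order (assumption:
# a greedy regex match equals each rule's longest match for these simple patterns).
RULES = [
    ("FILEPATH", re.compile(r"[A-Za-z0-9:\\/ \-_.]+")),
    ("NEWLINE", re.compile(r"\r?\n")),
    ("TITLE", re.compile(r"[A-Za-z ]+")),
]

def tokenize(rules, text):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

tokens = tokenize(RULES, "c:\\test.txt\nx")
print([name for name, _ in tokens])  # ['FILEPATH', 'NEWLINE', 'FILEPATH']
```

Note that the final x comes out as FILEPATH rather than TITLE: both rules match exactly one character, and FILEPATH is listed first. The parser rule never gets a say in this decision.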
Example: What is wrong with your grammar
Your grammar has two rules that are critical:
FILEPATH: ('A'..'Z'|'a'..'z'|'0'..'9'|':'|'\\'|'/'|' '|'-'|'_'|'.')+ ;
TITLE: ('A'..'Z'|'a'..'z'|' ')+ ;
Every string matched by TITLE is also matched by FILEPATH, and FILEPATH is defined before TITLE. So every token that you expect to be a TITLE will be a FILEPATH. This also explains the confusing message: with TITLE: 'x' ; the input x is still lexed as a FILEPATH token, which the error prints by its text ('x'), while the parser expects a TITLE token, which it prints by its literal ('x').
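Some hints for this:
- Keep your lexer rules disjoint (no token should match a superset of another).
- If your tokens intentionally match the same strings, put them in the right order (in your case this will be sufficient).
- If you need a parser-driven lexer, you have to switch to another parser generator: PEG parsers or GLR parsers can do that (but of course this can cause other problems).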
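To see what reordering the rules changes, here is a minimal Python sketch. It is an illustration under assumptions, not ANTLR itself: the `winner` helper is hypothetical, and the regular expressions are stand-ins for the FILEPATH and TITLE rules from the question.

```python
import re

# Regex stand-ins for the two overlapping lexer rules (assumed equivalents).
FILEPATH = ("FILEPATH", r"[A-Za-z0-9:\\/ \-_.]+")
TITLE = ("TITLE", r"[A-Za-z ]+")

def winner(rules, text):
    """Name of the rule that wins at the start of text: longest match,
    then first in the list. Assumes at least one rule matches."""
    def length(rule):
        m = re.match(rule[1], text)
        return m.end() if m else 0
    # max() returns the first maximal element, mirroring "first defined rule wins".
    return max(rules, key=length)[0]

original = [FILEPATH, TITLE]   # order used in the question
reordered = [TITLE, FILEPATH]  # TITLE moved ahead of FILEPATH

print(winner(original, "x"))              # FILEPATH
print(winner(reordered, "x"))             # TITLE
print(winner(reordered, "c:\\test.txt"))  # FILEPATH (longer match beats rule order)
print(winner(reordered, "readme"))        # TITLE (a letters-only path ties, so TITLE wins)
```

The last line shows the limit of reordering alone: as long as the two rules overlap, any input made only of letters and spaces becomes a TITLE, which is why keeping the rules disjoint is the more robust hint.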