ANTLR 4.5 - Mismatched Input 'x' expecting 'x'
Problem description
I have started using ANTLR and have noticed that it is pretty fickle with its lexer rules. An extremely frustrating example is the following:
grammar output;
test: FILEPATH NEWLINE TITLE ;
FILEPATH: ('A'..'Z'|'a'..'z'|'0'..'9'|':'|'\\'|'/'|' '|'-'|'_'|'.')+ ;
NEWLINE: '\r'? '\n' ;
TITLE: ('A'..'Z'|'a'..'z'|' ')+ ;
This grammar will not match input such as:
c:\test.txt
x
Oddly, if I change TITLE to be TITLE: 'x' ; it still fails, this time with an error message saying "mismatched input 'x' expecting 'x'", which is highly confusing. Even more oddly, if I replace the usage of TITLE in test with FILEPATH, the whole thing works (although FILEPATH will match more than I am looking to match, so in general it isn't a valid solution for me).
I am highly confused as to why ANTLR gives such strange errors and then suddenly works for no apparent reason when I shuffle things around.
Recommended answer
This seems to be a common misunderstanding of ANTLR:
Language processing in ANTLR
Language processing is done in two strictly separated phases:
- Lexing, i.e. partitioning the text into tokens
- Parsing, i.e. building a parse tree from the tokens
Since lexing must precede parsing, there is a consequence: the lexer is independent of the parser, and the parser cannot influence lexing.
Lexing
Lexing in ANTLR works as follows:
- all rules whose name starts with an uppercase character are lexer rules
- the lexer starts at the beginning of the input and tries to find the rule that best matches the current input
- a best match is a match of maximum length, i.e. appending the next input character to it yields a string that no lexer rule matches
- tokens are generated from matches:
  - if exactly one rule produces the maximum-length match, the corresponding token is pushed onto the token stream
  - if multiple rules produce the maximum-length match, the token of the rule defined first in the grammar is pushed onto the token stream
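The selection rule above can be sketched in a few lines of Python. This is only a simulation of the longest-match and first-defined tie-break behaviour, not ANTLR's actual implementation; the function name tokenize and the toy rules IF, ID and WS are hypothetical and do not come from the question's grammar. It also assumes each regular expression is greedy enough to return its own longest match.

```python
import re

def tokenize(rules, text):
    """Simulate longest-match lexing; rules is an ordered list of (name, regex)."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the rule defined earlier is kept.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Hypothetical toy rules: a keyword, an identifier and whitespace.
rules = [("IF", r"if"), ("ID", r"[a-z]+"), ("WS", r" +")]
print(tokenize(rules, "if iffy"))
# [('IF', 'if'), ('WS', ' '), ('ID', 'iffy')]
```

Both IF and ID match the first two characters, so the earlier rule IF wins; for "iffy" the ID rule wins because its match is longer.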
Example: what is wrong with your grammar
Your grammar has two rules that are critical:
FILEPATH: ('A'..'Z'|'a'..'z'|'0'..'9'|':'|'\\'|'/'|' '|'-'|'_'|'.')+ ;
TITLE: ('A'..'Z'|'a'..'z'|' ')+ ;
Every string matched by TITLE is also matched by FILEPATH, and FILEPATH is defined before TITLE. So every token that you expect to be a TITLE is emitted as a FILEPATH. This also explains the confusing message: the input 'x' was tokenized as FILEPATH, while the parser expected the token TITLE, which the error message displays by its literal 'x'.
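Some hints:
- keep your lexer rules disjoint (no token should match a superset of another).
- if your tokens intentionally match the same strings, put them in the right order (in your case this will be sufficient).
- if you need a parser-driven lexer, you have to change to another parser generator: PEG parsers or GLR parsers will do that (but of course this can produce other problems).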
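To illustrate the ordering hint, here is a minimal Python sketch that applies the longest-match, first-defined-wins rule to regex transcriptions of the question's three lexer rules. It is a simulation under those assumptions, not ANTLR itself; the helper token_types and the names original, reordered and text are hypothetical, and the input is a two-line sample like the one in the question.

```python
import re

# Regex transcriptions of the lexer rules from the question.
FILEPATH = ("FILEPATH", r"[A-Za-z0-9:\\/ \-_.]+")
NEWLINE = ("NEWLINE", r"\r?\n")
TITLE = ("TITLE", r"[A-Za-z ]+")

def token_types(rules, text):
    """Return the token types chosen by longest match, earliest rule on ties."""
    types, pos = [], 0
    while pos < len(text):
        candidates = []
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m:
                candidates.append((name, m))
        # max() returns the first maximal item, which mirrors
        # "the rule defined first wins when lengths are equal".
        name, m = max(candidates, key=lambda c: len(c[1].group()))
        types.append(name)
        pos = m.end()
    return types

text = "c:\\test.txt\nx"                 # file path, newline, title
original = [FILEPATH, NEWLINE, TITLE]    # order used in the question
reordered = [TITLE, FILEPATH, NEWLINE]   # TITLE moved before FILEPATH

print(token_types(original, text))   # ['FILEPATH', 'NEWLINE', 'FILEPATH']
print(token_types(reordered, text))  # ['FILEPATH', 'NEWLINE', 'TITLE']
```

With the original order the last token comes out as FILEPATH, which the parser rule test cannot accept. With TITLE first, the tie on 'x' goes to TITLE, while the path is still a FILEPATH because that match is longer. Note that under the reordered rules a path consisting only of letters and spaces would also be tokenized as TITLE, so reordering works for this input but does not make the two rules disjoint.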