Antlrworks - extraneous input
Question
I am new to this, so I need your help. I am trying to parse the Wikipedia dump, and my first step is to map each markup rule Wikipedia defines into ANTLR. Unfortunately, I hit my first barrier:
line 1:8 extraneous input ''''' expecting '\'\''
I do not understand what is going on. Please help.
My code:
grammar Test;

options {
  language = Java;
}

parse
  : term+ EOF
  ;

term
  : IDENT
  | '[[' term ']]'
  | '\'\'' term '\'\''
  | '\'\'\'' term '\'\'\''
  ;

IDENT
  : ('a'..'z' | 'A'..'Z' | '0'..'9' | '=' | '#' | '"' | ' ')*
  ;
Input: '''''Hello World'''''
Recommended answer
A lexer rule must always match at least 1 character. Your rule:
IDENT : ('a'..'z' | 'A'..'Z' | '0'..'9' | '=' | '#' | '"' | ' ')*;
matches an empty string (of which there are infinitely many in any input). Change the * to a +:
IDENT : ('a'..'z' | 'A'..'Z' | '0'..'9' | '=' | '#' | '"' | ' ')+;
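To make the difference between * and + concrete, here is a minimal sketch that approximates the IDENT rule with java.util.regex. It is only an illustration: the class name EmptyMatchDemo and the helper matchLength are hypothetical, and a regex is a stand-in for the lexer rule, not how ANTLR works internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchDemo {
    // Same character set as IDENT, written as a regex character class
    // (an approximation of the lexer rule, assumed for illustration).
    static final String CHARS = "[a-zA-Z0-9=#\" ]";

    // Length of the match that CHARS + quantifier produces at position pos,
    // or -1 when there is no match at that position.
    static int matchLength(String quantifier, String input, int pos) {
        Matcher m = Pattern.compile(CHARS + quantifier).matcher(input);
        m.region(pos, input.length());
        return m.lookingAt() ? m.end() - m.start() : -1;
    }

    public static void main(String[] args) {
        String input = "'''''Hello World'''''";
        System.out.println(matchLength("*", input, 0)); // 0: "matches" without consuming anything
        System.out.println(matchLength("+", input, 0)); // -1: a quote is not an IDENT character
        System.out.println(matchLength("+", input, 5)); // 11: the length of "Hello World"
    }
}
```

With *, the rule succeeds at a quote character while consuming nothing, so a lexer could produce such empty tokens forever without advancing. With +, the rule either consumes at least one character or does not match.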
EDIT
Input: '''''Hello World'''''
Although you put literal tokens inside parser rules ('\'\'\'', '\'\'', etc.), you must understand that they are not created at the behest of the parser. The lexer follows strict rules to create tokens:
1. it tries to match as much as possible;
2. if 2 different lexer rules match the same number of characters, the one defined first gets precedence.
Let's give your literal tokens a name:
BRACKET_OPEN : '[[';
BRACKET_CLOSE : ']]';
Q3 : '\'\'\'';
Q2 : '\'\'';
IDENT : ('a'..'z' | 'A'..'Z' | '0'..'9' | '=' | '#' | '"' | ' ')+;
Now, because of rule #1 (match as much as possible), the input '''''Hello World''''' will be tokenized as follows:
- Q3
- Q2
- IDENT
- Q3 (yes, a Q3!)
- Q2
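The token sequence above can be reproduced with a small simulation of the two lexer rules (longest match wins, ties go to the rule defined first). This is a simplified model written for illustration, not ANTLR's actual lexer: the class MaximalMunchDemo and its tokenize method are hypothetical, and each token rule is approximated by a regular expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MaximalMunchDemo {
    // Token rules in definition order; the names mirror the named literals above.
    static final String[] NAMES = {"BRACKET_OPEN", "BRACKET_CLOSE", "Q3", "Q2", "IDENT"};
    static final Pattern[] RULES = {
        Pattern.compile("\\[\\["),
        Pattern.compile("\\]\\]"),
        Pattern.compile("'''"),
        Pattern.compile("''"),
        Pattern.compile("[a-zA-Z0-9=#\" ]+")
    };

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestLen = 0;
            String bestName = null;
            for (int i = 0; i < RULES.length; i++) {
                Matcher m = RULES[i].matcher(input);
                m.region(pos, input.length());
                // A strictly longer match wins (rule 1); on a tie the
                // earlier rule is kept (rule 2).
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestName = NAMES[i];
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            tokens.add(bestName);
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Q3, Q2, IDENT, Q3, Q2]
        System.out.println(tokenize("'''''Hello World'''''"));
    }
}
```

The lexer has no knowledge of which quote run the parser would like to see next: at the closing run of five quotes it again takes three characters first, exactly as it did at the opening run.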
But once it has seen Q3 Q2 IDENT, your parser rule term can only continue with Q2 Q3, because the inner Q2 must be closed before the outer Q3. It therefore accepts Q3 Q2 IDENT Q2 Q3 and not Q3 Q2 IDENT Q3 Q2, so it is correct that your input fails to parse properly.
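As a rough check of that claim, the sketch below is a hand-written recognizer for the parse and term rules that works directly on token names. It is an assumption-laden illustration, not generated ANTLR code: the class TermRecognizerDemo and the method parses are hypothetical, and unlike ANTLR it simply returns false instead of reporting "extraneous input" and attempting recovery.

```java
import java.util.Arrays;
import java.util.List;

class TermRecognizerDemo {
    private final List<String> tokens;
    private int pos = 0;

    TermRecognizerDemo(List<String> tokens) {
        this.tokens = tokens;
    }

    // Consume the next token if it has the given name.
    private boolean accept(String name) {
        if (pos < tokens.size() && tokens.get(pos).equals(name)) {
            pos++;
            return true;
        }
        return false;
    }

    // term : IDENT | BRACKET_OPEN term BRACKET_CLOSE | Q2 term Q2 | Q3 term Q3
    private boolean term() {
        if (accept("IDENT")) return true;
        if (accept("BRACKET_OPEN")) return term() && accept("BRACKET_CLOSE");
        if (accept("Q2")) return term() && accept("Q2");
        if (accept("Q3")) return term() && accept("Q3");
        return false;
    }

    // parse : term+ EOF
    static boolean parses(String... names) {
        TermRecognizerDemo p = new TermRecognizerDemo(Arrays.asList(names));
        if (!p.term()) return false;
        while (p.pos < p.tokens.size()) {
            if (!p.term()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(parses("Q3", "Q2", "IDENT", "Q2", "Q3")); // true
        System.out.println(parses("Q3", "Q2", "IDENT", "Q3", "Q2")); // false
    }
}
```

The second call uses the token order the lexer actually produces for '''''Hello World''''', which is why the parser complains when it meets the closing quotes.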
Also, I recommend not using the ANTLRWorks interpreter: it is rather buggy. The debugger works like a charm, though!