ANTLR lexer rule consumes too much
Problem description
ANTLR Lexer Rule Design
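I need a token that meets the following rules:
- Allowed characters are uppercase letters, lowercase letters, digits, spaces and hyphens
- Variable length (at least two characters)
- The token must contain at least one space or hyphen
- The token must begin and end with an uppercase letter, lowercase letter, digit or hyphen (it cannot begin or end with a space)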
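The four rules above can be written down as a single regular expression, which is handy as a reference check for test inputs. This is a hypothetical Python helper (the names `TOKEN_RE` and `is_valid_token` are illustrative), not ANTLR syntax and not part of the original grammar.

```python
import re

# Hypothetical reference check for the token rules listed above.
TOKEN_RE = re.compile(
    r"(?=[A-Za-z0-9 -]*[ -])"  # lookahead: at least one space or hyphen somewhere
    r"[A-Za-z0-9-]"            # first char: any allowed char except a space
    r"[A-Za-z0-9 -]*"          # middle: any allowed char, spaces included
    r"[A-Za-z0-9-]"            # last char: any allowed char except a space
)


def is_valid_token(s: str) -> bool:
    """Return True if s satisfies all four rules (minimum length 2 is implied)."""
    return TOKEN_RE.fullmatch(s) is not None


print(is_valid_token("WATER TRANSPORTATION"))  # True
print(is_valid_token("WATER"))                 # False: no space or hyphen
print(is_valid_token("WATER "))                # False: ends with a space
```

Requiring a non-space first and last character also enforces the two-character minimum, so no separate length check is needed.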
The ANTLR lexer rule "AlphaNumericSpaceHyphen" in the grammar below almost works except for one case. Using the parser rule "sic" to test, the following input will parse (without quotes):
"STANDARD INDUSTRIAL CLASSIFICATION: WATER TRANSPORTATION[4400]"
The following input fails to parse (without quotes):
"STANDARD INDUSTRIAL CLASSIFICATION: WATER TRANSPORTATION [4400]"
The issue is that the lexer rule "AlphaNumericSpaceHyphen" consumes the space and the left square bracket after "WATER TRANSPORTATION" before the lexer realizes it has gone too far and that there is no match.
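To make the failure concrete, here is a small Python model of only the first alternative of AlphaNumericSpaceHyphen, roughly `WORD (' ' WORD)+`, where WORD is a run of letters, digits and hyphens. It is a simplified, hypothetical model of a scanner that commits to another loop iteration as soon as it sees a space; it is not ANTLR's generated code, and `scan_committing` is an illustrative name.

```python
import string

WORD_CHARS = frozenset(string.ascii_letters + string.digits + "-")


def scan_committing(text: str, pos: int = 0) -> int:
    """Model WORD (' ' WORD)+ with no backtracking; return the token's end index."""
    i = pos
    start = i
    while i < len(text) and text[i] in WORD_CHARS:
        i += 1
    if i == start:
        raise ValueError(f"no word character at index {pos}")
    words = 1
    while i < len(text) and text[i] == " ":
        i += 1  # committed: the space is consumed and cannot be given back
        start = i
        while i < len(text) and text[i] in WORD_CHARS:
            i += 1
        if i == start:
            # A space was taken but no word follows: the whole token fails.
            raise ValueError(f"unexpected {text[i:i + 1]!r} at index {i}")
        words += 1
    if words < 2:
        raise ValueError("token needs at least one space")
    return i
```

With this model, `scan_committing("WATER TRANSPORTATION[4400]")` returns 20 (the end of "WATER TRANSPORTATION"), while `scan_committing("WATER TRANSPORTATION [4400]")` raises `ValueError` at index 21, the `[`, which mirrors the behaviour described in the question.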
I have experimented with various types of predicates and lookaheads without any luck. Any help is greatly appreciated.
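What a lookahead needs to achieve can be shown outside ANTLR: before taking a space, peek at the character after it and only continue when a word character follows. The Python sketch below models that on the same simplified `WORD (' ' WORD)+` pattern. The function name is hypothetical, and the sketch does not show how (or whether) a particular ANTLR 3 predicate produces this behaviour in the generated lexer.

```python
import string

WORD_CHARS = frozenset(string.ascii_letters + string.digits + "-")


def scan_with_lookahead(text: str, pos: int = 0) -> int:
    """Model WORD (' ' WORD)+ with a two-character peek; return end index or -1."""

    def word_end(i: int) -> int:
        while i < len(text) and text[i] in WORD_CHARS:
            i += 1
        return i

    i = word_end(pos)
    if i == pos:
        return -1
    words = 1
    # Take the space only if a word character comes right after it, so a
    # trailing " [" is left for the WS and LEFTBRACKET rules.
    while i + 1 < len(text) and text[i] == " " and text[i + 1] in WORD_CHARS:
        i = word_end(i + 1)
        words += 1
    return i if words >= 2 else -1
```

Both inputs from the question now stop at index 20, just before the space or bracket that follows "WATER TRANSPORTATION".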
grammar T;
sic: SICSpecifier AlphaNumericSpaceHyphen LEFTBRACKET Digits RIGHTBRACKET;
LEFTBRACKET
: '[';
RIGHTBRACKET
: ']';
SICSpecifier: 'STANDARD INDUSTRIAL CLASSIFICATION:';
WS : (' '|'\t')+
{
$channel = HIDDEN;
};
fragment UCASEALPHA : 'A'..'Z';
fragment LCASEALPHA : 'a'..'z';
fragment DIGIT : '0'..'9';
Digits: DIGIT+;
AlphaNumericSpaceHyphen
: (UCASEALPHA|LCASEALPHA |DIGIT|'-')+ (' ' (UCASEALPHA|LCASEALPHA |DIGIT|'-')+)+
| (UCASEALPHA|LCASEALPHA |DIGIT)+ ('-')+ ((' '|UCASEALPHA|LCASEALPHA |DIGIT|'-')* (UCASEALPHA|LCASEALPHA |DIGIT|'-'))?
| ('-')+ (UCASEALPHA|LCASEALPHA |DIGIT)+ ((UCASEALPHA|LCASEALPHA |DIGIT|'-'|' ')* (UCASEALPHA|LCASEALPHA |DIGIT|'-'))?
;
Answer
Unfortunately there is no backtracking for the lexer rules. You can take a look at
You can try adapting your grammar so that the lexer changes the type of the token after matching it, as suggested in this solution.
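The linked solution is not reproduced here. As a hedged sketch of the general idea: let one broad rule match any run of letters, digits, spaces and hyphens (so it can never fail halfway), then decide the token type from what was matched and give trailing spaces back. In ANTLR 3 the type decision would live in a lexer action (for example by assigning `$type`); the Python below only models that logic, and the fallback type `Word` is an assumption that does not appear in the question's grammar.

```python
import re
from typing import List, Tuple

# Broad pattern: any run of allowed characters, so matching never fails midway.
RUN = re.compile(r"[A-Za-z0-9 -]+")


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Hypothetical match-broadly-then-retype lexer for the bracketed part of the input."""
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "[":
            tokens.append(("LEFTBRACKET", ch))
            i += 1
        elif ch == "]":
            tokens.append(("RIGHTBRACKET", ch))
            i += 1
        elif ch in " \t":
            i += 1  # like the WS rule: skipped (hidden in the grammar)
        else:
            m = RUN.match(text, i)
            if m is None:
                raise ValueError(f"unexpected {ch!r} at index {i}")
            lexeme = m.group().rstrip(" ")  # give trailing spaces back
            i += len(lexeme)
            if lexeme.isdigit():
                kind = "Digits"
            elif " " in lexeme or "-" in lexeme:
                kind = "AlphaNumericSpaceHyphen"
            else:
                kind = "Word"  # hypothetical fallback type
            tokens.append((kind, lexeme))
    return tokens
```

For both "WATER TRANSPORTATION [4400]" and "WATER TRANSPORTATION[4400]" this yields the same four tokens: `AlphaNumericSpaceHyphen`, `LEFTBRACKET`, `Digits`, `RIGHTBRACKET`. Because the broad rule accepts any run it starts, the space before `[` is never a point of failure.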
Hope this helps.