ANTLR lexer rule consumes too much


Problem Description

ANTLR Lexer Rule Design

I need a token with the following properties:

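  • Allowed characters are uppercase letters, lowercase letters, digits, spaces and hyphens
  • Variable length (at least two characters)
  • The token must contain at least one space or hyphen
  • The token must start and end with an uppercase letter, lowercase letter, digit or hyphen (it cannot start or end with a space)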
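As a reference for what the token is supposed to accept, the four rules (only letters, digits, spaces and hyphens; at least two characters; at least one space or hyphen; no leading or trailing space) can be written as one regular expression. This is a minimal Python sketch, not ANTLR code; the names `_EDGE`, `_TOKEN` and `is_alnum_space_hyphen` are hypothetical, and it assumes "uppercase/lowercase" means ASCII letters.

```python
import re

# A character allowed at the start or end of the token: letter, digit or hyphen.
_EDGE = r"[A-Za-z0-9-]"

# Hypothetical pattern encoding the four rules:
#   (?=.*[ -])         the token contains at least one space or hyphen
#   _EDGE ... _EDGE    first and last characters are not spaces, length >= 2
#   [A-Za-z0-9 -]*     the middle may also contain spaces
_TOKEN = re.compile(rf"(?=.*[ -]){_EDGE}[A-Za-z0-9 -]*{_EDGE}")


def is_alnum_space_hyphen(text: str) -> bool:
    """Return True if the whole of `text` satisfies the token rules."""
    return _TOKEN.fullmatch(text) is not None
```

For example, `"WATER TRANSPORTATION"` is accepted, while `"WATER"` (no space or hyphen) and `"WATER TRANSPORTATION [4400]"` (contains brackets) are rejected.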

The ANTLR lexer rule "AlphaNumericSpaceHyphen" in the grammar below almost works except for one case. Using the parser rule "sic" to test, the following input will parse (without quotes):

"STANDARD INDUSTRIAL CLASSIFICATION: WATER TRANSPORTATION[4400]"

The following input fails to parse (without quotes):

"STANDARD INDUSTRIAL CLASSIFICATION: WATER TRANSPORTATION [4400]"

The issue is that the lexer rule "AlphaNumericSpaceHyphen" consumes the space and the left square bracket after "WATER TRANSPORTATION" before the lexer realizes that there is no match, because by then it has already gone too far.
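To make the failure concrete, here is a hand-written Python simulation of the behaviour described above. It is not code generated by ANTLR; `WORD_CHARS`, `scan_greedy` and `scan_with_lookahead` are hypothetical names. `scan_greedy` mimics the first alternative of the rule, `(...)+ (' ' (...)+)+`, committing to a space as soon as it sees one. `scan_with_lookahead` shows the idea a lookahead check (for example a predicate) is meant to express: only consume a space when the next character can continue the token.

```python
# Characters that may appear inside a word of the token (everything but the space).
WORD_CHARS = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-"
)


def scan_greedy(text: str, pos: int) -> int:
    """Commit to a space as soon as it is seen, with no backtracking.

    Returns the end index of the match, or raises ValueError when the
    already-consumed space is not followed by a word character.
    """
    i = pos
    while i < len(text) and text[i] in WORD_CHARS:
        i += 1
    while i < len(text) and text[i] == " ":
        i += 1  # the space is consumed here; there is no way to give it back
        if i >= len(text) or text[i] not in WORD_CHARS:
            raise ValueError(f"unexpected character at index {i}")
        while i < len(text) and text[i] in WORD_CHARS:
            i += 1
    return i


def scan_with_lookahead(text: str, pos: int) -> int:
    """Consume a space only if the character after it can continue the token."""
    i = pos
    while i < len(text) and text[i] in WORD_CHARS:
        i += 1
    while i + 1 < len(text) and text[i] == " " and text[i + 1] in WORD_CHARS:
        i += 1  # safe: a word character is known to follow the space
        while i < len(text) and text[i] in WORD_CHARS:
            i += 1
    return i
```

With `s = "WATER TRANSPORTATION [4400]"`, `scan_greedy(s, 0)` raises at index 21 (the `[`), while `scan_with_lookahead(s, 0)` returns 20: it stops before the space and leaves ` [4400]` for the other rules. On the input without the space, both return 20.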

I have experimented with various types of predicates and lookaheads without any luck. Any help is greatly appreciated.

grammar T;

sic: SICSpecifier AlphaNumericSpaceHyphen  LEFTBRACKET Digits RIGHTBRACKET;

LEFTBRACKET  
:   '[';  

RIGHTBRACKET 
:   ']';

SICSpecifier: 'STANDARD INDUSTRIAL CLASSIFICATION:';

WS : (' '|'\t')+ 
{   
  $channel = HIDDEN;  
};  

fragment UCASEALPHA : 'A'..'Z';
fragment LCASEALPHA : 'a'..'z';
fragment DIGIT : '0'..'9';
Digits: DIGIT+;

AlphaNumericSpaceHyphen 
:           (UCASEALPHA|LCASEALPHA |DIGIT|'-')+  (' ' (UCASEALPHA|LCASEALPHA |DIGIT|'-')+)+   
        |   (UCASEALPHA|LCASEALPHA |DIGIT)+ ('-')+  ((' '|UCASEALPHA|LCASEALPHA |DIGIT|'-')* (UCASEALPHA|LCASEALPHA |DIGIT|'-'))?
        |   ('-')+ (UCASEALPHA|LCASEALPHA |DIGIT)+  ((UCASEALPHA|LCASEALPHA |DIGIT|'-'|' ')* (UCASEALPHA|LCASEALPHA |DIGIT|'-'))?   
        ;

Recommended Answer

Unfortunately, there is no backtracking for lexer rules. You can take a look at this question:

ANTLR lexer rule consumes characters even if not matched?

You can try to adapt your grammar so that you can change the type of the token, as suggested in that solution.
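The "change the token type" idea means letting one broad rule match the text and deciding the token type afterwards (in an ANTLR 3 lexer this would typically be done in an action, e.g. by assigning `$type`). The Python sketch below only mimics that decision step outside ANTLR and is not the linked answer's code. `RUN`, `classify`, `tokenize` and the `Word` type are hypothetical; `Word` stands in for a single word with no space or hyphen, which the question's grammar has no token for.

```python
import re
from typing import List, Tuple

SIC_SPECIFIER = "STANDARD INDUSTRIAL CLASSIFICATION:"
# One broad pattern: words of letters, digits and hyphens joined by single
# spaces. A space is only included when another word follows it.
RUN = re.compile(r"[A-Za-z0-9-]+(?: [A-Za-z0-9-]+)*")


def classify(lexeme: str) -> str:
    """Choose the token type after the text has been matched."""
    if lexeme.isdigit():
        return "Digits"
    if len(lexeme) >= 2 and (" " in lexeme or "-" in lexeme):
        return "AlphaNumericSpaceHyphen"
    return "Word"  # hypothetical type; not part of the question's grammar


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split `text` into (type, lexeme) pairs, skipping spaces and tabs."""
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if text.startswith(SIC_SPECIFIER, i):
            tokens.append(("SICSpecifier", SIC_SPECIFIER))
            i += len(SIC_SPECIFIER)
        elif ch in " \t":
            i += 1  # plays the role of the WS rule on the HIDDEN channel
        elif ch == "[":
            tokens.append(("LEFTBRACKET", ch))
            i += 1
        elif ch == "]":
            tokens.append(("RIGHTBRACKET", ch))
            i += 1
        else:
            m = RUN.match(text, i)
            if m is None:
                raise ValueError(f"unexpected character {ch!r} at index {i}")
            tokens.append((classify(m.group()), m.group()))
            i = m.end()
    return tokens
```

On the failing input `"STANDARD INDUSTRIAL CLASSIFICATION: WATER TRANSPORTATION [4400]"`, `tokenize` yields `SICSpecifier`, `AlphaNumericSpaceHyphen` ("WATER TRANSPORTATION"), `LEFTBRACKET`, `Digits` ("4400") and `RIGHTBRACKET`, which is the sequence the `sic` rule expects.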

Hope this helps.
