ANTLR lexer rule consumes too much


Question

ANTLR Lexer Rule Design

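I need a token with the following properties:

  • Allowed characters are uppercase letters, lowercase letters, digits, spaces, and hyphens
  • Variable length (must be at least two characters long)
  • The token must contain at least one space or hyphen
  • The token must begin and end with an uppercase letter, lowercase letter, digit, or hyphen (it cannot begin or end with a space)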
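
The four requirements above can be restated as a small checker. This is only a sketch in Python, not part of the original grammar: the function name `is_valid_token` is hypothetical, and it assumes "letters" means ASCII letters, as the `'A'..'Z'` and `'a'..'z'` fragments in the grammar suggest.

```python
import re

# Two or more characters drawn from letters, digits, space, and hyphen.
# The hyphen is last in the character class, so it is taken literally.
ALLOWED = re.compile(r"[A-Za-z0-9 -]{2,}")


def is_valid_token(text: str) -> bool:
    """Return True if text satisfies the four listed token requirements."""
    # Requirements 1 and 2: only allowed characters, at least two of them.
    if ALLOWED.fullmatch(text) is None:
        return False
    # Requirement 4: the token may not begin or end with a space.
    if text[0] == " " or text[-1] == " ":
        return False
    # Requirement 3: at least one space or hyphen somewhere in the token.
    return " " in text or "-" in text
```

For example, `is_valid_token("WATER TRANSPORTATION")` is true, while `is_valid_token("WATER TRANSPORTATION ")` is false because of the trailing space. The checker only describes which strings are acceptable; it does not decide where a token ends inside a longer input, which is the part the lexer rule struggles with.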

The ANTLR lexer rule "AlphaNumericSpaceHyphen" in the grammar below almost works except for one case. Using the parser rule "sic" to test, the following input will parse (without quotes):

"STANDARD INDUSTRIAL CLASSIFICATION: WATER TRANSPORTATION[4400]"

The following input fails to parse (without quotes):

"STANDARD INDUSTRIAL CLASSIFICATION: WATER TRANSPORTATION [4400]"

The issue is that the lexer rule "AlphaNumericSpaceHyphen" consumes the space after "WATER TRANSPORTATION" and then reaches the left square bracket, which cannot continue the token. By the time the lexer realizes there is no match, it has already gone too far.

I have experimented with various types of predicates and lookaheads without any luck. Any help is greatly appreciated.

grammar T;

sic: SICSpecifier AlphaNumericSpaceHyphen  LEFTBRACKET Digits RIGHTBRACKET;

LEFTBRACKET  
:   '[';  

RIGHTBRACKET 
:   ']';

SICSpecifier: 'STANDARD INDUSTRIAL CLASSIFICATION:';

WS : (' '|'\t')+ 
{   
  $channel = HIDDEN;  
};  

fragment UCASEALPHA : 'A'..'Z';
fragment LCASEALPHA : 'a'..'z';
fragment DIGIT : '0'..'9';
Digits: DIGIT+;

AlphaNumericSpaceHyphen 
:           (UCASEALPHA|LCASEALPHA |DIGIT|'-')+  (' ' (UCASEALPHA|LCASEALPHA |DIGIT|'-')+)+   
        |   (UCASEALPHA|LCASEALPHA |DIGIT)+ ('-')+  ((' '|UCASEALPHA|LCASEALPHA |DIGIT|'-')* (UCASEALPHA|LCASEALPHA |DIGIT|'-'))?
        |   ('-')+ (UCASEALPHA|LCASEALPHA |DIGIT)+  ((UCASEALPHA|LCASEALPHA |DIGIT|'-'|' ')* (UCASEALPHA|LCASEALPHA |DIGIT|'-'))?   
        ;

Recommended answer

Unfortunately there is no backtracking for the lexer rules. You can take a look at this related question:

ANTLR lexer rule consumes characters even if not matched?

You can try to adapt your grammar so that you can change the type of the token, as suggested in the solution there.

Hope this helps.
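
To make the "went too far" problem concrete, here is a minimal hand-written scanner in Python. It is not ANTLR code and not the solution from the linked question; it only illustrates the general idea of checking one character ahead before committing to a space. The names `scan_token` and `TOKEN_CHARS` are hypothetical, and the sketch assumes a single space between words, as in the first alternative of the grammar.

```python
import string

# Characters that may appear inside a token, other than the space separator.
TOKEN_CHARS = set(string.ascii_letters + string.digits + "-")


def scan_token(text: str, start: int = 0) -> str:
    """Scan one token starting at text[start] and return the matched text.

    A space is consumed only when the character right after it can
    continue the token, so a trailing space is never swallowed.
    """
    pos = start
    while pos < len(text):
        ch = text[pos]
        if ch in TOKEN_CHARS:
            pos += 1
        elif (
            ch == " "
            and pos > start
            and pos + 1 < len(text)
            and text[pos + 1] in TOKEN_CHARS
        ):
            # One character of lookahead: keep the space only if a
            # token character follows it.
            pos += 1
        else:
            break
    return text[start:pos]
```

With this guard, both `scan_token("WATER TRANSPORTATION[4400]")` and `scan_token("WATER TRANSPORTATION [4400]")` stop right after "TRANSPORTATION", leaving the space (if any) and the bracket for later rules. The sketch does not enforce the "at least one space or hyphen" requirement; it only shows where the token boundary should fall.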
