ANTLR grammar to recognize DIGIT keys and INTEGERS too
I'm trying to create an ANTLR grammar to parse sequences of keys that optionally have a repeat count. For example, (a b c r5) means "repeat keys a, b, and c five times."
I have the grammar working for KEYS : ('a'..'z'|'A'..'Z').
But when I add digit keys, KEYS : ('a'..'z'|'A'..'Z'|'0'..'9'), an input expression like (a 5 r5) fails to parse at the middle 5, because it can't tell if the 5 is an INTEGER or a KEY. (Or so I think; the error messages, such as "NoViableAltException", are difficult to interpret.)
I have tried these grammar forms, which work ('r' means "repeat count"):
repeat : '(' LETTERKEYS INTEGER ')';        // works for a-zA-Z
repeat : '(' LETTERKEYS 'r' INTEGER ')';    // works for a-zA-Z
But these fail:
repeat : '(' LETTERSandDIGITKEYS INTEGER ')';        // fails on '(a 5 r5)'
repeat : '(' LETTERSandDIGITKEYS 'r' INTEGER ')';    // fails on '(a 5 r5)'
Maybe the grammar can't do the recognition. Maybe I need to recognize all the digits in the same way (as KEYS or DIGITS or INTEGERS) and then, in the parse tree visitor, interpret the middle DIGIT instances as keys and the last set of DIGITS as an INTEGER count?
Is it possible to define a grammar that allows me to repeat digit keys as well as letter keys, so that expressions like (a 5 123 r5) will be recognized correctly? (That is, "repeat keys a, 5, 1, 2, 3 five times.") I'm not tied to that specific syntax, although it would be nice to use something similar.
Thank you.
"the parse fails on the middle 5 because it can't tell if the 5 is an INTEGER or a KEY."
If you have defined the following rules:
INTEGER : [0-9]+;
KEY : [a-zA-Z0-9];
then a single digit, like the 5 in your example, will always become an INTEGER token. Even if the parser is trying to match a KEY token, the 5 will become an INTEGER, because the lexer produces tokens without regard to what the parser expects at that point. There is nothing you can do about that: this is the way ANTLR's lexer works. The lexer works in the following way:
- try to consume as many characters as possible (the longest match wins)
- if 2 or more rules match the same characters (like INTEGER and KEY in the case of 5), let the rule defined first "win"
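The two rules above can be illustrated with a small simulation. This is only a sketch in Python, not ANTLR's implementation: the tokenize function, the regex patterns, and the LPAREN/RPAREN names are assumptions made for the demonstration, and whitespace is simply skipped.

```python
import re

def tokenize(text, rules):
    """Toy lexer: longest match wins; ties go to the rule listed first."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strictly greater: an equal-length match from a later rule
            # never replaces the match of an earlier rule.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Same order as the two rules shown earlier: INTEGER is defined before KEY.
# LPAREN and RPAREN are added here only so the sample input can be tokenized.
RULES = [
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("INTEGER", r"[0-9]+"),
    ("KEY", r"[a-zA-Z0-9]"),
]

print(tokenize("(a 5 r5)", RULES))
# [('LPAREN', '('), ('KEY', 'a'), ('INTEGER', '5'), ('KEY', 'r'), ('INTEGER', '5'), ('RPAREN', ')')]
```

The middle 5 comes out as INTEGER, never KEY, which is why a parser rule that expects KEY at that position cannot match.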
If you want a 5 to be an INTEGER in some places but a KEY in others, do something like this instead:
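// Parser rules: use these instead of the KEY and INTEGER tokens directly.
key     : KEY | SINGLE_DIGIT | R;
integer : INTEGER | SINGLE_DIGIT;
repeat  : R integer;

// Lexer rules: order matters when two rules match the same text.
SINGLE_DIGIT : [0-9];      // listed before INTEGER, so a lone digit becomes SINGLE_DIGIT
INTEGER      : [0-9]+;     // still wins for two or more digits (longest match)
R            : 'r';        // listed before KEY, so 'r' becomes R
KEY          : [a-zA-Z];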
Then, in your parser rules, use key and integer instead of KEY and INTEGER.
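To see what the rule order in the grammar above produces, here is a rough Python sketch of the same lexer rules. It is a simulation, not ANTLR-generated code. The tokenize and parse_repeat helpers and the LPAREN/RPAREN names are hypothetical. Note that 123 is still lexed as a single INTEGER, so the sketch assumes that a multi-digit number among the keys is split into one key per digit afterwards, along the lines of the visitor idea in the question.

```python
import re

# Lexer rules in the order of the grammar above (parentheses added for the demo).
RULES = [
    ("LPAREN", r"\("), ("RPAREN", r"\)"),
    ("SINGLE_DIGIT", r"[0-9]"),
    ("INTEGER", r"[0-9]+"),
    ("R", r"r"),
    ("KEY", r"[a-zA-Z]"),
]

def tokenize(text):
    """Toy lexer: longest match wins; ties go to the rule listed first."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        matches = [(name, m.group()) for name, pat in RULES
                   if (m := re.compile(pat).match(text, pos))]
        if not matches:
            raise ValueError(f"no rule matches at position {pos}")
        # max() returns the first maximal element, so earlier rules win ties.
        name, lexeme = max(matches, key=lambda t: len(t[1]))
        tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens

def parse_repeat(text):
    """Hypothetical helper: return (keys, count) for input like '(a 5 123 r5)'."""
    tokens = tokenize(text)
    if len(tokens) < 4 or tokens[0][0] != "LPAREN" or tokens[-1][0] != "RPAREN":
        raise ValueError("expected '(' keys 'r' count ')'")
    *key_tokens, r_token, count_token = tokens[1:-1]
    if r_token[0] != "R" or count_token[0] not in ("INTEGER", "SINGLE_DIGIT"):
        raise ValueError("expected 'r' followed by a count")
    keys = []
    for _name, lexeme in key_tokens:
        # Assumption: a multi-digit INTEGER among the keys means one key per digit.
        keys.extend(lexeme)
    return keys, int(count_token[1])

print(parse_repeat("(a 5 123 r5)"))
# (['a', '5', '1', '2', '3'], 5)
```

In a real ANTLR setup the per-digit split would live in a visitor or listener over the key rule; whether that fits depends on whether multi-digit keys should ever be treated as one key.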