Antlr Extraneous Input

Problem description

I have a grammar file BoardFile.g4 that has (relevant parts only):

grammar Board;

//Tokens
GADGET : 'squareBumper' | 'circleBumper' | 'triangleBumper' | 'leftFlipper' | 'rightFlipper' | 'absorber' | 'portal' ;
NAME : [A-Za-z_][A-Za-z_0-9]* ;
INT : [0-9]+ ;
FLOAT : '-'?[0-9]+('.'[0-9]+)? ;
COMMENT : '#' ~( '\r' | '\n' )*;
WHITESPACE : [ \t\r\n]+ -> skip ;
KEY : [a-z] | [0-9] | 'shift' | 'ctrl' | 'alt' | 'meta' | 'space' | 'left' | 'right' | 'up' | 'down' | 'minus' | 'equals' | 'backspace' | 'openbracket' | 'closebracket' | 'backslash' | 'semicolon' | 'quote' | 'enter' | 'comma' | 'period' | 'slash' ;
KEYPRESS : 'keyup' | 'keydown' ;

//Rules
file : define+ EOF ;
define : board | ball | gadget | fire | COMMENT | key ;
board : 'board' 'name' '=' name ('gravity' '=' gravity)? ('friction1' '=' friction1)? ('friction2' '=' friction2)? ;
ball : 'ball' 'name' '=' name 'x' '=' xfloat 'y' '=' yfloat 'xVelocity' '=' xvel 'yVelocity' '=' yvel ;
gadget : gadgettype 'name' '=' name 'x' '=' xint 'y' '=' yint ('width' '=' width 'height' '=' height)? ('orientation' '=' orientation)? ('otherBoard' '=' name 'otherPortal' '=' name)? ;
fire : 'fire' 'trigger' '=' trigger 'action' '=' action ;
key : keytype 'key' '=' KEY 'action' '=' name ;

name : NAME ;
gadgettype : GADGET ;
keytype : KEYPRESS ;
gravity : FLOAT ;
friction1 : FLOAT ;
friction2 : FLOAT ;
trigger : NAME ;
action : NAME ;
yfloat : FLOAT ;
xfloat : FLOAT ;
yint : INT ;
xint : INT ;
xvel : FLOAT ;
yvel : FLOAT ;
orientation : INT ;
width : INT ;
height : INT ;

This generates the lexer and parser fine. However, when I use it against the following file, it gives the following error:

line 12:0 extraneous input 'keyup' expecting {<EOF>, KEYPRESS}

File to Parse:

board name=keysBoard gravity=5.0 friction1=0.0 friction2=0.0

# define a ball
ball name=Ball x=0.5 y=0.5 xVelocity=2.5 yVelocity=2.5

# add some flippers
leftFlipper name=FlipL1 x=16 y=2 orientation=0
leftFlipper name=FlipL2 x=16 y=9 orientation=0

# add keys. lots of keys.
keyup key=space action=apple
keydown key=a action=ball
keyup key=backslash action=cat
keydown key=period action=dog

I went through other questions about this error on SO, but none of them helped me. I cannot figure out what is going wrong. Why am I getting this error?

Solution

The string "keyup" is being tokenized as a NAME token: that is the problem.

You must realize that the lexer operates independently from the parser. If the parser is trying to match a KEYPRESS token, the lexer does not "listen" to it, but just constructs a token following the rules:

  1. match the rule that consumes the most characters
  2. if several rules match the same number of characters, choose the one that is defined first
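The two rules above can be imitated outside ANTLR. The following is only a rough Python sketch, not ANTLR's actual implementation: the helper `first_token` and the list `ORIGINAL_ORDER` are names made up for this illustration, and only a handful of the KEY alternatives from the question are included.

```python
import re

def first_token(rules, text):
    """Return (rule, lexeme) for the first token of text: the longest match
    wins, and on a tie the rule defined first is kept."""
    best = None
    for name, pattern in rules:  # iterate in definition order
        m = re.match(pattern, text)
        # only a strictly longer match replaces the current best
        if m and m.end() > 0 and (best is None or m.end() > len(best[1])):
            best = (name, m.group())
    return best

# A subset of the question's lexer rules, in their original order.
# Longer literals come first inside KEY because Python's `|` takes the
# first alternative that matches rather than the longest one.
ORIGINAL_ORDER = [
    ("NAME", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("INT", r"[0-9]+"),
    ("KEY", r"space|backslash|period|[a-z]|[0-9]"),
    ("KEYPRESS", r"keyup|keydown"),
]

for text in ("keyup", "space", "a", "5"):
    print(text, "->", first_token(ORIGINAL_ORDER, text))
```

In this simplified model, "keyup", "space" and "a" all come out as NAME and "5" comes out as INT, so neither KEY nor KEYPRESS is ever chosen.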

Taking these rules into account, and the order of your rules:

NAME : [A-Za-z_][A-Za-z_0-9]* ;

INT : [0-9]+ ;

KEY : [a-z] | [0-9] | 'shift' | 'ctrl' | 'alt' | 'meta' | 'space' | 'left' | 'right' | 'up' | 'down' | 'minus' | 'equals' | 'backspace' | 'openbracket' | 'closebracket' | 'backslash' | 'semicolon' | 'quote' | 'enter' | 'comma' | 'period' | 'slash' ;

KEYPRESS : 'keyup' | 'keydown' ;

a NAME token will be created instead of most of the KEY alternatives and instead of all of the KEYPRESS alternatives: NAME matches at least as many characters and is defined first.

And since INT matches one or more digits and is defined before KEY, which also has a single-digit alternative, it is clear that the lexer will never produce a KEY or KEYPRESS token.

If you move the NAME and INT rules below the KEY and KEYPRESS rules, then my guess is that most of the tokens will be constructed as you expect.
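To see why reordering helps with most tokens but not all, here is a small Python imitation of longest-match lexing with a first-defined tie-break. It is a sketch under simplifying assumptions, not ANTLR itself: `first_token` and `REORDERED` are hypothetical names, and KEY is cut down to a few alternatives.

```python
import re

def first_token(rules, text):
    """Longest match wins; on a tie the earlier rule in the list is kept."""
    best = None
    for name, pattern in rules:
        m = re.match(pattern, text)
        if m and m.end() > 0 and (best is None or m.end() > len(best[1])):
            best = (name, m.group())
    return best

# KEY and KEYPRESS moved above NAME and INT; KEY still contains [0-9].
REORDERED = [
    ("KEY", r"space|backslash|period|[a-z]|[0-9]"),
    ("KEYPRESS", r"keyup|keydown"),
    ("NAME", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("INT", r"[0-9]+"),
]

for text in ("keyup", "space", "apple", "16", "2"):
    print(text, "->", first_token(REORDERED, text))
```

With this order "keyup" becomes KEYPRESS, "space" becomes KEY, "apple" stays NAME because it is longer than any KEY match, and "16" is still INT. A single digit such as "2" now becomes KEY, though, which would break a line like `y=2` where the parser expects INT.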

EDIT

A possible solution would look like:

KEY : [a-z] | 'shift' | 'ctrl' | 'alt' | 'meta' | 'space' | 'left' | 'right' | 'up' | 'down' | 'minus' | 'equals' | 'backspace' | 'openbracket' | 'closebracket' | 'backslash' | 'semicolon' | 'quote' | 'enter' | 'comma' | 'period' | 'slash' ;

KEYPRESS : 'keyup' | 'keydown' ;

NAME : [A-Za-z_][A-Za-z_0-9]* ;

SINGLE_DIGIT : [0-9] ;

INT : [0-9]+ ;

I.e. I removed the [0-9] alternative from KEY and introduced a SINGLE_DIGIT rule (which is placed before the INT rule!).
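The effect of placing SINGLE_DIGIT before INT can be checked with the same kind of toy model. This is again a hedged Python sketch rather than ANTLR's real lexer; `first_token`, `FIXED_ORDER` and `INTEGER_TOKENS` are illustrative names, and `INTEGER_TOKENS` merely stands in for a parser rule that accepts either digit token.

```python
import re

def first_token(rules, text):
    """Longest match wins; on a tie the earlier rule in the list is kept."""
    best = None
    for name, pattern in rules:
        m = re.match(pattern, text)
        if m and m.end() > 0 and (best is None or m.end() > len(best[1])):
            best = (name, m.group())
    return best

# Rule order from the proposed solution; KEY no longer matches digits.
FIXED_ORDER = [
    ("KEY", r"space|backslash|period|[a-z]"),
    ("KEYPRESS", r"keyup|keydown"),
    ("NAME", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("SINGLE_DIGIT", r"[0-9]"),
    ("INT", r"[0-9]+"),
]

# Token types that a parser rule for integers would have to accept.
INTEGER_TOKENS = {"INT", "SINGLE_DIGIT"}

for text in ("16", "2", "0"):
    rule, lexeme = first_token(FIXED_ORDER, text)
    print(lexeme, "->", rule, "| usable as integer:", rule in INTEGER_TOKENS)
```

A one-digit input ties between SINGLE_DIGIT and INT and goes to SINGLE_DIGIT because it is listed first, while "16" is longer and goes to INT. That is why the parser side has to accept both token types wherever an integer is expected.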

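Now create some extra parser rules:

integer : INT | SINGLE_DIGIT ;

keyname : KEY | SINGLE_DIGIT ;

and change all occurrences of INT inside parser rules to integer (don't call your rule int: it is a reserved word in the Java target) and change all occurrences of KEY to keyname. Your grammar already has a parser rule called key, so the new rule needs a different name; keyname is only a suggestion.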

And you might also want to do something similar to NAME and the [a-z] alternative in KEY (i.e. a single lowercase char would now never be tokenized as a NAME, always as a KEY).
