ANTLR4 mismatched input '' expecting

Problem description

Currently, I've just defined simple rules in ANTLR4:

// Recognizer Rules

program  : (class_dcl)+ EOF;
class_dcl: 'class' ID ('extends' ID)? '{' class_body '}';
class_body: (const_dcl|var_dcl|method_dcl)*;
const_dcl: ('static')? 'final' PRIMITIVE_TYPE ID '=' expr ';';
var_dcl: ('static')? id_list ':' type ';';
method_dcl: PRIMITIVE_TYPE ('static')? ID '(' para_list ')' block_stm;
para_list: (para_dcl (';' para_dcl)*)?;
para_dcl: id_list ':' PRIMITIVE_TYPE;
block_stm: '{' '}';
expr: <assoc=right> expr '=' expr | expr1;
expr1: term ('<' | '>' | '<=' | '>=' | '==' | '!=') term | term;
term: ('+'|'-') term | term ('*'|'/') term | term ('+'|'-') term | fact;
fact: INTLIT | FLOATLIT | BOOLLIT | ID | '(' expr ')';
type: PRIMITIVE_TYPE ('[' INTLIT ']')?;
id_list: ID (',' ID)*;

// Lexer Rules

KEYWORD: PRIMITIVE_TYPE | BOOLLIT | 'class' | 'extends' | 'if' | 'then' | 'else'
    | 'null' | 'break' | 'continue' | 'while' | 'return' | 'self' | 'final'
    | 'static' | 'new' | 'do';
SEPARATOR: '[' | ']' | '{' | '}' | '(' | ')' | ';' | ':' | '.' | ',';
OPERATOR: '^' | 'new' | '=' | UNA_OPERATOR | BIN_OPERATOR;
UNA_OPERATOR: '!';
BIN_OPERATOR: '+' | '-' | '*' | '\\' | '/' | '%' | '>' | '>=' | '<' | '<='
    | '==' | '<>' | '&&' | '||' | ':=';
PRIMITIVE_TYPE: 'integer' | 'float' | 'bool' | 'string' | 'void';
BOOLLIT: 'true' | 'false';
FLOATLIT: [0-9]+ ((('.'[0-9]* (('E'|'e')('+'|'-')?[0-9]+)? ))|(('E'|'e')('+'|'-')? [0-9]+));
INTLIT: [0-9]+;
STRINGLIT: '"' ('\\'[bfrnt\\"]|~[\r\t\n\\"])* '"';
ILLEGAL_ESC: '"' (('\\'[bfrnt\\"]|~[\n\\"]))* ('\\'(~[bfrnt\\"]))
    {if (true) throw new bkool.parser.IllegalEscape(getText());};
UNCLOSED_STRING: '"'('\\'[bfrnt\\"]|~[\r\t\n\\"])*
    {if (true) throw new bkool.parser.UncloseString(getText());};
COMMENT: (BLOCK_COMMENT|LINE_COMMENT) -> skip;
BLOCK_COMMENT: '(''*'(('*')?(~')'))*'*'')';
LINE_COMMENT: '#' (~[\n])* ('\n'|EOF);
ID: [a-zA-z_]+ [a-zA-z_0-9]* ;

WS: [ \t\r\n]+ -> skip ;
ERROR_TOKEN: . {if (true) throw new bkool.parser.ErrorToken(getText());};

I opened the parse tree and tried this test input:

class abc
{
 final integer x=1;
}

It returned these errors:

BKOOL::program:3:8: mismatched input 'integer' expecting PRIMITIVE_TYPE
BKOOL::program:3:17: mismatched input '=' expecting {':', ','}

I still don't understand why. Could you please help me understand why it didn't recognize the rules and tokens as I expected?

Recommended answer

Lexer rules are exclusive: each piece of input gets exactly one token type. The longest match wins, and the tiebreaker is the order of the rules in the grammar.

In your case, integer is lexed as KEYWORD instead of PRIMITIVE_TYPE: both rules match the same text, and KEYWORD comes first in the grammar.
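To make the tie-breaking concrete, here is a rough Python sketch of the selection step. It is not ANTLR's implementation; `winning_rule` and the cut-down regexes are assumptions made up for illustration, and only the rule names come from the question.

```python
import re

def winning_rule(rules, text):
    """Pick the rule for the start of `text`: longest match wins,
    and on a tie the rule listed first in `rules` wins."""
    best_name, best_len = None, 0
    for name, pattern in rules:          # rules are tried in grammar order
        m = re.match(pattern, text)
        # strictly longer only, so an earlier rule keeps a tie
        if m and m.end() > best_len:
            best_name, best_len = name, m.end()
    return best_name, text[:best_len]

# Simplified stand-ins for three of the question's lexer rules, same order.
RULES = [
    ("KEYWORD", r"integer|float|bool|string|void|class|final|static"),
    ("PRIMITIVE_TYPE", r"integer|float|bool|string|void"),
    ("ID", r"[a-zA-Z_][a-zA-Z_0-9]*"),
]

print(winning_rule(RULES, "integer"))   # ('KEYWORD', 'integer')
print(winning_rule(RULES, "integers"))  # ('ID', 'integers')
```

All three rules match `integer` with the same length, so the first one listed takes it and PRIMITIVE_TYPE can never be produced. `integers` goes to ID because that match is one character longer.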

What you should do here:

  • Make one distinct token per keyword instead of a catch-all KEYWORD rule.
  • Turn PRIMITIVE_TYPE into a parser rule.
  • Do the same for the operators.
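The effect of that restructuring can be sketched with a toy Python lexer. This is only a model under assumptions: the token names (CLASS, FINAL, INTEGER, ASSIGN, SEMI and so on) are hypothetical, and the set `PRIMITIVE_TYPE` stands in for what would be a parser rule along the lines of `primitive_type: INTEGER | FLOAT | BOOL | STRING | VOID;` in the grammar.

```python
import re

# One token type per keyword (hypothetical names), listed before ID so that
# a keyword wins the length tie against ID.
TOKEN_RULES = [(name, re.compile(rx)) for name, rx in [
    ("CLASS", r"class"), ("FINAL", r"final"), ("STATIC", r"static"),
    ("INTEGER", r"integer"), ("FLOAT", r"float"), ("BOOL", r"bool"),
    ("STRING", r"string"), ("VOID", r"void"),
    ("ASSIGN", r"="), ("SEMI", r";"),
    ("INTLIT", r"[0-9]+"),
    ("ID", r"[a-zA-Z_][a-zA-Z_0-9]*"),
]]

# Parser-level grouping, standing in for a `primitive_type` parser rule.
PRIMITIVE_TYPE = {"INTEGER", "FLOAT", "BOOL", "STRING", "VOID"}

def tokenize(text):
    """Longest match wins; max() keeps the first of equally long matches."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():          # mimic `WS -> skip`
            pos += 1
            continue
        matches = [(name, m.group()) for name, rx in TOKEN_RULES
                   if (m := rx.match(text, pos))]
        if not matches:
            raise ValueError(f"no rule matches at offset {pos}")
        name, lexeme = max(matches, key=lambda t: len(t[1]))
        tokens.append((name, lexeme))
        pos += len(lexeme)
    return tokens

def is_const_dcl(tokens):
    """Check the shape FINAL primitive_type ID ASSIGN INTLIT SEMI,
    a cut-down version of the question's const_dcl rule."""
    kinds = [name for name, _ in tokens]
    return (len(kinds) == 6
            and kinds[0] == "FINAL"
            and kinds[1] in PRIMITIVE_TYPE
            and kinds[2:] == ["ID", "ASSIGN", "INTLIT", "SEMI"])

tokens = tokenize("final integer x=1;")
print([name for name, _ in tokens])
print(is_const_dcl(tokens))
```

Because every keyword now has its own token type, the grouping into "primitive types" happens above the lexer, where several token types can share one role without competing for the same text.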

Right now, your example:

class abc
{
 final integer x=1;
}

gets converted to a token sequence such as:
class ID { final KEYWORD ID = INTLIT ; }

This is thanks to implicit token typing: you've used literals such as 'class' directly in your parser rules. These get converted to anonymous tokens such as T_001 : 'class'; which get the highest priority.

If this weren't the case, you'd end up with:
KEYWORD ID SEPARATOR KEYWORD KEYWORD ID OPERATOR INTLIT SEPARATOR SEPARATOR
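Both token sequences can be reproduced with a small Python model of the lexer. Treat it as a sketch under assumptions: `token_stream`, the cut-down regexes, and the partial list of literals are invented for illustration, and placing the implicit tokens ahead of the named rules models the "highest priority" described above.

```python
import re

def token_stream(rules, text):
    """Return the token names for `text`: longest match wins,
    and the earliest rule in `rules` wins ties."""
    names, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():          # mimic `WS -> skip`
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        names.append(best_name)
        pos += best_len
    return " ".join(names)

# Some of the literals used in the question's parser rules. Each one becomes
# an implicit token, shown here under its own text.
IMPLICIT = [(lit, re.escape(lit)) for lit in
            ["class", "extends", "static", "final",
             "{", "}", "(", ")", ";", ":", ",", "="]]

# Simplified stand-ins for the question's named lexer rules, same order.
NAMED = [
    ("KEYWORD", r"integer|float|bool|string|void|true|false"
                r"|class|extends|final|static"),
    ("SEPARATOR", r"[\[\]{}();:.,]"),
    ("OPERATOR", r"="),
    ("INTLIT", r"[0-9]+"),
    ("ID", r"[a-zA-Z_][a-zA-Z_0-9]*"),
]

SOURCE = "class abc\n{\n final integer x=1;\n}"

print(token_stream(IMPLICIT + NAMED, SOURCE))  # implicit tokens first
print(token_stream(NAMED, SOURCE))             # named rules only
```

With the implicit tokens in front, `class`, `final`, and the punctuation keep their own token types, while `integer` still falls through to KEYWORD. With the named rules alone, everything collapses into the catch-all types; in this sketch the `;` comes out as SEPARATOR as well, since that is the only remaining rule that matches it.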

And that's... not exactly easy to parse ;-)
That's why I'm telling you to break down your tokens properly.
