Antlrworks - extraneous input

Question

I am new to this, so I will need your help. I am trying to parse the Wikipedia dump, and my first step is to map each rule they define into ANTLR. Unfortunately, I have hit my first barrier:

line 1:8 extraneous input ''''' expecting '\'\''

I don't understand what is going on; please help.

My code:

grammar Test;

options {
    language = Java;
}

parse
    :  term+ EOF
    ;

term 
    :  IDENT
    |  '[[' term ']]'
    |  '\'\'' term '\'\''
    |  '\'\'\'' term '\'\'\''
    ;    

IDENT
    :  ('a'..'z' | 'A'..'Z' | '0'..'9' | '=' | '#' | '"' | ' ')*
    ;

Input '''''Hello World'''''

Answer

A lexer rule must always match at least 1 character. Your rule:

IDENT : ('a'..'z' | 'A'..'Z' | '0'..'9' | '=' | '#' | '"' | ' ')*;

matches an empty string (and the lexer can match infinitely many of those). Change the * to a +:

IDENT : ('a'..'z' | 'A'..'Z' | '0'..'9' | '=' | '#' | '"' | ' ')+;
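To see why the `*` version is a problem, here is a minimal hand-written sketch that mirrors the IDENT character class. It is not the code ANTLR generates; the class `IdentMatch` and the method `matchIdent` are hypothetical names for illustration. With `*`, the rule "succeeds" at a quote while consuming zero characters, so a lexer using it makes no progress. With `+`, the rule simply does not apply there.

```java
// Hypothetical matcher mirroring the IDENT character class; not ANTLR-generated code.
public class IdentMatch {
    // Characters allowed by IDENT: letters, digits, '=', '#', '"' and space.
    static boolean isIdentChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9') || c == '=' || c == '#' || c == '"' || c == ' ';
    }

    // Length of the longest IDENT match starting at pos.
    // allowEmpty = true models ( ... )*, allowEmpty = false models ( ... )+.
    // Returns -1 when the rule does not match at all.
    static int matchIdent(String input, int pos, boolean allowEmpty) {
        int end = pos;
        while (end < input.length() && isIdentChar(input.charAt(end))) {
            end++;
        }
        int length = end - pos;
        if (length == 0 && !allowEmpty) {
            return -1; // '+' requires at least one character
        }
        return length;
    }

    public static void main(String[] args) {
        String input = "'''''Hello World'''''";
        // With '*', IDENT "matches" at position 0 while consuming nothing.
        System.out.println(matchIdent(input, 0, true));  // 0
        // With '+', IDENT does not apply at a quote.
        System.out.println(matchIdent(input, 0, false)); // -1
        // At the 'H', IDENT consumes "Hello World" (11 characters).
        System.out.println(matchIdent(input, 5, false)); // 11
    }
}
```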

EDIT

Input '''''Hello World'''''

Although you put literal tokens inside parser rules ('\'\'\'', '\'\'', etc.), you must understand that they are not created at the behest of the parser. The lexer follows strict rules to create tokens:

  1. it tries to match as much as possible
  2. if two different lexer rules match the same number of characters, the one defined first gets precedence
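The two tie-breaking rules the lexer follows (longest match first, then definition order) can be sketched with `java.util.regex`. This is only an illustration of the selection logic, not ANTLR's implementation; the keyword rule `HELLO` is a hypothetical rule added solely to produce a tie with IDENT.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RulePriority {
    // Hypothetical rules in definition order: a keyword HELLO declared before IDENT.
    static final String[] NAMES = {"HELLO", "IDENT"};
    static final Pattern[] PATTERNS = {
        Pattern.compile("Hello"),
        Pattern.compile("[a-zA-Z0-9=#\" ]+")
    };

    // Returns the name of the rule chosen at pos:
    // 1. the longest match wins;
    // 2. on equal length, the rule defined first wins (hence the strict '>').
    static String pick(String input, int pos) {
        String best = null;
        int bestLen = 0;
        for (int i = 0; i < NAMES.length; i++) {
            Matcher m = PATTERNS[i].matcher(input).region(pos, input.length());
            if (m.lookingAt() && m.end() - pos > bestLen) {
                best = NAMES[i];
                bestLen = m.end() - pos;
            }
        }
        return best; // null if no rule matches
    }

    public static void main(String[] args) {
        System.out.println(pick("Hello World", 0)); // IDENT: 11 characters beat 5
        System.out.println(pick("Hello", 0));       // HELLO: tie at 5, declared first
    }
}
```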

Let's give your literal tokens a name:

BRACKET_OPEN  : '[[';
BRACKET_CLOSE : ']]';
Q3            : '\'\'\'';
Q2            : '\'\'';
IDENT         :  ('a'..'z' | 'A'..'Z' | '0'..'9' | '=' | '#' | '"' | ' ')+;

Now, because of rule #1 (match as much as possible), the input '''''Hello World''''' will be tokenized as follows:

  • Q3
  • Q2
  • IDENT
  • Q3 (yes, a Q3!)
  • Q2
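The tokenization above can be reproduced with a small maximal-munch loop over the answer's five rules, expressed as regular expressions. This is a sketch under the assumption that regex patterns faithfully model these simple ANTLR rules; the class `WikiTokenizer` is hypothetical and only emits token names, not token text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WikiTokenizer {
    // The answer's lexer rules, kept in definition order (order decides ties).
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("BRACKET_OPEN", Pattern.compile("\\[\\["));
        RULES.put("BRACKET_CLOSE", Pattern.compile("\\]\\]"));
        RULES.put("Q3", Pattern.compile("'''"));
        RULES.put("Q2", Pattern.compile("''"));
        RULES.put("IDENT", Pattern.compile("[a-zA-Z0-9=#\" ]+"));
    }

    // Repeatedly picks the longest match at the current position.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestName = rule.getKey();
                    bestLen = m.end() - pos;
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at position " + pos);
            }
            tokens.add(bestName);
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("'''''Hello World'''''"));
        // [Q3, Q2, IDENT, Q3, Q2]
    }
}
```

Note that the closing run of five quotes again becomes Q3 followed by Q2, because the lexer greedily takes the three-quote token first without consulting the parser.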

But your parser rule term will only accept Q3 Q2 IDENT Q2 Q3, so your input is correctly rejected.
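A hand-written recursive-descent sketch of the parse and term rules makes this concrete. It works on token names and only reports whether the stream is accepted; the real ANTLR-generated parser instead reports errors such as "extraneous input" and attempts recovery. The class `TermParser` is hypothetical.

```java
import java.util.List;

public class TermParser {
    private final List<String> tokens;
    private int pos = 0;

    TermParser(List<String> tokens) { this.tokens = tokens; }

    // Current token type, or "EOF" past the end of the list.
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }

    private void expect(String type) {
        if (!peek().equals(type)) {
            throw new IllegalStateException("expecting " + type + " but found " + peek());
        }
        pos++;
    }

    // term : IDENT | '[[' term ']]' | Q2 term Q2 | Q3 term Q3 ;
    private void term() {
        String t = peek();
        switch (t) {
            case "IDENT" -> pos++;
            case "BRACKET_OPEN" -> { pos++; term(); expect("BRACKET_CLOSE"); }
            case "Q2", "Q3" -> { pos++; term(); expect(t); } // closing quote must match opening
            default -> throw new IllegalStateException("unexpected " + t);
        }
    }

    // parse : term+ EOF ;
    static boolean accepts(List<String> tokens) {
        TermParser p = new TermParser(tokens);
        try {
            do { p.term(); } while (!p.peek().equals("EOF"));
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(List.of("Q3", "Q2", "IDENT", "Q2", "Q3"))); // true
        System.out.println(accepts(List.of("Q3", "Q2", "IDENT", "Q3", "Q2"))); // false
    }
}
```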

Also, I recommend you not use the interpreter: it's rather buggy. The debugger works like a charm though!