Why doesn't ANTLR4 match "of" as a word and "," as punctuation?

Question

I have a Hello.g4 grammar file with the following grammar definition:

definition : wordsWithPunctuation ;
words : (WORD)+ ;
wordsWithPunctuation : word ( word | punctuation word | word punctuation | '(' wordsWithPunctuation ')' | '"' wordsWithPunctuation '"' )*  ;
NUMBER : [0-9]+ ;
word : WORD ;
WORD : [A-Za-z-]+ ;
punctuation : PUNCTUATION ;
PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

Now, if I try to build a parse tree from the following input:

a b c d of at of abc bcd of
a b c d at abc, bcd
a b c d of at of abc, bcd of

it returns an error:

Hello::definition:1:31: extraneous input 'of' expecting {<EOF>, '(', '"', WORD, PUNCTUATION}

whereas:

a b c d  at:  abc bcd!

works fine.

What is wrong with the grammar, the input, or the interpreter?

If I modify the wordsWithPunctuation rule by adding (... | 'of' | ',' word | ...), it matches the input completely, but that looks suspicious to me: how is the word of different from the word a or abc? And why is , different from the other punctuation characters (i.e., why does it match : or !, but not ,)?

I am working with the ANTLR4 plugin for Eclipse, so the project build produces the following output:

ANTLR Tool v4.2.2 (/var/folders/.../antlr-4.2.2-complete.jar)
Hello.g4 -o /Users/.../eclipse_workspace/antlr_test_project/target/generated-sources/antlr4 -listener -no-visitor -encoding UTF-8

Update2:

The grammar presented above is only a part of the following:

grammar Hello;

text : (entry)+ ;

entry : blub 'abrr' '-' ('1')? '.' ('(' NUMBER ')')? sims '-' '(' definitionAndExamples ')' 'Hello' 'all' 'the' 'people' 'of' 'the' 'world';

blub : WORD ;

sims : sim (',' sim)* ;
sim : words ;

definitionAndExamples : definitions (';' examples)? ;

definitions : definition (';' definition )* ;
definition : wordsWithPunctuation ;

examples : example (';' example )* ;
example : '"' wordsWithPunctuation '"' ;

words : (WORD)+ ;
wordsWithPunctuation : word ( word | punctuation word | word punctuation | '(' wordsWithPunctuation ')' | '"' wordsWithPunctuation '"' )*  ;

NUMBER : [0-9]+ ;
word : WORD ;
WORD : [A-Za-z-]+ ;
punctuation : PUNCTUATION ;
PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

It now looks to me as if the words from the entry rule somehow break the other rules within the entry rule. But why? Is this a kind of anti-pattern in the grammar?

Recommended answer

By including 'of' in a parser rule, you cause ANTLR to create an implicit anonymous token to represent that input. The word of will always have that special token type, so it will never have the type WORD. The only place it may appear in your parse tree is at a location where 'of' appears in a parser rule.

The same applies to the comma: ',' appears as a literal in the sims rule, so it also gets its own implicit token type and is never tokenized as PUNCTUATION. The characters : and ! are not used as literals in any parser rule, which is why they still match PUNCTUATION.
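To make this concrete, here is a rough sketch of the lexer that ANTLR effectively derives from the combined grammar. It assumes that the implicit literal tokens are placed before the explicitly written lexer rules, and that when two rules match the same longest text the one listed first wins. The grammar name HelloEffectiveLexer and the T__n token names are illustrative only; the names and numbering that ANTLR actually generates may differ.

```antlr
// Illustrative only: an approximation of the lexer implied by the combined grammar.
lexer grammar HelloEffectiveLexer;

// Implicit tokens created from literals in parser rules (names are hypothetical).
T__0 : 'of' ;   // from the literal 'of' in the entry rule
T__1 : ',' ;    // from the literal ',' in the sims rule
T__2 : '.' ;    // from the literal '.' in the entry rule
// ... one more implicit token for each remaining literal
//     ('abrr', '-', '1', '(', ')', ';', '"', 'Hello', 'all', 'the', ...)

// Explicit lexer rules, in the order written in Hello.g4.
NUMBER      : [0-9]+ ;
WORD        : [A-Za-z-]+ ;                    // also matches "of", but T__0 is listed first
PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;    // ',' and '.' are shadowed by T__1 and T__2
WS          : [ \t\r\n]+ -> skip ;
```

Under these assumptions, the input of becomes T__0 rather than WORD, while a longer word such as often is still a WORD, because the longest match is preferred before rule order is considered.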

You can prevent ANTLR from creating these anonymous token types by separating your grammar into a lexer grammar HelloLexer in HelloLexer.g4 and a parser grammar HelloParser in HelloParser.g4. I highly recommend that you always use this form, for the following reasons:

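  1. Lexer modes only work if you do this.
  2. Implicitly defined tokens are one of the most common sources of bugs in a grammar, and separating the grammar prevents them from being created.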
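As a sketch of what the separated lexer might look like, the file below declares every literal from the entry and sims rules as an explicit, named token. The token names (OF, COMMA, DASH, and so on) are assumptions chosen for illustration, not names from the original grammar. With a separate parser grammar, a literal that has no matching lexer rule should be reported as an error instead of silently becoming a new token type.

```antlr
// HelloLexer.g4 (sketch; all token names are hypothetical)
lexer grammar HelloLexer;

// Keywords used by the entry rule. They are listed before WORD so that
// they win when both rules match the same text.
ABRR   : 'abrr' ;
HELLO  : 'Hello' ;
ALL    : 'all' ;
THE    : 'the' ;
PEOPLE : 'people' ;
OF     : 'of' ;
WORLD  : 'world' ;

// Punctuation that parser rules refer to individually.
COMMA  : ',' ;
DOT    : '.' ;
DASH   : '-' ;   // a lone '-' is DASH; "well-known" is still one WORD (longest match)
SEMI   : ';' ;
LPAREN : '(' ;
RPAREN : ')' ;
QUOTE  : '"' ;
ONE    : '1' ;   // note: the input "1" is always ONE, never NUMBER

NUMBER      : [0-9]+ ;
WORD        : [A-Za-z-]+ ;
PUNCTUATION : ('!'|'?'|'\''|':') ;   // ',' and '.' now have their own tokens above
WS          : [ \t\r\n]+ -> skip ;   // skip spaces, tabs, newlines
```

Writing the tokens out this way makes the overlap between keywords and WORD visible in one place, so the parser rules can decide explicitly where a keyword may also act as an ordinary word.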

Once you have the grammar separated, you can update your word parser rule to allow the special token of to be treated as a word.

word
  : WORD
  | 'of'
  // | ... other keywords which are also "words"
  ;
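The punctuation rule can be opened up in the same way so that a comma or period counts as punctuation again. The fragment below is a minimal sketch that assumes a lexer grammar HelloLexer defining tokens named OF, ALL, THE, PEOPLE, WORLD, COMMA and DOT in addition to WORD and PUNCTUATION; those names are hypothetical, and the remaining parser rules are omitted.

```antlr
// HelloParser.g4 (sketch; only the two rules relevant to the question are shown)
parser grammar HelloParser;

options { tokenVocab = HelloLexer; }   // take token types from the lexer grammar

// A word is an ordinary WORD or any keyword that should also read as a word.
word
  : WORD
  | OF
  | ALL
  | THE
  | PEOPLE
  | WORLD
  ;

// Punctuation includes the characters that have dedicated tokens in the lexer.
punctuation
  : PUNCTUATION
  | COMMA
  | DOT
  ;
```

Which keywords belong in word is a design choice: each token added there may appear anywhere a word is allowed, so it is worth adding only those that genuinely occur in free text.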
