Why does ANTLR4 not match "of" as a word and "," as punctuation?


Question

I have a Hello.g4 grammar file with the following grammar definition:

definition : wordsWithPunctuation ;
words : (WORD)+ ;
wordsWithPunctuation : word ( word | punctuation word | word punctuation | '(' wordsWithPunctuation ')' | '"' wordsWithPunctuation '"' )*  ;
NUMBER : [0-9]+ ;
word : WORD ;
WORD : [A-Za-z-]+ ;
punctuation : PUNCTUATION ;
PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

Now, if I try to build a parse tree from the following input:

a b c d of at of abc bcd of
a b c d at abc, bcd
a b c d of at of abc, bcd of

it returns the error:

Hello::definition:1:31: extraneous input 'of' expecting {<EOF>, '(', '"', WORD, PUNCTUATION}

whereas:

a b c d  at:  abc bcd!

works fine.

What is wrong with the grammar, the input, or the interpreter?

If I modify the wordsWithPunctuation rule by adding (... | 'of' | ',' word | ...), it matches the input completely, but that looks suspicious to me: how is the word of different from the word a or abc? And why is , different from the other punctuation characters (i.e., why does it match : or ! but not ,)?

I am working with the ANTLR4 plugin for Eclipse, so the project build produces the following output:

ANTLR Tool v4.2.2 (/var/folders/.../antlr-4.2.2-complete.jar)
Hello.g4 -o /Users/.../eclipse_workspace/antlr_test_project/target/generated-sources/antlr4 -listener -no-visitor -encoding UTF-8

Update 2:

The grammar presented above is just a part of the following:

grammar Hello;

text : (entry)+ ;

entry : blub 'abrr' '-' ('1')? '.' ('(' NUMBER ')')? sims '-' '(' definitionAndExamples ')' 'Hello' 'all' 'the' 'people' 'of' 'the' 'world';

blub : WORD ;

sims : sim (',' sim)* ;
sim : words ;

definitionAndExamples : definitions (';' examples)? ;

definitions : definition (';' definition )* ;
definition : wordsWithPunctuation ;

examples : example (';' example )* ;
example : '"' wordsWithPunctuation '"' ;

words : (WORD)+ ;
wordsWithPunctuation : word ( word | punctuation word | word punctuation | '(' wordsWithPunctuation ')' | '"' wordsWithPunctuation '"' )*  ;

NUMBER : [0-9]+ ;
word : WORD ;
WORD : [A-Za-z-]+ ;
punctuation : PUNCTUATION ;
PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

It now looks to me as if the literal words in the entry rule are somehow breaking the other rules used within the entry rule. But why? Is this a kind of anti-pattern in the grammar?

Recommended answer

By including 'of' in a parser rule, you cause ANTLR to create an implicit anonymous token to represent that input. The word of will always have that special token type, so it will never have the type WORD. The only place it may appear in your parse tree is at a location where 'of' appears in a parser rule. The same applies to ',', which appears as a literal in the sims rule and therefore never has the type PUNCTUATION.
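To make this concrete, the combined grammar behaves roughly as if the literals used in parser rules had been written as explicit lexer rules ahead of WORD and PUNCTUATION. The fragment below is only a sketch of that effect: the rule names OF and COMMA are hypothetical (ANTLR generates its own internal names for implicit tokens), and only two of the literals from the entry and sims rules are shown.

```antlr
// Rough explicit equivalent of what the combined grammar does implicitly.
// OF and COMMA are illustrative names, not the names ANTLR generates.
OF          : 'of' ;   // from the literal 'of' in the entry rule
COMMA       : ',' ;    // from the literal ',' in the sims rule

WORD        : [A-Za-z-]+ ;
PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;

// For the input "of", both OF and WORD match two characters. When two lexer
// rules match the same length, the one defined first wins, so the token is
// OF and never WORD. For the same reason "," becomes COMMA, not PUNCTUATION.
// A longer word such as "often" is still a WORD, because the longest match wins.
```

This is also why ':' and '!' keep working in the question's input: no parser rule uses them as literals, so they are still tokenized as PUNCTUATION.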

You can prevent ANTLR from creating these anonymous token types by splitting your grammar into a separate lexer grammar HelloLexer in HelloLexer.g4 and a parser grammar HelloParser in HelloParser.g4. I highly recommend you always use this form, for the following reasons:

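  1. Lexer modes only work if you do this.
  2. Implicitly defined tokens are one of the most common sources of bugs in a grammar, and separating the grammars prevents this from happening.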

Once you have the grammar separated, you can update your word parser rule to allow the special token of to be treated as a word.

word
  : WORD
  | 'of'
  // | ... other keywords which are also "words"
  ;
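A minimal sketch of the separated form might look like the two files below. It covers only the definition part of the grammar, not the full entry rule. The token names OF, COMMA, LPAREN, RPAREN and QUOTE are assumptions introduced for illustration, and the extra COMMA alternative in punctuation is an assumed way of applying the same idea to ','.

```antlr
// HelloLexer.g4 (sketch; token names other than NUMBER, WORD, PUNCTUATION
// and WS are illustrative assumptions)
lexer grammar HelloLexer;

OF          : 'of' ;          // must precede WORD so that "of" becomes OF
COMMA       : ',' ;           // kept separate so list rules such as sims can use it
LPAREN      : '(' ;
RPAREN      : ')' ;
QUOTE       : '"' ;
NUMBER      : [0-9]+ ;
WORD        : [A-Za-z-]+ ;
PUNCTUATION : ('!'|'?'|'\''|':'|'.') ;   // ',' is now covered by COMMA
WS          : [ \t\r\n]+ -> skip ;       // skip spaces, tabs, newlines
```

```antlr
// HelloParser.g4 (sketch)
parser grammar HelloParser;

options { tokenVocab = HelloLexer; }   // use the token types from HelloLexer

definition : wordsWithPunctuation ;

wordsWithPunctuation
  : word ( word
         | punctuation word
         | word punctuation
         | LPAREN wordsWithPunctuation RPAREN
         | QUOTE wordsWithPunctuation QUOTE
         )*
  ;

word        : WORD | OF ;              // keywords that are also "words"
punctuation : PUNCTUATION | COMMA ;    // ',' may also act as punctuation
```

With the grammars split this way, a literal in the parser grammar that has no matching lexer rule is reported as an error by the ANTLR tool instead of silently becoming a new token type, so every keyword has to be declared in the lexer and then explicitly allowed wherever it may also act as an ordinary word.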
