ANTLR parser for alphanumeric words which may have whitespace in between

Problem description

First I tried to identify a normal word, and the grammar below works fine:

grammar Test;

myToken: WORD;
WORD: (LOWERCASE | UPPERCASE )+ ;
fragment LOWERCASE  : [a-z] ;
fragment UPPERCASE  : [A-Z] ;
fragment DIGIT: '0'..'9' ;
WHITESPACE  : (' ' | '\t')+;

But as soon as I added the rule below just beneath "myToken", even my WORD tokens were no longer recognised for the input string "abc":

ALPHA_NUMERIC_WS: ( WORD | DIGIT | WHITESPACE)+;

Does anyone know why?

Recommended answer

This is because ANTLR's lexer resolves overlapping rules on a "first come, first served" basis: when more than one lexer rule can match the same stretch of input, it picks the rule specified first in the grammar and does not consider the later ones. (A rule that can match a longer stretch of input takes precedence over this ordering.)

In your case, ALPHA_NUMERIC_WS matches everything that WORD matches (and more), and because it is specified before WORD, WORD will never be used: there is no input that WORD can match that the earlier ALPHA_NUMERIC_WS cannot. The same applies to the WHITESPACE rule. DIGIT is a fragment, so it never produces a token on its own in any case.
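The selection behaviour described above can be imitated in a few lines of Python. This is only a rough simulation, not ANTLR itself: the `tokenize` helper and the regular expressions standing in for the lexer rules are assumptions made for illustration. It shows that "abc" comes out as a WORD token with the original grammar, but as an ALPHA_NUMERIC_WS token once that rule is placed ahead of WORD, which is why a parser rule expecting WORD no longer matches.

```python
import re

def tokenize(text, rules):
    """Simulate the lexer's choice: longest match wins, earlier rule breaks ties."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:  # rules are listed in grammar order
            m = re.compile(pattern).match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so the earliest rule keeps a tie.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Regex stand-ins for the lexer rules in the question (hypothetical encoding).
WORD = ("WORD", r"[a-zA-Z]+")
WHITESPACE = ("WHITESPACE", r"[ \t]+")
# (WORD | DIGIT | WHITESPACE)+ accepts any non-empty run of these characters.
ALPHA_NUMERIC_WS = ("ALPHA_NUMERIC_WS", r"[a-zA-Z0-9 \t]+")

original = [WORD, WHITESPACE]
with_extra_rule = [ALPHA_NUMERIC_WS, WORD, WHITESPACE]

print(tokenize("abc", original))         # [('WORD', 'abc')]
print(tokenize("abc", with_extra_rule))  # [('ALPHA_NUMERIC_WS', 'abc')]
```

Note that for an input such as "abc def" the broad rule would win even if it were listed last, because it matches a longer stretch of input than WORD does.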

I guess that what you want is not to create an ALPHA_NUMERIC_WS token (which is what specifying it as a lexer rule does) but to make it a parser rule instead, so that it can be referenced from another parser rule to allow an arbitrary sequence of WORDs, DIGITs and WHITESPACEs.

So you would need to write it like this:

alpha_numeric_ws: ( WORD | DIGIT | WHITESPACE)+;
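To make the division of labour concrete, here is a small Python sketch of what the parser rule above does once the lexer has produced separate tokens. It is an illustration under assumptions, not ANTLR output: the `lex` and `alpha_numeric_ws` functions are hypothetical, and DIGIT is assumed to be a regular lexer rule rather than a fragment, since a fragment never produces a token for the parser to see.

```python
import re

# Lexer rules in grammar order. Assumption: DIGIT is a regular rule, not a fragment.
LEXER_RULES = [
    ("WORD", re.compile(r"[a-zA-Z]+")),
    ("DIGIT", re.compile(r"[0-9]")),
    ("WHITESPACE", re.compile(r"[ \t]+")),
]

def lex(text):
    """Return the token type names; longest match wins, earlier rule breaks ties."""
    types, pos = [], 0
    while pos < len(text):
        candidates = [(name, rx.match(text, pos)) for name, rx in LEXER_RULES]
        candidates = [(name, m) for name, m in candidates if m]
        if not candidates:
            raise ValueError(f"no rule matches at position {pos}")
        # max() returns the first maximal element, so grammar order breaks ties.
        name, m = max(candidates, key=lambda c: len(c[1].group()))
        types.append(name)
        pos = m.end()
    return types

def alpha_numeric_ws(types):
    """Parser-rule stand-in: one or more WORD, DIGIT or WHITESPACE tokens."""
    allowed = {"WORD", "DIGIT", "WHITESPACE"}
    return len(types) > 0 and all(t in allowed for t in types)

tokens = lex("abc 12 de")
print(tokens)
# ['WORD', 'WHITESPACE', 'DIGIT', 'DIGIT', 'WHITESPACE', 'WORD']
print(alpha_numeric_ws(tokens))  # True
```

The lexer only decides what each piece of text is; the rule that combines those pieces into a sequence lives on the parser side, so WORD stays available to other parser rules such as myToken.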

If you really do want ALPHA_NUMERIC_WS to be a token, you can either remove the rules that follow it, or you need to think about what a lexer's job is and where to draw the line between lexer and parser. In the latter case you will have to redesign your grammar for this to work.
