ANTLR parser for alpha numeric words which may have whitespace in between


Question

First I tried to identify a normal word, and the grammar below works fine:

grammar Test;

myToken: WORD;
WORD: (LOWERCASE | UPPERCASE )+ ;
fragment LOWERCASE  : [a-z] ;
fragment UPPERCASE  : [A-Z] ;
fragment DIGIT: '0'..'9' ;
WHITESPACE  : (' ' | '\t')+;

But as soon as I added the lexer rule below, just beneath "myToken", even my WORD tokens stopped being recognised for the input string "abc":

ALPHA_NUMERIC_WS: ( WORD | DIGIT | WHITESPACE)+;

Does anyone have any idea why that is?

Answer

This is because ANTLR's lexer works on a "first come, first served" basis: it prefers the rule that matches the longest stretch of input, and when two rules match input of the same length, the rule specified first in the source wins. Once a rule has claimed the input, the later rules are never consulted for it.

In your case, ALPHA_NUMERIC_WS matches everything WORD can match (and more), and because it is specified before WORD, WORD will never be used: there is no input that WORD could match that ALPHA_NUMERIC_WS, processed first, cannot. The same applies to the WHITESPACE and DIGIT rules.
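The tie-break can be seen in a minimal sketch (the rule names here are hypothetical, chosen just to illustrate the ordering effect):

```antlr
lexer grammar Order;

// For the input "abc", both rules match exactly 3 characters.
// The tie goes to ANY because it is defined first, so a WORD
// token is never emitted for input that ANY fully covers.
ANY  : [a-z0-9]+ ;
WORD : [a-z]+ ;
```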

I guess what you want is not to create an ALPHA_NUMERIC_WS token (which is what specifying it as a lexer rule does), but to make it a parser rule instead, so it can be referenced from another parser rule to allow an arbitrary sequence of WORDs, DIGITs and WHITESPACEs.

So you want to write:

alpha_numeric_ws: ( WORD | DIGIT | WHITESPACE)+;
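Putting the pieces together, the reworked grammar might look like the sketch below. One assumption worth flagging: DIGIT is promoted from a fragment to a real lexer rule, because parser rules cannot reference fragments.

```antlr
grammar Test;

// parser rules (lowercase names)
alpha_numeric_ws : ( WORD | DIGIT | WHITESPACE )+ ;
myToken          : WORD ;

// lexer rules (uppercase names)
WORD       : ( LOWERCASE | UPPERCASE )+ ;
DIGIT      : [0-9] ;            // no longer a fragment
WHITESPACE : ( ' ' | '\t' )+ ;
fragment LOWERCASE : [a-z] ;
fragment UPPERCASE : [A-Z] ;
```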

If you actually do want to create the respective token, you can either remove the later rules, or you need to think about what a lexer's job is and where to draw the line between lexer and parser (you would need to redesign your grammar for this to work).

