Understanding ANTLR4 Tokens


Question

I'm pretty new to ANTLR and I'm trying to understand what exactly a Token is in ANTLR4. Consider the following pretty nonsensical grammar:

grammar Tst;

init: A token=('+'|'-') B;

A: .+?;
B: .+?;
ADD: '+';
SUB: '-';

ANTLR4 generates the following TstParser.InitContext for it:

public static class InitContext extends ParserRuleContext {
    public Token token;       //<---------------------------- HERE
    public TerminalNode A() { return getToken(TstParser.A, 0); }
    public TerminalNode B() { return getToken(TstParser.B, 0); }
    public InitContext(ParserRuleContext parent, int invokingState) {
        super(parent, invokingState);
    }
    @Override public int getRuleIndex() { return RULE_init; }
    @Override
    public void enterRule(ParseTreeListener listener) {
        if ( listener instanceof TstListener ) ((TstListener)listener).enterInit(this);
    }
    @Override
    public void exitRule(ParseTreeListener listener) {
        if ( listener instanceof TstListener ) ((TstListener)listener).exitInit(this);
    }
}

Now, all lexer rules are available as static constants in the parser class:

public static final int A=1, B=2, ADD=3, SUB=4;

How can we use them to identify lexer rules? The A, B, and ADD rules can all match '+'. So which type should I use when testing it?

I mean:

TstParser.InitContext ctx;
//...
ctx.token.getType() == //What type?
                       //TstParser.A
                       //TstParser.B
                       //or
                       //TstParser.ADD?

In general, I would like to learn how ANTLR4 determines the type of a Token.

Recommended Answer

I will try to introduce you to the process of parsing. The process has two stages: the lexer stage, where tokens are created, and the parser stage. (This is where the term "parsing" comes from, although it is not very precise when we talk about parsing in general.) All you are trying to do is understand the input and, along the way, perhaps build a model of it. To make this easier, the job is generally divided into smaller steps. It is much easier to work with tokens, which are somewhat larger elements of the input than characters and mostly correspond to "words" (keywords, variables, and literals, to be precise).

For this reason, the first step is to pre-process the input, which arrives as a character stream, into tokens. All you can say about a token is what value is associated with it and what kind of token it is. For instance, in the very simple calculator input "2+3*9", '2' is a number token with the value 2, '+' is an operator token with the value '+', and so on. The result of the lexer stage is a stream of tokens. As you can imagine, lexer rules and parser rules are very similar: the first step works with characters, the second with tokens.
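The calculator example can be sketched as a tiny hand-written lexer. This is only an illustration of the idea that a token is a (kind, text) pair; the class `CalcLexerSketch`, the `Tok` type, and the `NUMBER`/`OPERATOR` constants are hypothetical names invented here, and the lexer ANTLR generates from a grammar works differently inside.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, hand-written lexer for illustration only.
// It is NOT what ANTLR generates; it only mimics the idea.
class CalcLexerSketch {
    // Token kinds as int constants, similar in spirit to the
    // static constants ANTLR puts into the generated parser class.
    static final int NUMBER = 1, OPERATOR = 2;

    // A token carries two things: its kind and the text it covers.
    static final class Tok {
        final int type;
        final String text;

        Tok(int type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return (type == NUMBER ? "NUMBER" : "OPERATOR") + "(" + text + ")";
        }
    }

    // Turns a character stream (here: a String) into a list of tokens.
    static List<Tok> tokenize(String input) {
        List<Tok> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // skip whitespace between tokens
            } else if (Character.isDigit(c)) {
                int start = i;
                // consume the whole run of digits as one NUMBER token
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(new Tok(NUMBER, input.substring(start, i)));
            } else if ("+-*/".indexOf(c) >= 0) {
                tokens.add(new Tok(OPERATOR, String.valueOf(c)));
                i++;
            } else {
                throw new IllegalArgumentException(
                        "Unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [NUMBER(2), OPERATOR(+), NUMBER(3), OPERATOR(*), NUMBER(9)]
        System.out.println(tokenize("2+3*9"));
    }
}
```

The parser stage would then consume this list and never look at individual characters again.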

Regarding ANTLR (many other generators work the same way), there is one important rule about the lexer: you cannot have the same rule for different tokens. So the grammar you posted won't work, because the lexer cannot tell A and B apart. You can simply use the same token name on both sides and take care of the difference later.

Why can't lexer rules be the same? As the lexer processes the input, it walks the character stream. At each position it picks the rule that matches the longest stretch of input, and when several rules match the same length, the one defined first in the grammar wins. So if another rule would match just as well, what a pity: it never gets a chance. In ANTLR, the parser is much more generous than the lexer.
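To see why a later rule never gets a chance, here is a rough simulation of how the ANTLR lexer resolves ambiguity: the longest match wins, and on a tie the rule defined first wins. The sketch uses `java.util.regex` as a stand-in and is not ANTLR's actual implementation. The class `RuleOrderSketch` and its methods are hypothetical, and modelling the non-greedy `.+?` rules as "any single character" is an assumption made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical simulation of lexer rule selection; not ANTLR code.
class RuleOrderSketch {
    // Returns the name of the rule that wins at position pos:
    // the longest match, with ties going to the rule listed first.
    static String winningRule(Map<String, Pattern> rules, String input, int pos) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            m.region(pos, input.length());
            if (m.lookingAt()) {
                int len = m.end() - pos;
                // strictly greater: an equally long later rule never replaces an earlier one
                if (len > bestLen) {
                    best = rule.getKey();
                    bestLen = len;
                }
            }
        }
        return best;
    }

    // Rules in the order used by the grammar in the question.
    // Assumption: the non-greedy ".+?" rules behave like "any one character".
    static Map<String, Pattern> questionRules() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("A", Pattern.compile(".", Pattern.DOTALL));
        rules.put("B", Pattern.compile(".", Pattern.DOTALL));
        rules.put("ADD", Pattern.compile("\\+"));
        rules.put("SUB", Pattern.compile("-"));
        return rules;
    }

    // Same idea with the specific rules first and a single catch-all last.
    static Map<String, Pattern> reorderedRules() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("ADD", Pattern.compile("\\+"));
        rules.put("SUB", Pattern.compile("-"));
        rules.put("ANY", Pattern.compile(".", Pattern.DOTALL));
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(winningRule(questionRules(), "+", 0));  // prints A
        System.out.println(winningRule(reorderedRules(), "+", 0)); // prints ADD
    }
}
```

Under these assumptions, every character of the input is claimed by A in the question's rule order, so B, ADD, and SUB are never produced. Listing the specific rules before the catch-all lets '+' come out as ADD.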

To sum up: tokens are the products of the lexer. They are groups of one or more characters that should be presented to the next step as a single unit. We are talking about variable names, operators, function names, and so on.
