ANTLR4: dynamically inject token

Question

So I'm writing a Python parser and I need to dynamically generate INDENT and DEDENT tokens (because Python doesn't use explicit delimiters) according to the Python grammar specification.

Basically I have a stack of integers representing indentation levels. In an embedded Java action in the INDENT token, I check if the current level of indentation is higher than the level on top of the stack; if it is, I push it on; if not, I call skip().

The problem is, if the current indentation level matches a level multiple levels down in the stack, I have to generate multiple DEDENT tokens, and I can't figure out how to do that.

My current code (note that within_indent_block and current_indent_level are managed elsewhere):

fragment DENT: {within_indent_block}? (SPACE|TAB)+;

INDENT: {within_indent_block}? DENT
        {if(current_indent_level > whitespace_stack.peek().intValue()){
                 whitespace_stack.push(new Integer(current_indent_level));
                 within_indent_block = false;
         }else{
                 skip();
         }
         }
         ;    

DEDENT: {within_indent_block}? DENT
        {if(current_indent_level < whitespace_stack.peek().intValue()){
            while(current_indent_level < whitespace_stack.peek().intValue()){
                      whitespace_stack.pop();
                      <<injectDedentToken()>>; //how do I do this
            }
         }else{
               skip();
         }
         }
         ;

How do I do this, and/or is there a better way?

Recommended answer

There are a few problems with the code you have posted.

  1. The INDENT and DEDENT rules are semantically identical (considering predicates and rule references, but ignoring actions). Since INDENT appears first, this means you can never have a token produced by the DEDENT rule in this grammar.
  2. The {within_indent_block}? predicate appears before you reference DENT as well as inside the DENT fragment rule itself. This duplication serves no purpose but will slow down your lexer.

The actual handling of post-matching actions is best placed in an override of Lexer.nextToken(). For example, you could start with something like the following.

private final Deque<Token> pendingTokens = new ArrayDeque<>();

@Override
public Token nextToken() {
    while (pendingTokens.isEmpty()) {
        Token token = super.nextToken();
        switch (token.getType()) {
        case INDENT:
            // handle indent here. to skip this token, simply don't add
            // anything to the pendingTokens queue and super.nextToken()
            // will be called again.
            break;

        case DEDENT:
            // handle dedent here. to skip this token, simply don't add
            // anything to the pendingTokens queue and super.nextToken()
            // will be called again.
            break;

        default:
            pendingTokens.add(token);
            break;
        }
    }

    return pendingTokens.poll();
}
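
To make the "multiple DEDENT tokens" part concrete, here is a minimal sketch of the bookkeeping that could sit behind such a pending-token queue. It does not depend on ANTLR: IndentTracker, onLineStart and poll are hypothetical names invented for this illustration, and token names are plain strings. It also assumes the indentation width of each line is already known, and that a dedent to a column that was never opened is an error, which is how Python's own tokenizer behaves.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the ANTLR runtime. */
public class IndentTracker {
    // Stack of open indentation levels; 0 is the base level and is never popped.
    private final Deque<Integer> levels = new ArrayDeque<>();
    // Token names waiting to be handed out, oldest first.
    private final Deque<String> pending = new ArrayDeque<>();

    public IndentTracker() {
        levels.push(0);
    }

    /** Records the indentation of a new line and queues the tokens it implies. */
    public void onLineStart(int indent) {
        if (indent > levels.peek()) {
            // Deeper than the innermost open block: exactly one INDENT.
            levels.push(indent);
            pending.add("INDENT");
            return;
        }
        // Shallower: close every block that is deeper than the new level,
        // queueing one DEDENT per closed block.
        while (indent < levels.peek()) {
            levels.pop();
            pending.add("DEDENT");
        }
        // Assumption: the new level must match a level that was opened earlier.
        if (indent != levels.peek()) {
            throw new IllegalStateException("inconsistent dedent to column " + indent);
        }
    }

    /** Returns the next queued token name, or null when nothing is pending. */
    public String poll() {
        return pending.poll();
    }

    public static void main(String[] args) {
        IndentTracker tracker = new IndentTracker();
        List<String> out = new ArrayList<>();
        for (int indent : new int[] {0, 4, 8, 0}) {
            tracker.onLineStart(indent);
            for (String t = tracker.poll(); t != null; t = tracker.poll()) {
                out.add(t);
            }
        }
        System.out.println(out); // [INDENT, INDENT, DEDENT, DEDENT]
    }
}
```

In a real lexer the same loop would live inside the nextToken() override, and each queued entry would be an actual Token rather than a string; one option is the ANTLR runtime's CommonToken(int type, String text) constructor, for example new CommonToken(DEDENT, ""). Because the queue is drained before super.nextToken() is called again, any number of DEDENT tokens can be emitted for a single matched piece of whitespace.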
