ANTLR4: dynamically inject token
Question
I'm writing a Python parser and I need to dynamically generate INDENT and DEDENT tokens (because Python doesn't use explicit delimiters) according to the Python grammar specification.
Basically I have a stack of integers representing indentation levels. In an embedded Java action in the INDENT token, I check whether the current level of indentation is higher than the level on top of the stack; if it is, I push it on; if not, I call skip().
The problem is that if the current indentation level matches a level several entries down in the stack, I have to generate multiple DEDENT tokens, and I can't figure out how to do that.
My current code (note that within_indent_block and current_indent_level are managed elsewhere):
fragment DENT: {within_indent_block}? (SPACE|TAB)+;

INDENT: {within_indent_block}? DENT
    {
        if (current_indent_level > whitespace_stack.peek().intValue()) {
            whitespace_stack.push(new Integer(current_indent_level));
            within_indent_block = false;
        } else {
            skip();
        }
    }
    ;

DEDENT: {within_indent_block}? DENT
    {
        if (current_indent_level < whitespace_stack.peek().intValue()) {
            while (current_indent_level < whitespace_stack.peek().intValue()) {
                whitespace_stack.pop();
                <<injectDedentToken()>>; // how do I do this
            }
        } else {
            skip();
        }
    }
    ;
How do I do this, and/or is there a better way?
Recommended answer
There are a few problems with the code you have posted.
- The INDENT and DEDENT rules are semantically identical (considering predicates and rule references, but ignoring actions). Since INDENT appears first, this means you can never have a token produced by the DEDENT rule in this grammar.
- The {within_indent_block}? predicate appears before you reference DENT as well as inside the DENT fragment rule itself. This duplication serves no purpose but will slow down your lexer.
The actual handling of post-matching actions is best placed in an override of Lexer.nextToken(). For example, you could start with something like the following.
// Lexer members; needs java.util.ArrayDeque and java.util.Deque imported.
private final Deque<Token> pendingTokens = new ArrayDeque<>();

@Override
public Token nextToken() {
    while (pendingTokens.isEmpty()) {
        Token token = super.nextToken();
        switch (token.getType()) {
        case INDENT:
            // handle indent here. to skip this token, simply don't add
            // anything to the pendingTokens queue and super.nextToken()
            // will be called again.
            break;
        case DEDENT:
            // handle dedent here. to skip this token, simply don't add
            // anything to the pendingTokens queue and super.nextToken()
            // will be called again.
            break;
        default:
            pendingTokens.add(token);
            break;
        }
    }
    return pendingTokens.poll();
}
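The skeleton above leaves the INDENT and DEDENT cases empty. As a rough sketch of the bookkeeping that could go there, the class below keeps the stack of indentation widths from the question and queues one DEDENT for every level it pops, which is how a single line can yield several DEDENT tokens. It is plain Java with no ANTLR dependency so it can run on its own. IndentTracker, onLineStart, poll and drain are hypothetical names invented for this illustration, and token names are plain strings standing in for real Token objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper, not part of ANTLR: tracks indentation and queues synthetic tokens. */
class IndentTracker {
    // Stack of open indentation widths; 0 is the base level and is never popped.
    private final Deque<Integer> levels = new ArrayDeque<>();
    // Token names waiting to be handed out, oldest first.
    private final Deque<String> pending = new ArrayDeque<>();

    IndentTracker() {
        levels.push(0);
    }

    /** Call once per logical line with the width of its leading whitespace. */
    void onLineStart(int width) {
        if (width < 0) {
            throw new IllegalArgumentException("width must be non-negative");
        }
        if (width > levels.peek()) {
            // Deeper than the current block: open exactly one new level.
            levels.push(width);
            pending.add("INDENT");
            return;
        }
        // Shallower: close every level deeper than this line, one DEDENT each.
        while (width < levels.peek()) {
            levels.pop();
            pending.add("DEDENT");
        }
        // The line must land exactly on a level that is still open.
        if (width != levels.peek()) {
            throw new IllegalStateException("inconsistent dedent to width " + width);
        }
    }

    /** Returns the next queued token name, or null when the queue is empty. */
    String poll() {
        return pending.poll();
    }

    /** Convenience for the example: removes and returns everything queued so far. */
    List<String> drain() {
        List<String> out = new ArrayList<>();
        while (!pending.isEmpty()) {
            out.add(pending.poll());
        }
        return out;
    }
}
```

In a real lexer the queue would presumably hold Token instances added to pendingTokens inside the nextToken() override (for example, objects built with ANTLR's CommonToken), and the width would be computed from the matched whitespace. Those details depend on the grammar and are not shown here.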