How to get the line number in an ANTLR3 tree-parser @init action

Question

In ANTLR, version 3, how can the line number be obtained in the @init action of a high-level tree-parser rule?

For example, in the @init action below, I'd like to push the line number along with the sentence text.

sentence
    @init { m_nodeVisitor.pushScriptContext( new MyScriptContext( $sentence.text )); }
    : assignCommand 
    | actionCommand;
    finally {
        m_nodeVisitor.popScriptContext();
    }

I need to push the context before the execution of the actions associated with symbols in the rules.

Some things that don't work:

  • Using $sentence.line -- it's not defined, even though $sentence.text is.
  • Moving the paraphrase push into the rule actions. Placed before the rule's first symbol, it runs when no token of the rule is available yet; placed after the last symbol, it runs only after the actions attached to the rule's symbols.
  • Using this expression in the @init action, which compiles but returns 0: getTreeNodeStream().getTreeAdaptor().getToken( $sentence.start ).getLine(). Update: this does work if $sentence.start is either a real token or an imaginary token created with a reference token; see Bart Kiers's answer below.

It seems like if I can easily get, in the @init action, the matched text and the first matched token, there should be an easy way to get the line number as well.

Recommended answer

You can look 1 step ahead in the token/tree-stream of a tree grammar using the following: CommonTree ahead = (CommonTree)input.LT(1), which you can place in the @init section.

Every CommonTree (the default Tree implementation in ANTLR) has a getToken() method, which returns the Token associated with this tree. And each Token has a getLine() method, which, not surprisingly, returns the line number of this token.

So, if you do the following:

sentence
@init {
  CommonTree ahead = (CommonTree)input.LT(1);
  int line = ahead.getToken().getLine();
  System.out.println("line=" + line);
}
  :  assignCommand 
  |  actionCommand
  ;

you should be able to see some correct line numbers being printed. I say some, because this won't go as planned in all cases. Let me demonstrate using a simple example grammar:

grammar ASTDemo;

options { 
  output=AST;
}

tokens {
  ROOT;
  ACTION;
}

parse
  :  sentence+ EOF -> ^(ROOT sentence+)
  ;

sentence
  :  assignCommand 
  |  actionCommand
  ;

assignCommand
  :  ID ASSIGN NUMBER -> ^(ASSIGN ID NUMBER)
  ;

actionCommand
  :  action ID -> ^(ACTION action ID)
  ;

action
  :  START
  |  STOP
  ;

ASSIGN : '=';
START  : 'start';
STOP   : 'stop';
ID     : ('a'..'z' | 'A'..'Z')+;
NUMBER : '0'..'9'+;
SPACE  : (' ' | '\t' | '\r' | '\n')+ {skip();};

And here is its tree grammar:

tree grammar ASTDemoWalker;

options {
  output=AST;
  tokenVocab=ASTDemo;
  ASTLabelType=CommonTree;
}


walk
  :  ^(ROOT sentence+)
  ;

sentence
@init {
  CommonTree ahead = (CommonTree)input.LT(1);
  int line = ahead.getToken().getLine();
  System.out.println("line=" + line);
}
  :  assignCommand 
  |  actionCommand
  ;

assignCommand
  :  ^(ASSIGN ID NUMBER)
  ;

actionCommand
  :  ^(ACTION action ID)
  ;

action
  :  START
  |  STOP
  ;

And if you run the following test class:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String src = "\n\n\nABC = 123\n\nstart ABC";
    ASTDemoLexer lexer = new ASTDemoLexer(new ANTLRStringStream(src));
    ASTDemoParser parser = new ASTDemoParser(new CommonTokenStream(lexer));
    CommonTree root = (CommonTree)parser.parse().getTree();
    ASTDemoWalker walker = new ASTDemoWalker(new CommonTreeNodeStream(root));
    walker.walk();
  }
}

you will see the following being printed:

line=4
line=0

As you can see, "ABC = 123" produced the expected output (line 4), but "start ABC" didn't (line 0). This is because the root of the tree built by the actionCommand rule is an ACTION token, and this token is never defined in the lexer, only in the tokens{...} block. Because it doesn't really exist in the input, it gets line 0 by default. If you want it to carry a real line number, you need to provide a "reference" token as a parameter to this so-called imaginary ACTION token, which then copies that token's attributes into itself.

So, if you change the actionCommand rule in the combined grammar into:

actionCommand
  :  ref=action ID -> ^(ACTION[$ref.start] action ID)
  ;

the line number would be as expected (line 6).
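The expected numbers 4 and 6 can be checked by hand. ANTLR3's ANTLRStringStream starts counting at line 1 and increments the line each time it consumes a '\n', so a token's line is one plus the number of newlines before it. The following standalone Java sketch applies that rule to the test input; the class and method names are made up for illustration and it does not use the ANTLR runtime. The '=' that becomes the ASSIGN root follows three newlines, and the start token that is used as the reference for ACTION follows five.

```java
// Standalone sketch (hypothetical names, no ANTLR dependency): computes the
// 1-based line of a character offset the way ANTLRStringStream tracks lines,
// i.e. start at line 1 and add one for every '\n' before that offset.
class LineOfOffset {
    static int lineOf(String src, int offset) {
        int line = 1;
        for (int i = 0; i < offset; i++) {
            if (src.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }

    public static void main(String[] args) {
        // Same input as the Main test class above.
        String src = "\n\n\nABC = 123\n\nstart ABC";
        // '=' is the token behind the ASSIGN root of the first sentence.
        System.out.println("'=' -> line " + lineOf(src, src.indexOf("=")));
        // 'start' is the reference token passed to ACTION[$ref.start].
        System.out.println("start -> line " + lineOf(src, src.indexOf("start")));
    }
}
```

Running main prints `'=' -> line 4` and `start -> line 6`, matching the line numbers the tree walker reports once ACTION has a reference token.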

Note that every parser rule has start and stop attributes, which refer to the first and last token matched by the rule, respectively. If action were a lexer rule (say FOO), you could omit the .start:

actionCommand
  :  ref=FOO ID -> ^(ACTION[$ref] FOO ID)
  ;

Now the ACTION token has copied all attributes from whatever $ref is pointing to, except the token type, which is of course ACTION. But this also means that it copied the text attribute, so in my example, the AST created by ref=action ID -> ^(ACTION[$ref.start] action ID) could look like:

            [text=start,type=ACTION]
                  /         \
                 /           \
                /             \
   [text=start,type=START]  [text=ABC,type=ID]

It is still a valid AST, since the two nodes have different types, but it makes debugging confusing because ACTION and START share the same .text attribute.

You can copy all attributes except .text and .type into an imaginary token by providing the text as a second, string parameter, like this:

actionCommand
  :  ref=action ID -> ^(ACTION[$ref.start, "Action"] action ID)
  ;
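To summarize the three ways the rewrite can build the ACTION node (no argument, a reference token, or a reference token plus text), here is a small standalone Java model. ModelToken and ImaginaryTokenDemo are hypothetical names, not ANTLR classes, and the model keeps only type, text and line; a real CommonToken also carries channel, character position and token index. It is meant to mirror, as far as I understand the generated code, what the tree adaptor's create(...) overloads do for each rewrite form. The token-type numbers are made up; ANTLR assigns its own.

```java
// Hypothetical model of token attributes; ModelToken is NOT an ANTLR class.
final class ModelToken {
    final int type;
    final String text;
    final int line;

    ModelToken(int type, String text, int line) {
        this.type = type;
        this.text = text;
        this.line = line;
    }

    // ^(ACTION ...): no input token behind it, so the line stays at the default 0
    // and the text is just the token name.
    static ModelToken imaginary(int type, String name) {
        return new ModelToken(type, name, 0);
    }

    // ^(ACTION[$ref.start] ...): copy the reference token, override only the type.
    static ModelToken imaginary(int type, ModelToken ref) {
        return new ModelToken(type, ref.text, ref.line);
    }

    // ^(ACTION[$ref.start, "Action"] ...): copy the reference, override type and text.
    static ModelToken imaginary(int type, ModelToken ref, String text) {
        return new ModelToken(type, text, ref.line);
    }

    @Override
    public String toString() {
        return "[type=" + type + ", text='" + text + "', line=" + line + "]";
    }
}

class ImaginaryTokenDemo {
    // Made-up token-type numbers for the sketch.
    static final int START = 5;
    static final int ACTION = 7;

    public static void main(String[] args) {
        // The real 'start' token from the test input sits on line 6.
        ModelToken start = new ModelToken(START, "start", 6);
        System.out.println(ModelToken.imaginary(ACTION, "ACTION"));
        System.out.println(ModelToken.imaginary(ACTION, start));
        System.out.println(ModelToken.imaginary(ACTION, start, "Action"));
    }
}
```

The three printed lines are `[type=7, text='ACTION', line=0]`, `[type=7, text='start', line=6]` and `[type=7, text='Action', line=6]`: only the variants with a reference token get a real line, and only the two-argument form avoids sharing the reference token's text.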

And if you now run the same test class again, you will see the following printed:

line=4
line=6

And if you inspect the tree that is generated, it'll look like this:

[type=ROOT, text='ROOT']
  [type=ASSIGN, text='=']
    [type=ID, text='ABC']
    [type=NUMBER, text='123']
  [type=ACTION, text='Action']
    [type=START, text='start']
    [type=ID, text='ABC']