How to get line number in ANTLR3 tree-parser @init action

Question

In ANTLR, version 3, how can the line number be obtained in the @init action of a high-level tree-parser rule?

For example, in the @init action below, I'd like to push the line number along with the sentence text.

sentence
    @init { m_nodeVisitor.pushScriptContext( new MyScriptContext( $sentence.text )); }
    : assignCommand 
    | actionCommand;
    finally {
        m_nodeVisitor.popScriptContext();
    }

I need to push the context before any of the actions associated with the symbols in the rule are executed.

Some things that don't work:

  • Using $sentence.line: it's not defined, even though $sentence.text is.
  • Moving the context push into the rule actions. Placed at the start of the rule's alternatives, no token of the rule is available yet. Placed at the end, the action runs after the actions associated with the rule's symbols.
  • Using this expression in the @init action, which compiles but returns 0: getTreeNodeStream().getTreeAdaptor().getToken( $sentence.start ).getLine(). Update: this does work if $sentence.start is either a real token or an imaginary token created with a reference token; see Bart Kiers' answer below.

Since I can easily get the matched text and the first matched token in the @init action, it seems there should be an easy way to get the line number as well.

Answer

You can look 1 step ahead in the token/tree-stream of a tree grammar using the following: CommonTree ahead = (CommonTree)input.LT(1), which you can place in the @init section.

Every CommonTree (the default Tree implementation in ANTLR) has a getToken() method, which returns the Token associated with that tree. And each Token has a getLine() method, which, not surprisingly, returns the line number of that token.

So, if you do the following:

sentence
@init {
  CommonTree ahead = (CommonTree)input.LT(1);
  int line = ahead.getToken().getLine();
  System.out.println("line=" + line);
}
  :  assignCommand 
  |  actionCommand
  ;

You should see some correct line numbers being printed. I say some, because this won't go as planned in all cases. Let me demonstrate using a simple example grammar:

grammar ASTDemo;

options { 
  output=AST;
}

tokens {
  ROOT;
  ACTION;
}

parse
  :  sentence+ EOF -> ^(ROOT sentence+)
  ;

sentence
  :  assignCommand 
  |  actionCommand
  ;

assignCommand
  :  ID ASSIGN NUMBER -> ^(ASSIGN ID NUMBER)
  ;

actionCommand
  :  action ID -> ^(ACTION action ID)
  ;

action
  :  START
  |  STOP
  ;

ASSIGN : '=';
START  : 'start';
STOP   : 'stop';
ID     : ('a'..'z' | 'A'..'Z')+;
NUMBER : '0'..'9'+;
SPACE  : (' ' | '\t' | '\r' | '\n')+ {skip();};

The tree grammar looks like this:

tree grammar ASTDemoWalker;

options {
  output=AST;
  tokenVocab=ASTDemo;
  ASTLabelType=CommonTree;
}


walk
  :  ^(ROOT sentence+)
  ;

sentence
@init {
  CommonTree ahead = (CommonTree)input.LT(1);
  int line = ahead.getToken().getLine();
  System.out.println("line=" + line);
}
  :  assignCommand 
  |  actionCommand
  ;

assignCommand
  :  ^(ASSIGN ID NUMBER)
  ;

actionCommand
  :  ^(ACTION action ID)
  ;

action
  :  START
  |  STOP
  ;

And if you run the following test class:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String src = "\n\n\nABC = 123\n\nstart ABC";
    ASTDemoLexer lexer = new ASTDemoLexer(new ANTLRStringStream(src));
    ASTDemoParser parser = new ASTDemoParser(new CommonTokenStream(lexer));
    CommonTree root = (CommonTree)parser.parse().getTree();
    ASTDemoWalker walker = new ASTDemoWalker(new CommonTreeNodeStream(root));
    walker.walk();
  }
}

You will see the following printed:

line=4
line=0
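Why are 4 and 6 the right answers? The lexer starts counting at line 1 and advances one line per '\n'. In the test input, "ABC" follows three newlines and "start" follows five. The plain-Java sketch below reproduces that count. LineOf and lineOf are illustrative names and are not part of ANTLR.

```java
// Plain-Java sketch: the 1-based line on which a substring starts, counting
// '\n' characters the way a lexer advances its line counter.
final class LineOf {
    // Returns the 1-based line of the first occurrence of `needle` in `src`,
    // or -1 if `needle` does not occur.
    static int lineOf(String src, String needle) {
        int offset = src.indexOf(needle);
        if (offset < 0) {
            return -1;
        }
        int line = 1; // line numbering starts at 1
        for (int i = 0; i < offset; i++) {
            if (src.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }

    public static void main(String[] args) {
        String src = "\n\n\nABC = 123\n\nstart ABC";
        System.out.println("ABC   -> line " + lineOf(src, "ABC"));   // line 4
        System.out.println("start -> line " + lineOf(src, "start")); // line 6
    }
}
```

So the 0 printed for the second sentence is not the position of "start"; it comes from the node the walker peeks at, as explained next.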

As you can see, "ABC = 123" produced the expected output (line 4), but "start ABC" didn't (it printed line 0 instead of 6). This is because the root of the AST built by the actionCommand rule is an ACTION token, and this token is never defined in the lexer, only in the tokens{...} block. Because it doesn't really exist in the input, line 0 is attached to it by default. If you want a real line number, you need to pass a "reference" token as a parameter to this so-called imaginary ACTION token; the imaginary token then copies its attributes from that reference.
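The difference between a bare imaginary token and one built from a reference token can be modeled in a few lines of plain Java. This is a simplified sketch, not the ANTLR runtime. SketchToken and the two imaginary(...) factories are hypothetical names, and only type, text and line are modeled; the real copy also carries other attributes such as the character position.

```java
// Plain-Java model (not the ANTLR runtime) of how an imaginary token gets its
// line. SketchToken and both imaginary(...) factories are hypothetical names.
final class ImaginaryTokens {
    static final class SketchToken {
        final int type;
        final String text;
        final int line; // 0 means "no position taken from the input"

        SketchToken(int type, String text, int line) {
            this.type = type;
            this.text = text;
            this.line = line;
        }
    }

    // Models a bare ACTION in a rewrite: nothing in the input backs it,
    // so only type and text are set and the line stays 0.
    static SketchToken imaginary(int type, String text) {
        return new SketchToken(type, text, 0);
    }

    // Models ACTION[$ref]: copy the reference token's text and line,
    // but give the new token its own type.
    static SketchToken imaginary(int type, SketchToken ref) {
        return new SketchToken(type, ref.text, ref.line);
    }

    public static void main(String[] args) {
        final int START = 1;  // hypothetical token-type constants
        final int ACTION = 2;
        SketchToken start = new SketchToken(START, "start", 6);
        System.out.println(imaginary(ACTION, "ACTION").line); // 0
        System.out.println(imaginary(ACTION, start).line);    // 6
    }
}
```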

So, if you change the actionCommand rule in the combined grammar into:

actionCommand
  :  ref=action ID -> ^(ACTION[$ref.start] action ID)
  ;

the line number would be as expected (line 6).

Note that every parser rule has a start and a stop attribute, which refer to the first and last token it matched, respectively. If action were a lexer rule (say FOO), you could have omitted the .start:

actionCommand
  :  ref=FOO ID -> ^(ACTION[$ref] FOO ID)
  ;

Now the ACTION token has copied all attributes from the reference token, except the token type, which is of course ACTION. This also means it copied the text attribute, so in my example the AST created by ref=action ID -> ^(ACTION[$ref.start] action ID) could look like:

            [text=start,type=ACTION]
                  /         \
                 /           \
                /             \
   [text=start,type=START]  [text=ABC,type=ID]

Of course, it's a proper AST because the types of the nodes are unique, but it makes debugging confusing since ACTION and START share the same .text attribute.

通过提供第二个字符串参数,您可以将除.text.type之外的所有属性复制到虚构令牌,如下所示:

To copy all attributes except .text and .type into an imaginary token, provide a second, string parameter that becomes the token's text, like this:

actionCommand
  :  ref=action ID -> ^(ACTION[$ref.start, "Action"] action ID)
  ;

And if you now run the same test class again, you will see the following printed:

line=4
line=6

And if you inspect the tree that is generated, it'll look like this:

[type=ROOT, text='ROOT']
  [type=ASSIGN, text='=']
    [type=ID, text='ABC']
    [type=NUMBER, text='123']
  [type=ACTION, text='Action']
    [type=START, text='start']
    [type=ID, text='ABC']
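The two reference forms differ only in where the node's text comes from; the line is copied either way. Below is a minimal plain-Java sketch that assumes a simplified node with just a type name, text and line. Node and copyFrom are hypothetical names, not ANTLR classes.

```java
// Plain-Java model (not the ANTLR runtime) of ACTION[$ref.start] versus
// ACTION[$ref.start, "Action"]. Node and copyFrom are hypothetical names.
final class TextOverride {
    static final class Node {
        final String type; // token-type name, kept as a String for readability
        final String text;
        final int line;

        Node(String type, String text, int line) {
            this.type = type;
            this.text = text;
            this.line = line;
        }

        @Override
        public String toString() {
            return "[type=" + type + ", text='" + text + "', line=" + line + "]";
        }
    }

    // Models ACTION[$ref]: copy everything from ref except the type.
    static Node copyFrom(String type, Node ref) {
        return new Node(type, ref.text, ref.line);
    }

    // Models ACTION[$ref, "Action"]: copy everything except the type and the text.
    static Node copyFrom(String type, Node ref, String text) {
        return new Node(type, text, ref.line);
    }

    public static void main(String[] args) {
        Node start = new Node("START", "start", 6);
        System.out.println(copyFrom("ACTION", start));           // [type=ACTION, text='start', line=6]
        System.out.println(copyFrom("ACTION", start, "Action")); // [type=ACTION, text='Action', line=6]
    }
}
```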
