如何输出使用ANTLR构建的AST? [英] How to output the AST built using ANTLR?

查看:27
本文介绍了如何输出使用ANTLR构建的AST?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在为 C 制作一个静态分析器.我已经使用 ANTLR 完成了词法分析器和解析器,其中生成了 Java 代码.

I'm making a static analyzer for C. I have done the lexer and parser using ANTLR in which generates Java code.

ANTLR 是否通过 options {output=AST;} 自动为我们构建 AST?还是我必须自己做树?如果是,那么如何吐出该 AST 上的节点?

Does ANTLR build the AST for us automatically by options {output=AST;}? Or do I have to make the tree myself? If it does, then how to spit out the nodes on that AST?

我目前认为该 AST 上的节点将用于制作 SSA,然后进行数据流分析以制作静态分析器.我在正确的道路上吗?

I am currently thinking that the nodes on that AST will be used for making SSA, followed by data flow analysis in order to make the static analyzer. Am I on the right path?

推荐答案

拉斐尔写道:

antlr 是否通过 option{output=AST;} 自动为我们构建 AST?还是我必须自己做树?如果是,那么如何吐出那个AST上的节点?

Does antlr build the AST for us automatically by option{output=AST;}? Or do I have to make the tree myself? If it does, then how to spit out the nodes on that AST?

不,解析器不知道你想要什么作为每个解析器规则的根和叶,所以你要做的不仅仅是把 options { output=AST;} 在你的语法中.

No, the parser does not know what you want as root and as leaves for each parser rule, so you'll have to do a bit more than just put options { output=AST; } in your grammar.

例如解析源码时"true &&(false || true && (true || false))" 使用从语法生成的解析器:

For example, when parsing the source "true && (false || true && (true || false))" using the parser generated from the grammar:

grammar ASTDemo;

options { 
  output=AST; 
}

parse
  :  orExp
  ;

orExp
  :  andExp ('||' andExp)*
  ;

andExp
  :  atom ('&&' atom)*
  ;

atom
  :  'true'
  |  'false'
  |  '(' orExp ')'
  ;

// ignore white space characters
Space
  :  (' ' | '	' | '
' | '
') {$channel=HIDDEN;}
  ;

生成如下解析树:

(即只是一个扁平的一维令牌列表)

(i.e. just a flat, 1 dimensional list of tokens)

您需要告诉 ANTLR 语法中的哪些标记成为根、叶或简单地从树中删除.

You'll want to tell ANTLR which tokens in your grammar become root, leaves, or simply left out of the tree.

可以通过两种方式创建 AST:

Creating AST's can be done in two ways:

  1. 使用如下所示的重写规则:foo : A B C D ->^(D A B);,其中foo 是匹配标记A B C D 的解析器规则.所以 -> 之后的所有内容都是实际的重写规则.如您所见,重写规则中没有使用标记 C,这意味着它从 AST 中被省略了.^( 之后直接放置的标记将成为树的根;
  2. 使用树运算符 ^! after 解析器规则中的标记,其中 ^ 将使一个令牌成为根,而 ! 将从树中删除一个令牌.foo 的等价物:A B C D ->^(D A B); 将是 foo : A B C!D^;
  1. use rewrite rules which look like this: foo : A B C D -> ^(D A B);, where foo is a parser rule that matches the tokens A B C D. So everything after the -> is the actual rewrite rule. As you can see, the token C is not used in the rewrite rule, which means it is omitted from the AST. The token placed directly after the ^( will become the root of the tree;
  2. use the tree-operators ^ and ! after a token inside your parser rules where ^ will make a token the root, and ! will delete a token from the tree. The equivalent for foo : A B C D -> ^(D A B); would be foo : A B C! D^;

两个 foo : A B C D ->^(D A B);foo : A B C!D^; 将产生以下 AST:

Both foo : A B C D -> ^(D A B); and foo : A B C! D^; will produce the following AST:

现在,您可以按如下方式重写语法:

Now, you could rewrite the grammar as follows:

grammar ASTDemo;

options { 
  output=AST; 
}

parse
  :  orExp
  ;

orExp
  :  andExp ('||'^ andExp)* // Make `||` root
  ;

andExp
  :  atom ('&&'^ atom)* // Make `&&` root
  ;

atom
  :  'true'
  |  'false'
  |  '(' orExp ')' -> orExp // Just a single token, no need to do `^(...)`, 
                            // we're removing the parenthesis. Note that
                            // `'('! orExp ')'!` will do exactly the same.
  ;

// ignore white space characters
Space
  :  (' ' | '	' | '
' | '
') {$channel=HIDDEN;}
  ;

这将从源代码创建以下 AST "true &&(false || true && (true || false))》:

which will create the following AST from the source "true && (false || true && (true || false))":

相关的 ANTLR 维基链接:

Related ANTLR wiki links:

拉斐尔写道:

我目前认为该 AST 上的节点将用于制作 SSA,然后进行数据流分析以制作静态分析器.我走在正确的道路上吗?

I am currently thinking that the nodes on that AST will be used for making SSA, followed by data flow analysis in order to make the static analyzer. Am I on the right path?

从来没有做过这样的事情,但 IMO 你想要的第一件事是来自源头的 AST,所以是的,我猜你在正确的道路上!:)

Never did anything like that, but IMO the first thing you'd want is an AST from the source, so yeah, I guess your on the right path! :)

以下是使用生成的词法分析器和解析器的方法:

Here's how you can use the generated lexer and parser:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String src = "true && (false || true && (true || false))";
    ASTDemoLexer lexer = new ASTDemoLexer(new ANTLRStringStream(src));
    ASTDemoParser parser = new ASTDemoParser(new CommonTokenStream(lexer));
    CommonTree tree = (CommonTree)parser.parse().getTree();
    DOTTreeGenerator gen = new DOTTreeGenerator();
    StringTemplate st = gen.toDOT(tree);
    System.out.println(st);
  }
}

这篇关于如何输出使用ANTLR构建的AST?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆