如何输出使用ANTLR构建的AST? [英] How to output the AST built using ANTLR?
问题描述
我正在为 C 制作一个静态分析器.我已经使用 ANTLR 完成了词法分析器和解析器,其中生成了 Java 代码.
I'm making a static analyzer for C. I have done the lexer and parser using ANTLR in which generates Java code.
ANTLR 是否通过 options {output=AST;}
自动为我们构建 AST?还是我必须自己做树?如果是,那么如何吐出该 AST 上的节点?
Does ANTLR build the AST for us automatically by options {output=AST;}
? Or do I have to make the tree myself? If it does, then how to spit out the nodes on that AST?
我目前认为该 AST 上的节点将用于制作 SSA,然后进行数据流分析以制作静态分析器.我在正确的道路上吗?
I am currently thinking that the nodes on that AST will be used for making SSA, followed by data flow analysis in order to make the static analyzer. Am I on the right path?
推荐答案
拉斐尔写道:
antlr 是否通过 option{output=AST;} 自动为我们构建 AST?还是我必须自己做树?如果是,那么如何吐出那个AST上的节点?
Does antlr build the AST for us automatically by option{output=AST;}? Or do I have to make the tree myself? If it does, then how to spit out the nodes on that AST?
不,解析器不知道你想要什么作为每个解析器规则的根和叶,所以你要做的不仅仅是把 options { output=AST;}
在你的语法中.
No, the parser does not know what you want as root and as leaves for each parser rule, so you'll have to do a bit more than just put options { output=AST; }
in your grammar.
例如解析源码时"true &&(false || true && (true || false))"
使用从语法生成的解析器:
For example, when parsing the source "true && (false || true && (true || false))"
using the parser generated from the grammar:
grammar ASTDemo;
options {
output=AST;
}
parse
: orExp
;
orExp
: andExp ('||' andExp)*
;
andExp
: atom ('&&' atom)*
;
atom
: 'true'
| 'false'
| '(' orExp ')'
;
// ignore white space characters
Space
: (' ' | ' ' | '
' | '
') {$channel=HIDDEN;}
;
生成如下解析树:
(即只是一个扁平的一维令牌列表)
(i.e. just a flat, 1 dimensional list of tokens)
您需要告诉 ANTLR 语法中的哪些标记成为根、叶或简单地从树中删除.
You'll want to tell ANTLR which tokens in your grammar become root, leaves, or simply left out of the tree.
可以通过两种方式创建 AST:
Creating AST's can be done in two ways:
- 使用如下所示的重写规则:
foo : A B C D ->^(D A B);
,其中foo
是匹配标记A B C D
的解析器规则.所以->
之后的所有内容都是实际的重写规则.如您所见,重写规则中没有使用标记C
,这意味着它从 AST 中被省略了.^(
之后直接放置的标记将成为树的根; - 使用树运算符
^
和!
after 解析器规则中的标记,其中^
将使一个令牌成为根,而!
将从树中删除一个令牌.foo 的等价物:A B C D ->^(D A B);
将是foo : A B C!D^;
- use rewrite rules which look like this:
foo : A B C D -> ^(D A B);
, wherefoo
is a parser rule that matches the tokensA B C D
. So everything after the->
is the actual rewrite rule. As you can see, the tokenC
is not used in the rewrite rule, which means it is omitted from the AST. The token placed directly after the^(
will become the root of the tree; - use the tree-operators
^
and!
after a token inside your parser rules where^
will make a token the root, and!
will delete a token from the tree. The equivalent forfoo : A B C D -> ^(D A B);
would befoo : A B C! D^;
两个 foo : A B C D ->^(D A B);
和 foo : A B C!D^;
将产生以下 AST:
Both foo : A B C D -> ^(D A B);
and foo : A B C! D^;
will produce the following AST:
现在,您可以按如下方式重写语法:
Now, you could rewrite the grammar as follows:
grammar ASTDemo;
options {
output=AST;
}
parse
: orExp
;
orExp
: andExp ('||'^ andExp)* // Make `||` root
;
andExp
: atom ('&&'^ atom)* // Make `&&` root
;
atom
: 'true'
| 'false'
| '(' orExp ')' -> orExp // Just a single token, no need to do `^(...)`,
// we're removing the parenthesis. Note that
// `'('! orExp ')'!` will do exactly the same.
;
// ignore white space characters
Space
: (' ' | ' ' | '
' | '
') {$channel=HIDDEN;}
;
这将从源代码创建以下 AST "true &&(false || true && (true || false))》
:
which will create the following AST from the source "true && (false || true && (true || false))"
:
相关的 ANTLR 维基链接:
Related ANTLR wiki links:
拉斐尔写道:
我目前认为该 AST 上的节点将用于制作 SSA,然后进行数据流分析以制作静态分析器.我走在正确的道路上吗?
I am currently thinking that the nodes on that AST will be used for making SSA, followed by data flow analysis in order to make the static analyzer. Am I on the right path?
从来没有做过这样的事情,但 IMO 你想要的第一件事是来自源头的 AST,所以是的,我猜你在正确的道路上!:)
Never did anything like that, but IMO the first thing you'd want is an AST from the source, so yeah, I guess your on the right path! :)
以下是使用生成的词法分析器和解析器的方法:
Here's how you can use the generated lexer and parser:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
String src = "true && (false || true && (true || false))";
ASTDemoLexer lexer = new ASTDemoLexer(new ANTLRStringStream(src));
ASTDemoParser parser = new ASTDemoParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.parse().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
这篇关于如何输出使用ANTLR构建的AST?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!