ANTLR:如何使用树语法替换子树中的特定节点? [英] ANTLR: How to replace specific nodes in a subtree using a Tree grammar?
问题描述
我有一个创建 AST 的 ANTLR 语法,然后我编写了两个树语法来创建树解析器,以便在 AST 上进行两次传递以进行语义分析.(之后我再做一遍并用StringTemplate生成输出代码)
I have an ANTLR grammar which creates an AST, and then I have written two tree grammars that create tree parsers for doing two passes on the AST for semantic analysis purposes. (After that I do another pass and generate output code with StringTemplate)
到目前为止一切正常,但我正在尝试扩展语言以支持函数中的参数多态性(而到目前为止它只支持简单"函数).
Everything works fine so far, but I am trying to extend the language to support parametric polymorphism in functions (while so far it only supported "simple" functions).
例如我想要这样的东西:
So for example I want to have something like this:
T getMax<T> (T a, T b) {
if (a > b) return a;
return b;
}
并根据调用函数的实际类型生成简单的、非参数化的多态代码.例如,如果有人调用 getMax
那么我只会生成 int getMax(int a, int b)
and generate simple, non parametrically polymorphic code, according to the actual types with which the function is called. For example if someone calls getMax<int> (5)
then I will only generate the code for int getMax(int a, int b)
到目前为止,在第一遍中,我检查了对多态函数的所有调用,并保存了调用该函数的特定类型.
So far, in the 1st pass, I check all the calls to the polymorphic function, and save the specific types with which the function was called with.
所以,此时我知道所有参数类型,以及它们需要替换的所有实际类型.
So, at this point I know all the parametric types, and all the actual types that they need to be replaced with.
在第二遍中,我想实际修改我的语法树并用 1 个或多个具有特定类型的函数声明替换这个参数化多态函数声明.
In the 2nd pass, I want to actually modify my syntax tree and replace this parametrically polymorphic function declaration, with 1 or more function declarations that have specific types.
所以,问题是
在 AST 中复制和创建兄弟"节点(以及它们的所有子节点)并将它们插入到原始函数声明旁边的最佳方法是什么?我只是说一些像
What is the best way to copy and create "sibling" nodes (with also all their children) in the AST and insert them next to the original function declaration? Do I just say something like
{
myTreeNode.parent.addChild(myTreeNode.dupNode());
}
在上面的示例中,将所有出现的 T
类型替换为 int
的最佳方法是什么?我认为简单的重写规则是不够的,因为我还需要替换函数体中的所有类型.我是否需要编写另一种树语法来仅在此函数声明子树上工作并进行所有替换?手动完成更容易吗?
What is the best way to replace all occurrences of type T
with, say, int
in the newlycreated subtree in the above example? I think a simple rewrite rule is not enough, as I also need to replace all the types in the body of the function.
Do I need to write another tree grammar that will work just on this function declaration subtree and do all the replacing? Is it easier to do it manually?
对不起,如果这太混乱了.
Sorry if this is too confusing.
编辑:
假设您的输入是这样的:
Let's say your input is this:
T add<T> (T a, T b) { return a+b }
add<int>(1, 2)
add<string>('f', 'oo')
你的 AST 在第二遍之后会是什么样子?
how would your AST look like after the 2nd pass?
我正在考虑删除原来的函数声明,并在其位置引入 2 个专门的声明.
I was thinking of deleting the original function declaration, and introducing 2 specialised declarations at its place.
因此生成的 AST 将是一个根节点(我们可以称之为程序"),它有 4 个子节点、2 个函数声明和 2 个函数调用.
So the resulting AST would be a root node (we can call it "program") with 4 children, 2 function delcarations and 2 function calls.
印刷 Lisp 风格:
Printed Lisp-style:
(METHOD_DECL int add (ARG_DECL int a) (ARG_DECL int b) (BLOCK (return (EXPR (+ (EXPR a) (EXPR b))))))
(METHOD_DECL string add (ARG_DECL string a) (ARG_DECL string b) (BLOCK (return (EXPR (+ (EXPR a) (EXPR b))))))
(EXPR (CALL add (ELIST (EXPR 3) (EXPR 4))))
(EXPR (CALL add (ELIST (EXPR "f") (EXPR "oo"))))
被删除和替换的子树是这样的:
The subtree that got deleted and replaced was this:
(METHOD_DECL T add (TYPEPARAMS T) (ARG_DECL T a) (ARG_DECL T b) (BLOCK (return (EXPR (+ (EXPR a) (EXPR b))))))
另外,我想删除原来的参数化多态函数声明子树的原因,是我在做类型检查的时候不知道怎么忽略它.树解析器自动"匹配所有二元运算、返回语句、参数等,并进行类型检查.所以,如果我把它留在那里,它会报告错误,因为例如 T 不是一个正确的类型,所以你不能将它与 + 运算符一起使用.
Also, the reason I want to delete the original parametrically polymorphic function declaration subtree, is that I don't know how to ignore it when I do the type checking. The tree parser "automatically" matches all the binary operations, return statements, arguments etc and does the type checking. So, if I leave it there it reports errors because for example T is not a proper type, so you can't use it with the + operator.
推荐答案
就个人而言,我会让解析器将方法与主代码块分开.给定输入:
Personally, I'd let the parser separate the methods from main code block. Given the input:
T add<T> (T a, T b) {
T c = a + b
return c
}
int sub(int x, int y) { return x-y }
add<int>(1, 2)
add<string>('f', 'oo')
然后解析器将产生代表的树:
then the parser would produce the tree that represents:
add<int>(1, 2)
add<string>('f', 'oo')
和代表方法的单独树:
int add (int a, int b) { int c = a + b return c }
string add (string a, string b) { string c = a + b return c }
int sub (int x, int y) { return x-y }
在解析阶段,您只需跟踪两个实例变量中的所有参数(pp)-calls 和-methods:
During the parsing phase, you simply keep track of all parametric parameter (pp henceforth) -calls and -methods in two instance variables:
@parser::members {
private Map<String, Set<Token>> ppMethodCalls = new HashMap<String, Set<Token>>();
private Map<String, CommonTree> ppMethods = new HashMap<String, CommonTree>();
// a separate AST for all methods
public CommonTree methods = new CommonTree(new CommonToken(METHODS, "METHODS"));
// ...
}
解析示例代码后,ppMethodCalls
将保持:
After parsing the example code, ppMethodCalls
would hold:
{"add" => {Token="int", Token="string"}}
和 ppMethods
将保持:
{"add" => Tree=^(METHOD T add ... )}
当解析器完全解析输入源时,调用以下方法:
When the parser has fully parsed the input source, the following method is called:
public void replacePP() {
// iterate over all pp method calls
for(Map.Entry<String, Set<Token>> entry : ppMethodCalls.entrySet()) {
// get the name of the method being called
String name = entry.getKey();
// iterate over all tokens ('int' and 'string', in my example)
for(Token tok : entry.getValue()) {
// get the original pp-method instance
CommonTree ppm = ppMethods.get(name);
// create a deep copy from the original pp-method (will post 'copyTree(...)' later)
CommonTree cp = Main.copyTree(ppm);
// remove the first token from the tree (this is 'T' in my example)
CommonTree pp = (CommonTree)cp.deleteChild(0);
// replace all 'T' tokens with the actual parameter (either 'int' or 'string' in my example)
replace(pp, tok, cp);
// add the rewritten tree to the separate 'methods' tree
methods.addChild(cp);
}
}
}
private void replace(CommonTree param, Token token, CommonTree tree) {
if(tree == null) return;
if(tree.getType() == param.getType() && tree.getText().equals(param.getText())) {
tree.getToken().setType(token.getType());
tree.getToken().setText(token.getText());
}
if(tree.getChildCount() > 0) {
for(Object child : tree.getChildren()) {
replace(param, token, (CommonTree)child);
}
}
}
这将为 add
和 add
创建两个新树并将它们添加到实例中变量:方法:CommonTree
.
which will create two new trees for both add<int>(...)
and add<string>(...)
and add them to the instance variable: methods: CommonTree
.
一个小演示:
grammar PP;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
ROOT;
METHODS;
BLOCK;
ASSIGN;
ARG;
ARGS;
ELIST;
METHOD;
CALL;
}
@parser::header {
import java.util.Map;
import java.util.HashMap;
import java.util.Set;
import java.util.HashSet;
}
@parser::members {
private Map<String, Set<Token>> ppMethodCalls = new HashMap<String, Set<Token>>();
private Map<String, CommonTree> ppMethods = new HashMap<String, CommonTree>();
public CommonTree methods = new CommonTree(new CommonToken(METHODS, "METHODS"));
private void ppCall(String name, Token pp) {
Set<Token> existing = ppMethodCalls.remove(name);
if(existing == null) existing = new HashSet<Token>();
existing.add(pp);
ppMethodCalls.put(name, existing);
}
public void replacePP() {
for(Map.Entry<String, Set<Token>> entry : ppMethodCalls.entrySet()) {
String name = entry.getKey();
for(Token tok : entry.getValue()) {
CommonTree ppm = ppMethods.get(name);
CommonTree cp = Main.copyTree(ppm);
CommonTree pp = (CommonTree)cp.deleteChild(0);
replace(pp, tok, cp);
methods.addChild(cp);
}
}
}
private void replace(CommonTree param, Token token, CommonTree tree) {
if(tree == null) return;
if(tree.getType() == param.getType() && tree.getText().equals(param.getText())) {
tree.getToken().setType(token.getType());
tree.getToken().setText(token.getText());
}
if(tree.getChildCount() > 0) {
for(Object child : tree.getChildren()) {
replace(param, token, (CommonTree)child);
}
}
}
}
parse
: block EOF -> block
;
block
: statement* -> ^(BLOCK statement*)
;
statement
: ppMethod {ppMethods.put($ppMethod.name, $ppMethod.tree);} -> /* omit from main AST */
| normalMethod {methods.addChild($normalMethod.tree);} -> /* omit from main AST */
| methodCall
| assignment
| Return expr -> ^(Return expr)
;
assignment
: type Id '=' expr -> ^(ASSIGN type Id expr)
| Id Id '=' expr -> ^(ASSIGN Id Id expr)
;
normalMethod
: type Id '(' argList ')' '{' block '}' -> ^(METHOD type Id argList block)
;
ppMethod returns [String name]
: tp=Id id=Id {$name=$id.text;} '<' pp=Id '>' '(' ppArgList ')' '{' block '}' -> ^(METHOD $pp $tp $id ppArgList block)
;
methodCall
: Id ('<' type '>' {ppCall($Id.text, $type.tree.getToken());})? '(' exprList ')' -> ^(CALL Id exprList)
;
argList
: (arg (',' arg)*)? -> ^(ARGS arg*)
;
arg
: type Id -> ^(ARG type Id)
;
ppArgList
: (ppArg (',' ppArg)*)? -> ^(ARGS ppArg*)
;
ppArg
: type Id -> ^(ARG type Id)
| Id Id -> ^(ARG Id Id)
;
exprList
: (expr (',' expr)*)? -> ^(ELIST expr*)
;
expr
: atom (('+' | '-')^ atom)*
;
atom
: Int
| Str
| Id
| methodCall
| '(' expr ')' -> expr
;
type
: K_Int
| K_Str
;
Return : 'return';
K_Int : 'int';
K_Str : 'string';
Id : ('a'..'z' | 'A'..'Z') ('a'..'z' | 'A'..'Z' | Digit)*;
Int : Digit+;
Str : '\'' ~'\''* '\'';
Comment : '#' ~('\r' | '\n')* {$channel=HIDDEN;};
Space : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
fragment Digit : '0'..'9';
文件:Main.java
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
String src =
"T add<T> (T a, T b) { \n" +
" T c = a + b \n" +
" return c \n" +
"} \n" +
"int sub(int x, int y) { return x-y } \n" +
"add<int>(1, 2) \n" +
"add<string>('f', 'oo') \n";
PPLexer lexer = new PPLexer(new ANTLRStringStream(src));
PPParser parser = new PPParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.parse().getTree();
parser.replacePP();
System.out.println(new DOTTreeGenerator().toDOT(tree));
System.out.println(new DOTTreeGenerator().toDOT(parser.methods));
}
// put this in a Util-class or else in the parser
public static CommonTree copyTree(CommonTree original) {
CommonTree copy = new CommonTree(original.getToken());
copyTreeRecursive(copy, original);
return copy;
}
private static void copyTreeRecursive(CommonTree copy, CommonTree original) {
if(original.getChildren() != null) {
for(Object o : original.getChildren()) {
CommonTree originalChild = (CommonTree)o;
// get the token from the original child node
CommonToken oTok = (CommonToken)originalChild.getToken();
// create a new token with the same type & text as 'oTok'
CommonToken cTok = new CommonToken(oTok.getType(), oTok.getText());
// copy all attributes from 'oTok' to 'cTok'
cTok.setLine(oTok.getLine());
cTok.setCharPositionInLine(oTok.getCharPositionInLine());
cTok.setChannel(oTok.getChannel());
cTok.setStartIndex(oTok.getStartIndex());
cTok.setStopIndex(oTok.getStopIndex());
cTok.setTokenIndex(oTok.getTokenIndex());
// create a new tree node with the 'cTok' as token
CommonTree copyChild = new CommonTree(cTok);
// set the parent node of the child node
copyChild.setParent(copy);
// add the child to the parent node
copy.addChild(copyChild);
// make a recursive call to copy deeper
copyTreeRecursive(copyChild, originalChild);
}
}
}
}
如果你现在运行主类:
java -cp antlr-3.3.jar org.antlr.Tool PP.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
或 Windows
java -cp antlr-3.3.jar org.antlr.Tool PP.g
javac -cp antlr-3.3.jar *.java
java -cp .;antlr-3.3.jar Main
您将看到以 DOT 代码打印的两棵树:
you will see two trees being printed in DOT-code:
请注意,这只是我整理的一个快速技巧:事情可以稍微整理一下,并且 ppMethod
规则中的 returns [String name]
是不漂亮.它也不适合这样的方法:
Note that this is just a quick hack I put together: things could be tidied up a bit more, and the returns [String name]
in the ppMethod
rule isn't pretty. It also does not cater for methods like this:
A foo<A,B> (A a, B b) {
return A ~ B
}
foo<int,string>(42, 'foo')
也许还有更多的东西.我会把它留给你,希望你能通过上面的小演示走得更远.
and perhaps many more things. I'll leave that for you, hoping you might get further with the little demo above.
这篇关于ANTLR:如何使用树语法替换子树中的特定节点?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!