在基本操作中分解表达式:ANTLR + StringTemplate [英] Decompose expression in base operation: ANTLR + StringTemplate

查看:103
本文介绍了在基本操作中分解表达式:ANTLR + StringTemplate的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试为类似Java的语言编写多种语言的翻译器.

I am trying to write a translator for a Java like language to multiple languages.

现在我面临2个问题:

首先是按照一系列基本操作分解复杂的表达式,然后将其翻译为目标语言.

First is to decompose complex expression in a sequence of basic operation and then translating them to destination language.

例如,我的起始语言可以是:

For example my starting language can be:

var a = (ln(b) + avg(c))*2

我想像这样翻译它:

var x1 = log_N(b);
var x2 = average(c);
var x3 = sum(x1, x2);
var a = multiply(x3, 2);

我认为我必须使用Tree解析器,但我不确定如何将其与StringTemplate集成. 此外,我添加了额外的变量,例如x1,x2和x3,我不知道如何处理这种情况.

I think i have to use a Tree parser, nut I am not sure how to integrate it with StringTemplate. Moreover i am adding extra variables like x1, x2, and x3 and i don't know how to handle this situation.

第二个问题是我的目标语言之一是类似plsql的语言. 在这种情况下,需要确保所有输出变量都成为游标并将它们传递给函数.

The second problem is that one of my destination language is a plsql like language. In this case a need to ensure that all output variables become cursors and pass them to the functions.

例如表达式:

var a = (ln(b) + avg(c))*2

应以这种方式翻译:

log_N(x1, b);
average(x2, c);
sum(x3, x1, x2);
multiply(a, x3, 2);

其中x1,x2,x3和a将成为输出游标.

where x1, x2, x3 and a will become output cursors.

有人可以帮助我找到正确的解决方案吗?

Can anyone help me finding the right solution?

谢谢

推荐答案

我认为我必须使用Tree解析器,但是我不确定如何将其与StringTemplate集成.

I think i have to use a Tree parser, but I am not sure how to integrate it with StringTemplate.

将StringTemplate集成到树解析器中基本上与将其集成到令牌解析器中相同:将output选项定义为template,然后相应地编写规则生成.

Integrating StringTemplate into a tree parser is basically the same as integrating it into a token parser: define the output option as template, then write the rule productions accordingly.

以下是使用模板进行输出的小树语法.请注意,此语法与我在上一个答案中描述的语法之间唯一有意义的区别是,该语法在树节点上运行而不是令牌.模板的工作原理相同.

Below is a small tree grammar that uses templates for output. Note that the only meaningful difference between this grammar and the one I described in a previous answer is that this one operates on tree nodes rather than tokens. The templates work the same.

tree grammar AstToTemplateParser;

options { 
    output = template;
    tokenVocab = JavaLikeToAst;
    ASTLabelType = CommonTree;
}


program
    : ^(PROGRAM decls+=decl+) -> write(text={$decls})
    ;

decl
    : ^(DECL ID ^(op args+=arg*)) -> assign(name={$ID.text}, op={$op.st}, args={$args})
    | ^(DECL ID ^(CALL method args+=arg*)) -> assign(name={$ID.text}, op={$method.st}, args={$args})
    ;

arg 
    : ID -> write(text={$ID.text})
    | INT -> write(text={$INT.text})
    ;

method
    : ID -> op(name={$ID.text})
    ;

op  : STAR  -> op(name={$STAR.text})
    | DIV   -> op(name={$DIV.text})
    | PLUS  -> op(name={$PLUS.text})
    | MINUS -> op(name={$MINUS.text})
    ;

此外,我要添加x1,x2和x3之类的额外变量,而我不知道如何处理这种情况.

Moreover i am adding extra variables like x1, x2, and x3 and i don't know how to handle this situation.

那是踢脚线.我知道的最简单的方法(这并不容易)是首先使用令牌解析器生成基线AST,然后使用树解析器将AST展平为声明列表,每个声明代表来自原始输入的表达式- x1x2x3在您的问题中.

That's the kicker. The easiest way that I know of (which isn't terribly easy) is to first generate the baseline AST using the token parser then use a tree parser to flatten the AST into a list of declarations, each representing expressions from the original input -- x1, x2, and x3 in your question.

中间阶段如下:

var a = (ln(b) + avg(c))*2

基准AST

 var var0 = log_N(b);
 var var1 = average(c); 
 var var2 = add(var0, var1);
 var a = multiply(var2, 2);

已应用PLSQL模板

  log_N(var0, b);
  average(var1, c);
  add(var2, var0, var1);
  multiply(a, var2, 2);

这是一个令牌解析器语法,它仅生成一个基线AST,该AST与您的问题类似.这里没有真正值得注意的东西,它只是产生典型的AST.

Here's a token parser grammar that simply generates a baseline AST that resembles the one in your question. There's nothing really noteworthy here, it just produces a typical AST.

grammar JavaLikeToAst;

options { 
    output = AST;
}

tokens { 
    PROGRAM; DECL; CALL; 
}

compilationUnit : statement* EOF -> ^(PROGRAM statement*);
statement       : decl;
decl            : VAR ID EQ expr -> ^(DECL ID expr);
expr            : add_expr;
add_expr        : mul_expr ((PLUS|MINUS)^ mul_expr)*;
mul_expr        : call_expr ((STAR|DIV)^ call_expr)*;
call_expr       : ID LPAR arglist? RPAR -> ^(CALL ID arglist?)
                | primary_expr;
arglist         : expr (COMMA! expr)*;
primary_expr    : ID | INT | LPAR! expr RPAR!; 

VAR     : 'var';
ID      : ('a'..'z'|'A'..'Z')('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
INT     : ('0'..'9')+;
COMMA   : ',';
SEMI    : ';';
LCUR    : '{';
RCUR    : '}';
LPAR    : '(';
RPAR    : ')';
EQ      : '=';
PLUS    : '+';
MINUS   : '-';
STAR    : '*';
DIV     : '/';
WS      : (' '|'\t'|'\f'|'\r'|'\n'){skip();};

这是丑陋的地方(至少是我选择这样做的方式;).下面的树语法将上面产生的基线AST转换为与AST的表达式相对应的声明列表-扁平化的AST.在此之下,我将逐步介绍语法的有趣之处.

Here is where it gets ugly (at least the way I choose to do it ;)). The tree grammar below transforms a baseline AST produced from above into a list of declarations that correspond to the expressions of the AST -- the flattened AST. Below this, I'll step through the fun bits of the grammar.

tree grammar AstToFlatAstParser;

options { 
    output = AST;
    tokenVocab = JavaLikeToAst;
    ASTLabelType = CommonTree;
    filter = true;
}

@header { 
    import java.util.HashMap;
}

@members {
    private int currentId = 0;
    private HashMap<Integer, Object> exprs = new HashMap<Integer, Object>();
    private boolean newDecls = false;

    private int nextId() { 
        return currentId++;
    }

    private Object generateId(int id) { 
        return adaptor.create(ID, "var" + id);
    }  

    private void saveExpr(int id, Object expr){
        newDecls = true;
        exprs.put(id, expr);
    }

    private Object buildNewDecls() {
        Object newDecls = adaptor.nil();

        for (int i = 0; i < currentId; ++i){
            if (!exprs.containsKey(i)){
                continue; //This id was generated but not used.
            }

            Object expr = exprs.get(i);
            Object decl = adaptor.create(DECL, tokenNames[DECL]);
            adaptor.addChild(decl, adaptor.create(ID, "var" + i));
            adaptor.addChild(decl, expr);
            adaptor.addChild(newDecls, decl);
        }

        exprs.clear();

        return newDecls;
    }
}

bottomup
    : exit_program
    | exit_op
    ;

exit_op
    @init {
        int myId = nextId();
    }
    : ^(binary_op reduced reduced)
        {$start.parent != null && $start.parent.getType() != DECL}? 
        {saveExpr(myId, $start);} 
        -> {generateId(myId)}
    | ^(CALL ID .*) 
        {$start.parent != null && $start.parent.getType() != DECL}? 
        {saveExpr(myId, $start);} 
        -> {generateId(myId)}
    ;   

binary_op       : STAR | DIV | PLUS | MINUS;

reduced         : ID | INT; 

exit_program
    //Only rebuild PROGRAM if a new declaration is going to be built, that is, when "newDecls" is true.
    //Otherwise PROGRAM is considered changed when it isn't and the processing never ends.
    : {newDecls}? ^(PROGRAM old+=.*) {newDecls = false;} 
        -> ^(PROGRAM {buildNewDecls()} $old*)
    ;

首先,请注意语法主要是Java代码.解析器规则只有五个,其中大多数都很简单.这是一个filter树语法,因此规则bottomuptopdown是入口点.在这种情况下,仅需要bottomup,因此未指定topdown.重复规则bottomup直到输出树不变,这对我们来说意味着没有更多的声明要产生并且树已完全展平.

First, notice that the grammar is mostly Java code. There are only five parser rules, and most of them are simple. This is a filter tree grammar, so rules bottomup and topdown are the entry points. Only bottomup is necessary in this case, so topdown isn't specified. Rule bottomup is repeated until the output tree is unchanged, which to us means when there are no more declarations to produce and the tree is fully flattened.

第二,注意规则exit_program是新声明被写到AST的地方.我正在使用语义谓词({newDecls}?)以确保仅在添加新声明时才修改PROGRAM.还记得我说过bottomup叫什么直到没有更多的改变吗?没有这个语义谓词,exit_program将会总是修改PROGRAM,并且树解析将永远不会停止处理bottomup.对于这种特殊情况,这是一种粗略的解决方法,但是它可以工作.新的声明将插入在PROGRAM的开头,以确保它们在被引用之前就出现了.在预期的十行后定义x1不好.

Second, notice that rule exit_program is where the new declarations are being written out to the AST. I'm using a semantic predicate ({newDecls}?) to ensure that PROGRAM is only modified when new declarations are added. Remember how I said bottomup is called until no more changes are made? Without this semantic predicate, exit_program will always modify PROGRAM and the tree parsing will never stop processing bottomup. This is a crude work-around for this special case but it works. The new declarations are inserted at the beginning of PROGRAM to ensure that they appear before they are referenced. It's no good defining x1 ten lines after it's expected.

第三,请注意,规则exit_op用声明(例如var0)替换了表达式(例如ln(b)).如果满足以下条件之一,则替换表达式:

Third, notice that rule exit_op replaces expressions (like ln(b)) with declarations (like var0). An expression is replaced if one of the following is true:

  • 该表达式是一个二进制运算,其操作数都减少"(即它们都是整数或变量id),并且不是DECL节点的子代. var a = 1 + 2不变,因为1 + 2是声明的子代.之所以更改var b = a + (2 + c)是因为(2 + c)具有两个精简"操作数,并且不是DECL节点的子代(它是a + ...+的子代).

  • The expression is a binary operation whose operands are both "reduced" (that is, they're both integers or variable ids) and is not the child of a DECL node. var a = 1 + 2 is not changed because 1 + 2 is the child of a declaration. var b = a + (2 + c) is changed because (2 + c) has two "reduced" operands and is not the child of a DECL node (it's the child of the + in a + ...).

表达式是一个CALL,它不是DECL节点的子级. var a = ln(b)未被触动,但由于ln(b)+的子代,因此更改了var a = ln(b) + 3.

The expression is a CALL that is not the child of a DECL node. var a = ln(b) is untouched, but var a = ln(b) + 3 is changed because ln(b) is a child of +.

在将表达式替换为ID之前,表达式已存储在exprs中.当规则调用buildNewDecls时,它会在exit_program规则中重构. buildNewDecls仅使用解析器的内置TreeAdaptor成员(名为adaptor)来生成出现在扁平化AST中的DECL节点.适配器方法的Javadoc可以很好地解释调用的作用,因此我将不赘述.

An expression is stored in exprs before it's replaced by an id. It's reconstituted in the exit_program rule when the rule calls buildNewDecls. buildNewDecls just uses the parser's built-in TreeAdaptor member (named adaptor) to generate the DECL nodes that appear in the flattened AST. The Javadoc for the adaptor methods does an adequate job explaining what the calls do, so I won't go into the details.

注意事项:由上面的语法产生的解析器可以很好地处理您所介绍的非常有限的情况.我不知道在将其应用于任何更广泛的场景时,它们将产生什么错误.

Caveat: The parsers produced by the grammars above work fine for the very limited case that you've presented. I don't know what errors they will produce when applied to any broader scenario.

第二个问题是我的目标语言之一是类似plsql的语言.在这种情况下,需要确保所有输出变量都成为游标并将它们传递给函数...

The second problem is that one of my destination language is a plsql like language. In this case a need to ensure that all output variables become cursors and pass them to the functions...

一旦AST只是声明的平面列表,模板就可以为您管理这些东西,如上所示.

That would be something that the templates could manage for you once your AST is just a flat list of declarations, as shown above.

您将把展平的AST传递给基于模板的树解析器(如顶部的树解析器),以生成与您列出的文本版本不同的文本版本.在这种情况下,模板将接受声明的所有部分–变量名称,操作/方法名称以及操作数/参数–并根据所使用的模板生成variable = method(arg0, arg1)method(variable, arg0, arg1)之类的文本.关键是要确保输入是平坦的,并且模板可以接收与声明有关的所有内容.

You'll pass the flattened AST to a template-based tree parser, like the one up top, to produce the different text versions like those you've listed. In this case, a template would accept all the parts of a declaration – the variable name, the operation/method name, and the operands/arguments – and produce text like either variable = method(arg0, arg1) or method(variable, arg0, arg1), depending on the template used. The key is to ensure that the input is flat and that the template receives everything related to the declaration.

这是一个将所有内容捆绑在一起的测试应用程序.

Here is a test application that ties it all together.

public class JavaLikeToAstTest {

    public static void main(String[] args) throws Exception {

        final String code = "var a = (ln(b) + avg(c))*2";

        CharStream input = new ANTLRStringStream(code);
        JavaLikeToAstLexer lexer = new JavaLikeToAstLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);

        JavaLikeToAstParser parser = new JavaLikeToAstParser(tokens);

        JavaLikeToAstParser.compilationUnit_return result = parser
                .compilationUnit();

        if (lexer.getNumberOfSyntaxErrors() > 0 || parser.getNumberOfSyntaxErrors() > 0) {
            throw new Exception("Syntax Errors encountered!");
        }

        CommonTree tree = (CommonTree) result.tree;

        System.out.printf("Baseline AST: %s%n%n", tree.toStringTree());

        tree = flatten(tree);

        System.out.printf("Flattened AST: %s%n%n", tree.toStringTree());

        translate(tree, "AstToPlsql.stg");
        translate(tree, "AstToGlobal.stg");
    }

    private static CommonTree flatten(CommonTree tree) {
        AstToFlatAstParser parser = new AstToFlatAstParser(
                new CommonTreeNodeStream(tree));
        return (CommonTree) parser.downup(tree, true);
    }

    private static void translate(CommonTree tree, String templateResourceName)
            throws Exception {
        AstToTemplateParser parser = new AstToTemplateParser(
                new CommonTreeNodeStream(tree));
        InputStream stream = JavaLikeToTemplateTest.class
                .getResourceAsStream(templateResourceName);
        Reader reader = new InputStreamReader(stream);
        parser.setTemplateLib(new StringTemplateGroup(reader));
        reader.close();
        stream.close();

        System.out.printf("Result for %s%n%n%s%n%n", templateResourceName,
                parser.program().st.toString());

    }

这里有两个简单的StringTemplate组文件来处理翻译过程.

Here are two simple StringTemplate group files to handle the translation process.

group AstToGlobal;

methods ::= ["*":"multiply", "/":"divide", "+":"add", "-":"subtract", "avg":"average", "ln":"log_N", default:key]

assign(name, op, args) ::= <<var <name> = <op>(<args;separator=", ">) >>

op(name) ::= "<methods.(name)>"

write(text) ::= << <text;separator="\n"> >>

AstToPlsql.stg

group AstToPlsql;

methods ::= ["*":"multiply", "/":"divide", "+":"add", "-":"subtract", "avg":"average", "ln":"log_N", default:key]

assign(name, op, args) ::=<< <op>(<name>, <args;separator=", ">) >>

op(name) ::= "<methods.(name)>"

write(text) ::= << <text;separator="\n"> >>

应用程序产生以下输出:

The application produces the following output:

Baseline AST: (PROGRAM (DECL a (* (+ (CALL ln b) (CALL avg c)) 2)))

(CALL ln b) -> var0
(CALL avg c) -> var1
(+ var0 var1) -> var2
(PROGRAM (DECL a (* var2 2))) -> (PROGRAM (DECL var0 (CALL ln b)) (DECL var1 (CALL avg c)) (DECL var2 (+ var0 var1)) (DECL a (* var2 2)))
Flattened AST: (PROGRAM (DECL var0 (CALL ln b)) (DECL var1 (CALL avg c)) (DECL var2 (+ var0 var1)) (DECL a (* var2 2)))

Result for AstToPlsql.stg

  log_N(var0, b ) 
  average(var1, c ) 
  add(var2, var0 , var1 ) 
  multiply(a, var2 , 2 )  

Result for AstToGlobal.stg

 var var0 = log_N(b ) 
 var var1 = average(c ) 
 var var2 = add(var0 , var1 ) 
 var a = multiply(var2 , 2 )  

AstToTemplate.g中没有代码,也没有用于处理诸如var a = 3之类的简单任务的模板,但是我认为添加代码以使用操作/方法任务作为指导来处理它很容易.

There is no code in AstToTemplate.g or the templates to handle simple assignments like var a = 3, but I think it will be easy enough to add the code to handle it using the op/method assignment as a guide.

这篇关于在基本操作中分解表达式:ANTLR + StringTemplate的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆