Python AST from ANTLR Parse Tree?

Problem description

I found an ANTLRv4 Python3 grammar, but it generates a parse tree, which generally has many useless nodes.

I'm looking for a known package to get a Python AST from that parse tree.

Does something like this exist?

Clarification regarding use of the Python ast package: my project is in Java and I need to parse Python files.

EDIT 2: By 'AST' I mean http://docs.python.org/2/library/ast.html#abstract-grammar, while by 'parse tree' I mean http://docs.python.org/2/reference/grammar.html.

Recommended answer

The following could be a start:

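import java.util.ArrayList;
import java.util.List;

import org.antlr.v4.runtime.Token;
import org.antlr.v4.runtime.tree.ParseTree;

public class AST {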

    private final Object payload;

    private final List<AST> children;

    public AST(ParseTree tree) {
        this(null, tree);
    }

    private AST(AST ast, ParseTree tree) {
        this(ast, tree, new ArrayList<AST>());
    }

    private AST(AST parent, ParseTree tree, List<AST> children) {

        this.payload = getPayload(tree);
        this.children = children;

        if (parent == null) {
            walk(tree, this);
        }
        else {
            parent.children.add(this);
        }
    }

    public Object getPayload() {
        return payload;
    }

    public List<AST> getChildren() {
        return new ArrayList<>(children);
    }

    private Object getPayload(ParseTree tree) {
        if (tree.getChildCount() == 0) {
            return tree.getPayload();
        }
        else {
            String ruleName = tree.getClass().getSimpleName().replace("Context", "");
            return Character.toLowerCase(ruleName.charAt(0)) + ruleName.substring(1);
        }
    }

    private static void walk(ParseTree tree, AST ast) {

        if (tree.getChildCount() == 0) {
            new AST(ast, tree);
        }
        else if (tree.getChildCount() == 1) {
            walk(tree.getChild(0), ast);
        }
        else if (tree.getChildCount() > 1) {

            for (int i = 0; i < tree.getChildCount(); i++) {

                AST temp = new AST(ast, tree.getChild(i));

                if (!(temp.payload instanceof Token)) {
                    walk(tree.getChild(i), temp);
                }
            }
        }
    }

    @Override
    public String toString() {

        StringBuilder builder = new StringBuilder();

        AST ast = this;
        List<AST> firstStack = new ArrayList<>();
        firstStack.add(ast);

        List<List<AST>> childListStack = new ArrayList<>();
        childListStack.add(firstStack);

        while (!childListStack.isEmpty()) {

            List<AST> childStack = childListStack.get(childListStack.size() - 1);

            if (childStack.isEmpty()) {
                childListStack.remove(childListStack.size() - 1);
            }
            else {
                ast = childStack.remove(0);
                String caption;

                if (ast.payload instanceof Token) {
                    Token token = (Token) ast.payload;
                    caption = String.format("TOKEN[type: %s, text: %s]",
                            token.getType(), token.getText().replace("\n", "\\n"));
                }
                else {
                    caption = String.valueOf(ast.payload);
                }

                String indent = "";

                for (int i = 0; i < childListStack.size() - 1; i++) {
                    indent += (childListStack.get(i).size() > 0) ? "|  " : "   ";
                }

                builder.append(indent)
                        .append(childStack.isEmpty() ? "'- " : "|- ")
                        .append(caption)
                        .append("\n");

                if (ast.children.size() > 0) {
                    List<AST> children = new ArrayList<>();
                    for (int i = 0; i < ast.children.size(); i++) {
                        children.add(ast.children.get(i));
                    }
                    childListStack.add(children);
                }
            }
        }

        return builder.toString();
    }
}

This class can be used to create an AST for the input "f(arg1='1')\n" as follows:

public static void main(String[] args) {

    // Note: ANTLRInputStream is deprecated since ANTLR 4.7; CharStreams.fromString(...) replaces it.
    Python3Lexer lexer = new Python3Lexer(new ANTLRInputStream("f(arg1='1')\n"));
    Python3Parser parser = new Python3Parser(new CommonTokenStream(lexer));

    ParseTree tree = parser.file_input();
    AST ast = new AST(tree);

    System.out.println(ast);
}

This will print:

'- file_input
   |- stmt
   |  |- small_stmt
   |  |  |- atom
   |  |  |  '- TOKEN[type: 35, text: f]
   |  |  '- trailer
   |  |     |- TOKEN[type: 47, text: (]
   |  |     |- arglist
   |  |     |  |- test
   |  |     |  |  '- TOKEN[type: 35, text: arg1]
   |  |     |  |- TOKEN[type: 53, text: =]
   |  |     |  '- test
   |  |     |     '- TOKEN[type: 36, text: '1']
   |  |     '- TOKEN[type: 48, text: )]
   |  '- TOKEN[type: 34, text: \n]
   '- TOKEN[type: -1, text: ]

I realize this still contains nodes you might not want, but you could even add a set of token types you'd like to exclude. Feel free to hack away!

Here is a Gist containing a version of the code above with the proper import statements and some JavaDocs and inline comments.
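
The idea of excluding a set of token types, mentioned above, could be sketched roughly as follows. This is only an illustration under stated assumptions: `PruneSketch`, `Node`, `prune` and `render` are hypothetical names that are not part of the answer's code or of ANTLR, the tree is built by hand so that the sketch runs without the generated parser, and the token types 47 and 48 are simply the values for `(` and `)` seen in the output above (they depend on the grammar version).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PruneSketch {

    /** Hypothetical stand-in for the AST class above: either a rule node or a token node. */
    static final class Node {
        static final int RULE = Integer.MIN_VALUE; // marker meaning "this node is not a token"
        final int tokenType;
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(int tokenType, String label, Node... kids) {
            this.tokenType = tokenType;
            this.label = label;
            children.addAll(Arrays.asList(kids));
        }

        boolean isToken() {
            return tokenType != RULE;
        }
    }

    /** Returns a copy of the tree that omits token nodes whose type is in the excluded set. */
    static Node prune(Node node, Set<Integer> excluded) {
        Node copy = new Node(node.tokenType, node.label);
        for (Node child : node.children) {
            if (child.isToken() && excluded.contains(child.tokenType)) {
                continue; // drop punctuation-like tokens the caller does not want
            }
            copy.children.add(prune(child, excluded));
        }
        return copy;
    }

    /** Renders the tree as a compact S-expression for inspection. */
    static String render(Node node) {
        if (node.children.isEmpty()) {
            return node.label;
        }
        StringBuilder sb = new StringBuilder("(" + node.label);
        for (Node child : node.children) {
            sb.append(' ').append(render(child));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // Simplified version of the "trailer" subtree from the output above
        // (the intermediate "test" nodes are left out for brevity).
        Node trailer = new Node(Node.RULE, "trailer",
                new Node(47, "("),
                new Node(Node.RULE, "arglist",
                        new Node(35, "arg1"),
                        new Node(53, "="),
                        new Node(36, "'1'")),
                new Node(48, ")"));

        Set<Integer> excluded = new HashSet<>(Arrays.asList(47, 48));
        System.out.println(render(prune(trailer, excluded)));
    }
}
```

In the real `AST` class the same check would more likely live in `walk`, skipping a child when its payload is a `Token` whose `getType()` is in the excluded set. Pruning a copy, as done here, keeps the original tree intact at the cost of extra allocation.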