Once the grammar is complete, what's the best way to walk an ANTLR v4 tree?

Problem description

Goal

I'm working on a project to create a Varscoper for Coldfusion CFscript. Basically, this means checking through source code files to ensure that developers have properly var'd their variables.

After a couple of days of working with ANTLR V4 I have a grammar which generates a very nice parse tree in the GUI view. Now, using that tree I need a way to crawl up and down the nodes programmatically looking for variable declarations and ensure that if they are inside functions they have the proper scoping. If possible I would rather NOT do this in the grammar file as that would require mixing the definition of the language with this specific task.
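
Moving up the tree is possible as well: in ANTLR 4 every parse-tree node has getParent(), alongside getChild(int) and getChildCount() for moving down. The sketch below is a toy model in plain Java, not ANTLR's own classes (the Node type and the nearestAncestor/insideFunction helpers are hypothetical), showing how walking up the parent chain answers "is this statement inside a function?".

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a parse tree (NOT ANTLR's classes): each node knows its
// rule name, its parent and its children, so you can move both down and up.
public class ParentWalkDemo {

    static final class Node {
        final String rule;                          // e.g. "functionDeclaration"
        final Node parent;                          // null for the root
        final List<Node> children = new ArrayList<>();

        Node(String rule, Node parent) {
            this.rule = rule;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);          // keep the downward links in sync
            }
        }
    }

    // Walk up the parent chain until a node produced by the given rule is found.
    static Node nearestAncestor(Node node, String rule) {
        for (Node p = node.parent; p != null; p = p.parent) {
            if (p.rule.equals(rule)) {
                return p;
            }
        }
        return null;
    }

    // Hypothetical helper: true if the node sits somewhere inside a function.
    static boolean insideFunction(Node node) {
        return nearestAncestor(node, "functionDeclaration") != null;
    }

    public static void main(String[] args) {
        Node component = new Node("component", null);
        Node topLevel = new Node("nonVarVariableStatement", component);
        Node function = new Node("functionDeclaration", component);
        Node body = new Node("functionBody", function);
        Node inner = new Node("nonVarVariableStatement", body);

        System.out.println(insideFunction(topLevel)); // false
        System.out.println(insideFunction(inner));    // true
    }
}
```

Checking ancestors per statement works, but it re-walks the parent chain for every node; the listener approach in the answer tracks the enclosing function once while descending.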

My attempt

My latest attempt was to use the ParserRuleContext and go through its children via getPayload(). After checking the class of getPayload(), I would have either a ParserRuleContext object or a Token object. Unfortunately, I was never able to find a way to get the actual rule type of a specific node this way, only its contained text. The rule type of each node is necessary because it matters whether that text node is an ignored right-hand expression, a variable assignment or a function declaration.
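
For what it's worth, the rule type is not lost: in ANTLR 4, getPayload() on a rule node returns the context object itself, and every parser rule gets its own generated subclass of ParserRuleContext (for example CfscriptParser.VariableStatementContext). An instanceof check, or ctx.getRuleIndex() together with parser.getRuleNames(), therefore tells you which rule produced a node. The sketch below imitates that idea with hypothetical stand-in classes in plain Java (it does not use the ANTLR runtime) and collects names that are assigned without var inside a function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (NOT ANTLR's classes): as in a generated parser, each rule has
// its own node subclass, and tokens are leaves that only carry text.
public class RuleDispatchDemo {

    interface Tree { }

    // Leaf holding a token's text (ANTLR's counterpart is a TerminalNode).
    static final class Token implements Tree {
        final String text;
        Token(String text) { this.text = text; }
    }

    // Rule node; each subclass stands for one parser rule's *Context class.
    static class RuleNode implements Tree {
        final List<Tree> children;
        RuleNode(Tree... children) { this.children = Arrays.asList(children); }
    }

    static final class FunctionDeclaration extends RuleNode {
        FunctionDeclaration(Tree... children) { super(children); }
    }

    static final class VariableStatement extends RuleNode {        // var x = ...;
        VariableStatement(Tree... children) { super(children); }
    }

    static final class NonVarVariableStatement extends RuleNode {  // x = ...;
        NonVarVariableStatement(Tree... children) { super(children); }
    }

    // Manual depth-first walk: the node's class tells us which rule produced it.
    static void walk(Tree node, boolean inFunction, List<String> unscoped) {
        if (node instanceof Token) {
            return; // tokens carry text only
        }
        RuleNode rule = (RuleNode) node;
        if (rule instanceof FunctionDeclaration) {
            inFunction = true;
        } else if (rule instanceof NonVarVariableStatement && inFunction) {
            unscoped.add(((Token) rule.children.get(0)).text); // assigned name
        }
        for (Tree child : rule.children) {
            walk(child, inFunction, unscoped);
        }
    }

    public static void main(String[] args) {
        // component { testing = 1; function bar() { var x = 1; something = 2; } }
        Tree tree = new RuleNode(
            new NonVarVariableStatement(new Token("testing"), new Token("="), new Token("1")),
            new FunctionDeclaration(new Token("function"), new Token("bar"),
                new VariableStatement(new Token("var"), new Token("x"), new Token("="), new Token("1")),
                new NonVarVariableStatement(new Token("something"), new Token("="), new Token("2"))));
        List<String> unscoped = new ArrayList<>();
        walk(tree, false, unscoped);
        System.out.println(unscoped); // [something]
    }
}
```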

Question

  1. I'm very new to ANTLR. Is this the right approach, or is there a better way to walk the tree?

Here's my sample Java code:

Cfscript.java

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.Trees;

public class Cfscript {
    public static void main(String[] args) throws Exception {
        ANTLRInputStream input = new ANTLRFileStream(args[0]);
        CfscriptLexer lexer = new CfscriptLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        CfscriptParser parser = new CfscriptParser(tokens);
        parser.setBuildParseTree(true);
        ParserRuleContext tree = parser.component();
        tree.inspect(parser); // show in gui
        /*
            Recursively go through the tree, finding function declarations and ensuring
            all variable declarations are var'd. But how?
        */
    }
}

Cfscript.g4

grammar Cfscript;

component
    : 'component' keyValue* '{' componentBody '}'
    ;

componentBody
    : (componentElement)*
    ;

componentElement
    : statement
    | functionDeclaration
    ;

functionDeclaration
    : Identifier? Identifier? 'function' Identifier argumentsDefinition '{' functionBody '}'
    ;

argumentsDefinition
    : '(' argumentDefinition (',' argumentDefinition)* ')'
    | '()'
    ;

argumentDefinition
    : Identifier? Identifier? argumentName ('=' expression)?
    ; 

argumentName
    : Identifier
    ;

functionBody
    : (statement)*
    ;

statement
    : variableStatement
    | nonVarVariableStatement
    | expressionStatement
    ;

variableStatement
    : 'var' variableName '=' expression ';'
    ;

nonVarVariableStatement
    : variableName '=' expression ';'
    ;

expressionStatement
    : expression ';'
    ;

expression
    : assignmentExpression
    | arrayLiteral
    | objectLiteral
    | StringLiteral
    | incrementExpression
    | decrementExpression
    | 'true' 
    | 'false'
    | Identifier
    ;

incrementExpression
    : variableName '++'
    ;

decrementExpression
    : variableName '--'
    ;

assignmentExpression
    : Identifier (assignmentExpressionSuffix)*
    | assignmentExpression (('+'|'-'|'/'|'*') assignmentExpression)+
    ;

assignmentExpressionSuffix
    : '.' assignmentExpression
    | ArrayIndex
    | ('()' | '(' expression (',' expression)* ')' )
    ;

methodCall
    : Identifier ('()' | '(' expression (',' expression)* ')' )
    ;

variableName
    : Identifier (variableSuffix)*
    ;

variableSuffix
    : ArrayIndex
    | '.' variableName
    ;

arrayLiteral
    : '[' expression (',' expression)* ']'
    ;

objectLiteral
    : '{' (Identifier '=' expression (',' Identifier '=' expression)*)? '}'
    ;

keyValue
    : Identifier '=' StringLiteral
    ;

StringLiteral
    :  '"' (~('\\'|'"'))* '"'
    ;

ArrayIndex
    : '[' [1-9] [0-9]* ']'
    | '[' StringLiteral ']'
    ;

Identifier
    : [a-zA-Z0-9]+
    ;

WS
    : [ \t\r\n]+ -> skip 
    ;

COMMENT 
    : '/*' .*? '*/'  -> skip
    ;

Test.cfc (testing code file)

component something = "foo" another = "more" persistent = "true" datasource = "#application.env.dsn#" {
    var method = something.foo.test1;
    testing = something.foo[10];
    testingagain = something.foo["this is a test"];
    nuts["testing"]++;
    blah.test().test3["test"]();

    var math = 1 + 2 - blah.test().test4["test"];

    var test = something;
    var testing = somethingelse;
    var testing = { 
        test = more, 
        mystuff = { 
            interior = test 
        },
        third = "third key"
    };
    other = "Idunno homie";
    methodCall(interiorMethod());

    public function bar() {
        var new = "somebody i used to know";
        something = [1, 2, 3];
    }

    function nuts(required string test1 = "first", string test = "second", test3 = "third") {

    }

    private boolean function baz() {
        var this = "something else";
    }
}

Recommended answer

I wouldn't walk this manually if I were you. After generating a lexer and parser, ANTLR would also have generated a file called CfscriptBaseListener that has empty methods for all of your parser rules. You can let ANTLR walk your tree and attach a custom tree-listener in which you override only those methods/rules you're interested in.
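
To make the listener idea concrete without the generated classes, here is a toy model in plain Java. BaseListener, walk and DepthListener are hypothetical stand-ins for CfscriptBaseListener, ParseTreeWalker.DEFAULT.walk and a custom listener; ANTLR's real walker dispatches through the generated context classes rather than comparing rule-name strings. As with ANTLR's walker, enter hooks fire before a node's children and exit hooks after them, which is what lets a listener track nesting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the listener pattern (NOT ANTLR's API).
public class ListenerDemo {

    // Toy parse-tree node: a rule name plus children.
    static final class Node {
        final String rule;
        final List<Node> children;
        Node(String rule, Node... children) {
            this.rule = rule;
            this.children = Arrays.asList(children);
        }
    }

    // Stand-in for a generated base listener: every hook is an empty method.
    static class BaseListener {
        void enterFunction(Node n) { }
        void exitFunction(Node n) { }
        void enterStatement(Node n) { }
    }

    // Stand-in for the tree walker: enter hooks before the children, exit hooks after.
    static void walk(BaseListener listener, Node node) {
        if (node.rule.equals("function")) listener.enterFunction(node);
        if (node.rule.equals("statement")) listener.enterStatement(node);
        for (Node child : node.children) {
            walk(listener, child);
        }
        if (node.rule.equals("function")) listener.exitFunction(node);
    }

    // A custom listener that overrides only the hooks it cares about.
    static class DepthListener extends BaseListener {
        int depth = 0;
        final List<String> log = new ArrayList<>();

        @Override void enterFunction(Node n) { depth++; }
        @Override void exitFunction(Node n) { depth--; }
        @Override void enterStatement(Node n) { log.add("statement at depth " + depth); }
    }

    public static void main(String[] args) {
        Node tree = new Node("component",
            new Node("statement"),
            new Node("function", new Node("statement"), new Node("statement")));
        DepthListener listener = new DepthListener();
        walk(listener, tree);
        System.out.println(listener.log);
        // [statement at depth 0, statement at depth 1, statement at depth 1]
    }
}
```

The traversal logic lives in one place (walk), so a listener never has to recurse itself; that separation is the main advantage over walking the children by hand.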

In your case, you probably want to be notified whenever a function declaration is entered (to create a new scope), and you'll probably be interested in variable assignments (variableStatement and nonVarVariableStatement). Your listener, let's call it VarListener, will keep track of all scopes as ANTLR walks the tree.

I did change 1 rule slightly (I added objectLiteralEntry):

objectLiteral
    : '{' (objectLiteralEntry (',' objectLiteralEntry)*)? '}'
    ;

objectLiteralEntry
    : Identifier '=' expression
    ;
    

This makes life easier in the following demo:

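import java.util.Stack;

public class VarListener extends CfscriptBaseListener {

    // Stack of scopes: the bottom entry is the component-level scope,
    // and each function declaration pushes one more on top.
    private Stack<Scope> scopes;

    public VarListener() {
        scopes = new Stack<Scope>();
        scopes.push(new Scope(null));
    }

    // `var x = ...;` declares x in the current (innermost) scope.
    @Override
    public void enterVariableStatement(CfscriptParser.VariableStatementContext ctx) {
        String varName = ctx.variableName().getText();
        Scope scope = scopes.peek();
        scope.add(varName);
    }

    // `x = ...;` without `var`: report whether x is visible from here.
    @Override
    public void enterNonVarVariableStatement(CfscriptParser.NonVarVariableStatementContext ctx) {
        String varName = ctx.variableName().getText();
        checkVarName(varName);
    }

    // As written, object-literal keys `{ key = value }` are checked the same way.
    @Override
    public void enterObjectLiteralEntry(CfscriptParser.ObjectLiteralEntryContext ctx) {
        String varName = ctx.Identifier().getText();
        checkVarName(varName);
    }

    // Entering a function opens a new scope nested inside the current one...
    @Override
    public void enterFunctionDeclaration(CfscriptParser.FunctionDeclarationContext ctx) {
        scopes.push(new Scope(scopes.peek()));
    }

    // ...and leaving the function discards that scope again.
    @Override
    public void exitFunctionDeclaration(CfscriptParser.FunctionDeclarationContext ctx) {
        scopes.pop();
    }

    // Prints "OK" if varName was declared in this scope or an enclosing one, "Oops" otherwise.
    private void checkVarName(String varName) {
        Scope scope = scopes.peek();
        if(scope.inScope(varName)) {
            System.out.println("OK   : " + varName);
        }
        else {
            System.out.println("Oops : " + varName);
        }
    }
}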

The Scope class can be as simple as this:

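import java.util.HashSet;

// A set of var'd names plus a link to the enclosing scope.
class Scope extends HashSet<String> {

    final Scope parent;

    public Scope(Scope parent) {
        this.parent = parent;
    }

    // True if varName was declared here or in any enclosing scope.
    boolean inScope(String varName) {
        if(super.contains(varName)) {
            return true;
        }
        return parent == null ? false : parent.inScope(varName);
    }
}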
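
A quick standalone check of the lookup rule, assuming nothing beyond the JDK: it copies Scope as a nested class and drives it by hand the way VarListener would for the component and the function bar() in Test.cfc. It uses an ArrayDeque rather than java.util.Stack, since the Stack Javadoc recommends Deque implementations for LIFO use; the behavior is the same here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;

public class ScopeDemo {

    // Same idea as Scope above: var'd names plus a link to the enclosing scope.
    static class Scope extends HashSet<String> {
        final Scope parent;
        Scope(Scope parent) { this.parent = parent; }

        boolean inScope(String name) {
            return contains(name) || (parent != null && parent.inScope(name));
        }
    }

    public static void main(String[] args) {
        Deque<Scope> scopes = new ArrayDeque<>();
        scopes.push(new Scope(null));                 // component-level scope
        scopes.peek().add("method");                  // var method = ...;

        scopes.push(new Scope(scopes.peek()));        // entering function bar()
        scopes.peek().add("new");                     // var new = ...;
        System.out.println(scopes.peek().inScope("new"));       // true
        System.out.println(scopes.peek().inScope("method"));    // true (found in parent)
        System.out.println(scopes.peek().inScope("something")); // false, i.e. "Oops"

        scopes.pop();                                 // leaving bar()
        System.out.println(scopes.peek().inScope("new"));       // false
    }
}
```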

Now, to test this all, here's a small main class:

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;

public class Main {

    public static void main(String[] args) throws Exception {

        CfscriptLexer lexer = new CfscriptLexer(new ANTLRFileStream("Test.cfc"));
        CfscriptParser parser = new CfscriptParser(new CommonTokenStream(lexer));
        ParseTree tree = parser.component();
        ParseTreeWalker.DEFAULT.walk(new VarListener(), tree);
    }
}

If you run this Main class, the following will be printed:

Oops : testing
Oops : testingagain
OK   : test
Oops : mystuff
Oops : interior
Oops : third
Oops : other
Oops : something

Without a doubt, this is not exactly what you want, and I probably goofed up some of Coldfusion's scoping rules. But I think this will give you some insight into how to solve your problem properly. I think the code is pretty self-explanatory, but if that's not the case, don't hesitate to ask for clarification.

HTH
