Once grammar is complete, what's the best way to walk an ANTLR v4 tree?


Question

Goal

I'm working on a project to create a Varscoper for ColdFusion CFScript. Basically, this means checking through source code files to ensure that developers have properly var'd their variables.

After a couple of days of working with ANTLR v4 I have a grammar which generates a very nice parse tree in the GUI view. Now, using that tree, I need a way to crawl up and down the nodes programmatically, looking for variable declarations and ensuring that if they are inside functions they have the proper scoping. If possible I would rather NOT do this in the grammar file, as that would require mixing the definition of the language with this specific task.

What I've tried

My latest attempt was to use the ParserRuleContext and go through its children via getPayload(). After checking the class of getPayload(), I would have either a ParserRuleContext object or a Token object. Unfortunately, using that I was never able to find a way to get the actual rule type for a specific node, only its containing text. The rule type for each node is necessary because it matters whether that text node is an ignored right-hand expression, a variable assignment or a function declaration.

Questions

  1. I am very new to ANTLR. Is this even the right approach, or is there a better way to traverse the tree?

Here's my sample Java code:

Cfscript.java

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.Trees;

public class Cfscript {
    public static void main(String[] args) throws Exception {
        ANTLRInputStream input = new ANTLRFileStream(args[0]);
        CfscriptLexer lexer = new CfscriptLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        CfscriptParser parser = new CfscriptParser(tokens);
        parser.setBuildParseTree(true);
        ParserRuleContext tree = parser.component();
        tree.inspect(parser); // show in gui
        /*
            Recursively go through the tree finding function declarations and ensuring all variableDeclarations are varred
            but how?
        */
    }
}

Cfscript.g4

grammar Cfscript;

component
    : 'component' keyValue* '{' componentBody '}'
    ;

componentBody
    : (componentElement)*
    ;

componentElement
    : statement
    | functionDeclaration
    ;

functionDeclaration
    : Identifier? Identifier? 'function' Identifier argumentsDefinition '{' functionBody '}'
    ;

argumentsDefinition
    : '(' argumentDefinition (',' argumentDefinition)* ')'
    | '()'
    ;

argumentDefinition
    : Identifier? Identifier? argumentName ('=' expression)?
    ; 

argumentName
    : Identifier
    ;

functionBody
    : (statement)*
    ;

statement
    : variableStatement
    | nonVarVariableStatement
    | expressionStatement
    ;

variableStatement
    : 'var' variableName '=' expression ';'
    ;

nonVarVariableStatement
    : variableName '=' expression ';'
    ;

expressionStatement
    : expression ';'
    ;

expression
    : assignmentExpression
    | arrayLiteral
    | objectLiteral
    | StringLiteral
    | incrementExpression
    | decrementExpression
    | 'true' 
    | 'false'
    | Identifier
    ;

incrementExpression
    : variableName '++'
    ;

decrementExpression
    : variableName '--'
    ;

assignmentExpression
    : Identifier (assignmentExpressionSuffix)*
    | assignmentExpression (('+'|'-'|'/'|'*') assignmentExpression)+
    ;

assignmentExpressionSuffix
    : '.' assignmentExpression
    | ArrayIndex
    | ('()' | '(' expression (',' expression)* ')' )
    ;

methodCall
    : Identifier ('()' | '(' expression (',' expression)* ')' )
    ;

variableName
    : Identifier (variableSuffix)*
    ;

variableSuffix
    : ArrayIndex
    | '.' variableName
    ;

arrayLiteral
    : '[' expression (',' expression)* ']'
    ;

objectLiteral
    : '{' (Identifier '=' expression (',' Identifier '=' expression)*)? '}'
    ;

keyValue
    : Identifier '=' StringLiteral
    ;

StringLiteral
    :  '"' (~('\\'|'"'))* '"'
    ;

ArrayIndex
    : '[' [1-9] [0-9]* ']'
    | '[' StringLiteral ']'
    ;

Identifier
    : [a-zA-Z0-9]+
    ;

WS
    : [ \t\r\n]+ -> skip 
    ;

COMMENT 
    : '/*' .*? '*/'  -> skip
    ;

Test.cfc (testing code file)

component something = "foo" another = "more" persistent = "true" datasource = "#application.env.dsn#" {
    var method = something.foo.test1;
    testing = something.foo[10];
    testingagain = something.foo["this is a test"];
    nuts["testing"]++;
    blah.test().test3["test"]();

    var math = 1 + 2 - blah.test().test4["test"];

    var test = something;
    var testing = somethingelse;
    var testing = { 
        test = more, 
        mystuff = { 
            interior = test 
        },
        third = "third key"
    };
    other = "Idunno homie";
    methodCall(interiorMethod());

    public function bar() {
        var new = "somebody i used to know";
        something = [1, 2, 3];
    }

    function nuts(required string test1 = "first", string test = "second", test3 = "third") {

    }

    private boolean function baz() {
        var this = "something else";
    }
}

Solution

I wouldn't walk this manually if I were you. After generating a lexer and parser, ANTLR would also have generated a file called CfscriptBaseListener that has empty methods for all of your parser rules. You can let ANTLR walk your tree and attach a custom tree-listener in which you override only those methods/rules you're interested in.

In your case, you probably want to be notified whenever a new function is created (to create a new scope), and you'll probably be interested in variable assignments (variableStatement and nonVarVariableStatement). Your listener, let's call it VarListener, will keep track of all scopes as ANTLR walks the tree.
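To make the enter/exit idea concrete without needing the generated classes, here is a minimal, self-contained sketch of the walker/listener pattern. Node, NodeListener, Walker and DepthListener are hypothetical stand-ins invented for this illustration. They are not ANTLR's API; they only mimic what a tree walker does with a listener: call an enter method before visiting a node's children and an exit method afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parse-tree node: a rule name plus child nodes.
class Node {
    final String rule;
    final List<Node> children = new ArrayList<>();

    Node(String rule, Node... kids) {
        this.rule = rule;
        for (Node kid : kids) {
            children.add(kid);
        }
    }
}

// Hypothetical stand-in for a generated listener interface.
interface NodeListener {
    void enter(Node node);
    void exit(Node node);
}

// Depth-first walker: fires enter before the children and exit after them.
class Walker {
    static void walk(NodeListener listener, Node node) {
        listener.enter(node);
        for (Node child : node.children) {
            walk(listener, child);
        }
        listener.exit(node);
    }
}

// Tracks how deeply nested in functions each variable statement is.
class DepthListener implements NodeListener {
    int depth = 0;
    final List<String> events = new ArrayList<>();

    @Override
    public void enter(Node node) {
        if (node.rule.equals("functionDeclaration")) {
            depth++;
        }
        if (node.rule.equals("variableStatement")) {
            events.add("var@depth" + depth);
        }
    }

    @Override
    public void exit(Node node) {
        if (node.rule.equals("functionDeclaration")) {
            depth--;
        }
    }
}

public class WalkerSketch {
    // Builds a tiny hand-made tree and walks it with the listener.
    static List<String> demo() {
        Node tree = new Node("component",
                new Node("variableStatement"),
                new Node("functionDeclaration", new Node("variableStatement")),
                new Node("variableStatement"));
        DepthListener listener = new DepthListener();
        Walker.walk(listener, tree);
        return listener.events;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [var@depth0, var@depth1, var@depth0]
    }
}
```

Because the exit callback undoes what the enter callback did, the listener always knows which function (if any) it is currently inside, which is the property the scope tracking relies on.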

I did change one rule slightly (I added objectLiteralEntry):

objectLiteral
    : '{' (objectLiteralEntry (',' objectLiteralEntry)*)? '}'
    ;

objectLiteralEntry
    : Identifier '=' expression
    ;
    

which makes life easier in the following demo:

VarListener.java

import java.util.Stack;

public class VarListener extends CfscriptBaseListener {

    private Stack<Scope> scopes;

    public VarListener() {
        scopes = new Stack<Scope>();
        scopes.push(new Scope(null));
    } 

    @Override
    public void enterVariableStatement(CfscriptParser.VariableStatementContext ctx) {
        String varName = ctx.variableName().getText();
        Scope scope = scopes.peek();
        scope.add(varName);
    }

    @Override
    public void enterNonVarVariableStatement(CfscriptParser.NonVarVariableStatementContext ctx) {
        String varName = ctx.variableName().getText();
        checkVarName(varName);
    }

    @Override
    public void enterObjectLiteralEntry(CfscriptParser.ObjectLiteralEntryContext ctx) {
        String varName = ctx.Identifier().getText();
        checkVarName(varName);
    }

    @Override
    public void enterFunctionDeclaration(CfscriptParser.FunctionDeclarationContext ctx) {
        scopes.push(new Scope(scopes.peek()));
    }

    @Override
    public void exitFunctionDeclaration(CfscriptParser.FunctionDeclarationContext ctx) {
        scopes.pop();        
    }

    private void checkVarName(String varName) {
        Scope scope = scopes.peek();
        if(scope.inScope(varName)) {
            System.out.println("OK   : " + varName);
        }
        else {
            System.out.println("Oops : " + varName);
        }
    }
}

A Scope object could be as simple as:

Scope.java

import java.util.HashSet;

class Scope extends HashSet<String> {

    final Scope parent;

    public Scope(Scope parent) {
        this.parent = parent;
    }

    boolean inScope(String varName) {
        if(super.contains(varName)) {
            return true;
        }
        return parent == null ? false : parent.inScope(varName);
    }
}
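As a quick illustration of how the parent link behaves, here is a small standalone sketch. It repeats a Scope class of the same shape so that it compiles on its own, and the variable names are taken from Test.cfc purely as sample data. A lookup first checks the current scope and then falls back to the enclosing ones, while names declared in an inner scope stay invisible from the outer one.

```java
import java.util.HashSet;

// Same shape as the Scope class above: a set of declared names plus a parent link.
class Scope extends HashSet<String> {

    final Scope parent;

    Scope(Scope parent) {
        this.parent = parent;
    }

    // True if the name is declared here or in any enclosing scope.
    boolean inScope(String varName) {
        if (contains(varName)) {
            return true;
        }
        return parent != null && parent.inScope(varName);
    }
}

public class ScopeDemo {
    public static void main(String[] args) {
        Scope component = new Scope(null);      // outermost scope
        component.add("method");

        Scope function = new Scope(component);  // scope of a function body
        function.add("new");

        System.out.println(function.inScope("new"));       // true: declared locally
        System.out.println(function.inScope("method"));    // true: found via the parent
        System.out.println(component.inScope("new"));      // false: inner names do not leak out
        System.out.println(function.inScope("something")); // false: never declared
    }
}
```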

Now, to test this all, here's a small main class:

Main.java

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;

public class Main {

    public static void main(String[] args) throws Exception {

        CfscriptLexer lexer = new CfscriptLexer(new ANTLRFileStream("Test.cfc"));
        CfscriptParser parser = new CfscriptParser(new CommonTokenStream(lexer));
        ParseTree tree = parser.component();
        ParseTreeWalker.DEFAULT.walk(new VarListener(), tree);
    }
}

If you run this Main class, the following will be printed:

Oops : testing
Oops : testingagain
OK   : test
Oops : mystuff
Oops : interior
Oops : third
Oops : other
Oops : something

Without a doubt, this is not exactly what you want, and I probably goofed up some scoping rules of ColdFusion. But I think this will give you some insight into how to solve your problem properly. I think the code is pretty self-explanatory, but if this is not the case, don't hesitate to ask for clarification.

HTH
