Variation in code generation, using ANTLR4 parse tree visitors


Problem description

I am writing a transpiler (myLang -> JS) using ANTLR (JavaScript target with a visitor).
The focus is on generating target code from the parse tree,
specifically on how to deal with variations in the source code.

To make the question clearer, consider the two variations below:

source#1:
PRINT 'hello there'

source#2:

varGreeting = 'hey!'

PRINT varGreeting

In case 1, I am dealing with a string, while in case 2 it is a variable. The JS target code then needs to differ (below): case 1 with quotes, case 2 without.

target#1 (JS):

console.log("hello there");   // <-- string

target#2 (JS):

var varGreeting = "hey!";
console.log(varGreeting);  // <-- var

How can I best disambiguate and generate different code? At first, I thought of using the rule name (ID, STRLIT) to tell the different usages apart.
But I couldn't find these exposed in the RuleContext API. I looked at the Java ones, assuming the same holds in the JS runtime.

getText() gives the value ('hello there', varGreeting), with no meta/attribute info that I can leverage.

I dug into the tree/ctx object and didn't find them in an easily consumable form.

Question: how do I best go about this without building ugly hacks? A transpiler seems to be squarely within ANTLR's use cases; am I missing something?

(relevant part of) Grammar:

print : PRINTKW (ID | STRLIT) NEWLINE;

STRLIT: '\'' .*? '\'' ;
ID    : [a-zA-Z0-9_]+;

Visitor override:

// sample code for generating code for case 1 (with quotes)
myVisitor.prototype.visitPrint = function(ctx) {
    const Js = `console.log("${ctx.getChild(1).getText()}");`;
    // ^^ this is the part which needs different treatment for case 1 and 2

    // write to file
    fs.writeFile(targetFs + fileName + '.js', Js, 'utf8', function (err) {
        if (err) return console.log(err);
        console.log(`done`);
    });

    return this.visitChildren(ctx);
};

Using ANTLR 4.8.

Solution

You're using getChild(1) to access the argument of the print statement. This will give you a TerminalNode containing either an ID or a STRLIT token. You can access the token using the getSymbol() method, and you can then read the token's type from its .type property. The type is a number that you can compare against constants like MyLanguageParser.ID or MyLanguageParser.STRLIT.
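As a rough illustration of the token-type check described above, here is a minimal sketch. It does not load ANTLR: TokenTypes and fakeTerminal are hypothetical stand-ins for the generated MyLanguageParser constants and for a real TerminalNode, and emitPrintArg is an illustrative helper name, not part of any API.

```javascript
// Hypothetical stand-ins for the generated constants; in a real project
// these would be MyLanguageParser.ID and MyLanguageParser.STRLIT.
const TokenTypes = { ID: 1, STRLIT: 2 };

// Builds the JS argument text from the node returned by ctx.getChild(1).
function emitPrintArg(node) {
    const token = node.getSymbol(); // the underlying token
    const text = node.getText();
    if (token.type === TokenTypes.STRLIT) {
        // Drop the surrounding single quotes, then re-quote and escape.
        return JSON.stringify(text.substring(1, text.length - 1));
    }
    // An ID is emitted as-is, without quotes.
    return text;
}

// Minimal fake TerminalNode exposing only the two methods used above.
function fakeTerminal(type, text) {
    return { getSymbol: () => ({ type, text }), getText: () => text };
}

const stringArg = emitPrintArg(fakeTerminal(TokenTypes.STRLIT, "'hello there'"));
const varArg = emitPrintArg(fakeTerminal(TokenTypes.ID, "varGreeting"));
console.log(`console.log(${stringArg});`);
console.log(`console.log(${varArg});`);
```

In a real visitor, the same branch would sit inside visitPrint and compare against the constants on the generated parser class.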

Using getChild isn't necessarily the best way to access a node's children though. Each context class will have specific accessors for each of its children.

Specifically the PrintContext object will have methods ID() and STRLIT(). One of them will return null, the other will return a TerminalNode object containing the given token. So you know whether it was an ID or string literal by seeing which one isn't null.
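The accessor-based check can be sketched the same way. Again this is only an approximation that runs without ANTLR: fakePrintContext is a hypothetical object mimicking the ID() and STRLIT() accessors of the generated PrintContext, and printArgToJs is an illustrative helper name.

```javascript
// Decide how to emit the print argument by checking which accessor is non-null.
function printArgToJs(ctx) {
    const strlit = ctx.STRLIT();
    if (strlit !== null) {
        const text = strlit.getText();
        // String literal: strip the single quotes and emit double quotes.
        return `"${text.substring(1, text.length - 1)}"`;
    }
    // Otherwise the argument must be an ID; emit its name unchanged.
    return ctx.ID().getText();
}

// Hypothetical stand-in for a PrintContext holding exactly one of the two tokens.
function fakePrintContext(kind, text) {
    const node = { getText: () => text };
    return {
        ID: () => (kind === "ID" ? node : null),
        STRLIT: () => (kind === "STRLIT" ? node : null),
    };
}

const fromString = printArgToJs(fakePrintContext("STRLIT", "'hello there'"));
const fromVariable = printArgToJs(fakePrintContext("ID", "varGreeting"));
console.log(fromString);
console.log(fromVariable);
```

Note that the accessors are methods, so they have to be called (ctx.STRLIT()) before comparing against null.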

That said, the more common solution would be to not have a union of possible kinds of arguments in the print rule, but instead allow any kind of expression as an argument to print. You can then use labelled alternatives in your expression rule to get different visitor methods for each kind of expression:

print : PRINTKW expression NEWLINE;

expression
    : STRLIT #StringLiteral
    | ID #Variable
    ;

Then your visitor could look like this:

myVisitor.prototype.visitPrint = function(ctx) {
    const arg = this.visit(ctx.expression());
    const Js = `console.log(${arg});`;

    // write to file
    fs.writeFile(targetFs + fileName + '.js', Js, 'utf8', function (err) {
        if (err) return console.log(err);
        console.log(`done`);
    });
};

myVisitor.prototype.visitStringLiteral = function(ctx) {
    const text = ctx.getText();
    return `"${text.substring(1, text.length - 1)}"`;
}

myVisitor.prototype.visitVariable = function(ctx) {
    return ctx.getText();
}
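To show why labelled alternatives lead to separate visitor methods, here is a self-contained sketch of the dispatch. The three context classes are hand-written stand-ins for what ANTLR would generate from the grammar above (each labelled alternative gets its own context class whose accept() calls the matching visit method), and JsEmitter is a hypothetical visitor name. The real generated classes carry much more than this.

```javascript
// Hand-written stand-ins for the generated context classes.
class StringLiteralContext {
    constructor(text) { this.text = text; }
    getText() { return this.text; }
    accept(visitor) { return visitor.visitStringLiteral(this); }
}

class VariableContext {
    constructor(text) { this.text = text; }
    getText() { return this.text; }
    accept(visitor) { return visitor.visitVariable(this); }
}

class PrintContext {
    constructor(expr) { this.expr = expr; }
    expression() { return this.expr; }
    accept(visitor) { return visitor.visitPrint(this); }
}

// Visitor that returns generated JS as strings instead of writing files.
class JsEmitter {
    visit(ctx) { return ctx.accept(this); }
    visitPrint(ctx) {
        return `console.log(${this.visit(ctx.expression())});`;
    }
    visitStringLiteral(ctx) {
        const text = ctx.getText();
        return `"${text.substring(1, text.length - 1)}"`;
    }
    visitVariable(ctx) {
        return ctx.getText();
    }
}

const emitter = new JsEmitter();
const out1 = emitter.visit(new PrintContext(new StringLiteralContext("'hello there'")));
const out2 = emitter.visit(new PrintContext(new VariableContext("varGreeting")));
console.log(out1);
console.log(out2);
```

Returning strings from each visit method, as here, keeps the branching out of visitPrint entirely: it only has to wrap whatever its expression produced.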

Alternatively, you could leave out the labels and instead define a visitExpression method that handles both cases by checking which getter returns null:

myVisitor.prototype.visitExpression = function(ctx) {
    if (ctx.STRLIT() !== null) {
        const text = ctx.getText();
        return `"${text.substring(1, text.length - 1)}"`;
    } else {
        return ctx.getText();
    }
}

PS: Do note that single quotes work just fine in JavaScript, so you don't actually need to strip the single quotes and replace them with double quotes. You could just use .getText() without any post-processing in both cases and that'd still come out as valid JavaScript.
