Variation in code generation, using ANTLR4 parse tree visitors

Problem description

I am writing a transpiler (myLang -> JS) using ANTLR (JavaScript target, with a visitor).
The focus is on the target code generation part, working from the parse tree: specifically, how to deal with variations in the source code.

To make the question clearer, consider the two variations below:

source#1:
PRINT 'hello there'

source#2:

varGreeting = 'hey!'

PRINT varGreeting

In case 1, I am dealing with a string, while in case 2 it is a variable. The JS target code then needs to be different (below): case 1 with quotes, case 2 without.

target#1 (JS):

console.log("hello there");   // <-- string

target#2 (JS):

var varGreeting = "hey!";
console.log(varGreeting);  // <-- var

How can I best disambiguate and generate different code? My first thought was to use the rule name (ID, STRLIT) as the carrier of the different usages.
But I couldn't find these exposed in the RuleContext API. I looked at the Java API, assuming the JS runtime is the same.

getText() gives the value ('hello there', varGreeting), but no meta/attribute information that I can leverage.

I dug into the tree/ctx object and didn't find them in an easily consumable form.

Question: how do I best go about this without building ugly hacks? A transpiler seems to be well within ANTLR's use cases, so am I missing something?

(relevant part of) Grammar:

print : PRINTKW (ID | STRLIT) NEWLINE;

STRLIT: '\'' .*? '\'' ;
ID    : [a-zA-Z0-9_]+;

Visitor override:

// sample code for generating code for case 1 (with quotes)
myVisitor.prototype.visitPrint = function(ctx) {
    const Js = `console.log("${ctx.getChild(1).getText()}");`;

    // ^^ this is the part which needs different treatment for case 1 and 2

    // write to file
    fs.writeFile(targetFs + fileName + '.js', Js, 'utf8', function (err) {
        if (err) return console.log(err);
        console.log(`done`);
    });

    return this.visitChildren(ctx);
};

Using ANTLR 4.8.

Solution

You're using getChild(1) to access the argument of the print statement. This gives you a TerminalNode containing either an ID or a STRLIT token. You can access the token using the getSymbol() method, and then access the token's type using the .type property. The type is a number that you can compare against constants like MyLanguageParser.ID or MyLanguageParser.STRLIT.
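The token-type check described above can be sketched without the ANTLR runtime. This is a minimal, self-contained sketch under stated assumptions: makeTerminal is a hypothetical stand-in for the TerminalNode returned by ctx.getChild(1), and the numeric values in TokenType are made up. In a real visitor you would compare against the constants on the generated parser (MyLanguageParser.ID, MyLanguageParser.STRLIT) instead.

```javascript
// Hypothetical token type numbers. In real code these come from the
// generated parser, e.g. MyLanguageParser.ID and MyLanguageParser.STRLIT.
const TokenType = { ID: 1, STRLIT: 2 };

// Hypothetical stand-in for an ANTLR TerminalNode: getSymbol() returns
// a token object with a numeric `type` and its `text`.
function makeTerminal(type, text) {
  const token = { type, text };
  return {
    getSymbol: () => token,
    getText: () => token.text,
  };
}

// Translate the argument of a PRINT statement based on its token type.
function printArgToJs(terminal) {
  const token = terminal.getSymbol();
  switch (token.type) {
    case TokenType.STRLIT: {
      // Strip the surrounding single quotes and re-quote with double quotes.
      const inner = token.text.slice(1, -1);
      return `"${inner}"`;
    }
    case TokenType.ID:
      // Variable names are emitted unchanged.
      return token.text;
    default:
      throw new Error(`Unexpected token type ${token.type}`);
  }
}

function printToJs(terminal) {
  return `console.log(${printArgToJs(terminal)});`;
}

console.log(printToJs(makeTerminal(TokenType.STRLIT, "'hello there'"))); // console.log("hello there");
console.log(printToJs(makeTerminal(TokenType.ID, "varGreeting")));      // console.log(varGreeting);
```

In an actual visitPrint, the `terminal` argument would be `ctx.getChild(1)`.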

Using getChild isn't necessarily the best way to access a node's children, though. Each context class has specific accessors for each of its children.

Specifically, the PrintContext object will have the methods ID() and STRLIT(). One of them returns null; the other returns a TerminalNode object containing the given token. So you can tell whether it was an ID or a string literal by checking which one isn't null.
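The accessor-based approach can be sketched the same way. Here makePrintContext is a hypothetical stand-in for the PrintContext class that ANTLR generates; only the ID()/STRLIT() null check mirrors the real API. Note that the accessors have to be called: ctx.STRLIT on its own is the method itself and is never null, while ctx.STRLIT() returns either a TerminalNode or null.

```javascript
// Hypothetical stand-in for the PrintContext generated from
//   print : PRINTKW (ID | STRLIT) NEWLINE;
// Exactly one of ID() / STRLIT() returns a node; the other returns null.
function makePrintContext({ id = null, strlit = null }) {
  const terminal = (text) => (text === null ? null : { getText: () => text });
  const idNode = terminal(id);
  const strNode = terminal(strlit);
  return { ID: () => idNode, STRLIT: () => strNode };
}

function visitPrint(ctx) {
  let arg;
  if (ctx.STRLIT() !== null) {
    // String literal: replace the surrounding single quotes with double quotes.
    arg = `"${ctx.STRLIT().getText().slice(1, -1)}"`;
  } else {
    // Otherwise the alternative was ID: emit the variable name unchanged.
    arg = ctx.ID().getText();
  }
  return `console.log(${arg});`;
}

console.log(visitPrint(makePrintContext({ strlit: "'hello there'" }))); // console.log("hello there");
console.log(visitPrint(makePrintContext({ id: "varGreeting" })));       // console.log(varGreeting);
```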

That said, the more common solution would be to not have a union of possible kinds of arguments in the print rule, but instead allow any kind of expression as an argument to print. You can then use labelled alternatives in your expression rule to get different visitor methods for each kind of expression:

print : PRINTKW expression NEWLINE;

expression
    : STRLIT #StringLiteral
    | ID #Variable
    ;

Then your visitor could look like this:

myVisitor.prototype.visitPrint = function(ctx) {
    const arg = this.visit(ctx.expression());
    const Js = `console.log(${arg});`;

    // write to file
    fs.writeFile(targetFs + fileName + '.js', Js, 'utf8', function (err) {
        if (err) return console.log(err);
        console.log(`done`);
    });
};

myVisitor.prototype.visitStringLiteral = function(ctx) {
    const text = ctx.getText();
    return `"${text.substring(1, text.length - 1)}"`;
}

myVisitor.prototype.visitVariable = function(ctx) {
    return ctx.getText();
}
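To see why this.visit(ctx.expression()) ends up in visitStringLiteral or visitVariable, here is a simplified, runtime-free model of the double dispatch that labelled alternatives give you: each label gets its own context class whose accept() calls the matching visitXxx method, and visit(tree) calls tree.accept(this). The classes below are hand-written stand-ins, not the generated code or the antlr4 package. Unlike the visitPrint above, this sketch returns the generated string instead of writing a file.

```javascript
// Simplified model (not the real antlr4 runtime) of labelled alternatives:
// one context class per label, each with an accept() that picks the
// matching visit method on the visitor.
class StringLiteralContext {
  constructor(text) { this.text = text; }
  getText() { return this.text; }
  accept(visitor) { return visitor.visitStringLiteral(this); }
}

class VariableContext {
  constructor(text) { this.text = text; }
  getText() { return this.text; }
  accept(visitor) { return visitor.visitVariable(this); }
}

class PrintContext {
  constructor(expr) { this.expr = expr; }
  expression() { return this.expr; }
  accept(visitor) { return visitor.visitPrint(this); }
}

class MyVisitor {
  // Generic entry point: the node decides which visit method runs.
  visit(tree) { return tree.accept(this); }

  visitPrint(ctx) {
    const arg = this.visit(ctx.expression());
    return `console.log(${arg});`;
  }

  visitStringLiteral(ctx) {
    const text = ctx.getText();
    return `"${text.substring(1, text.length - 1)}"`;
  }

  visitVariable(ctx) {
    return ctx.getText();
  }
}

const v = new MyVisitor();
console.log(v.visit(new PrintContext(new StringLiteralContext("'hello there'")))); // console.log("hello there");
console.log(v.visit(new PrintContext(new VariableContext("varGreeting"))));        // console.log(varGreeting);
```

Returning strings from every visit method keeps code generation composable: a program-level visit method (hypothetical here) can join the output of all statements and write the file once. Calling fs.writeFile inside visitPrint, as in the snippets above, replaces the file contents on each call, so a program with several PRINT statements would only keep the last one.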

Alternatively you could leave out the labels and instead define a visitExpression method that handles both cases by seeing which getter returns null:

myVisitor.prototype.visitExpression = function(ctx) {
    if (ctx.STRLIT() !== null) {  // STRLIT() returns null when the ID alternative matched
        const text = ctx.getText();
        return `"${text.substring(1, text.length - 1)}"`;
    } else {
        return ctx.getText();
    }
}

PS: Do note that single quotes work just fine in JavaScript, so you don't actually need to strip the single quotes and replace them with double quotes. You could just use .getText() without any post-processing in both cases and that'd still come out as valid JavaScript.
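Following up on the PS, the sketch below compares three ways of emitting a STRLIT token: as-is (the PS), stripping the quotes and wrapping in double quotes (visitStringLiteral above), and stripping the quotes and re-encoding with JSON.stringify. isValidJs only compiles the generated line with new Function to check that it is valid JavaScript; it never runs it. The sketch assumes myLang strings have no escape sequences of their own, since the STRLIT rule in the question defines none. The naive re-quoting breaks as soon as a string contains a double quote, and because `.` in an ANTLR lexer rule also matches line breaks, a multi-line STRLIT is invalid JavaScript even when kept as-is; JSON.stringify handles both.

```javascript
// Three ways to turn a myLang STRLIT token text such as 'hello there'
// into a JavaScript expression. Assumes myLang strings have no escape
// sequences of their own (the STRLIT rule in the question defines none).
const asIs = (tokenText) => tokenText;                                      // keep the single quotes
const naiveRequote = (tokenText) => `"${tokenText.slice(1, -1)}"`;          // strip, wrap in "
const jsonRequote = (tokenText) => JSON.stringify(tokenText.slice(1, -1));  // strip, escape

const emitPrint = (argJs) => `console.log(${argJs});`;

// True if `code` compiles as a JavaScript function body. new Function
// only compiles the code here; the resulting function is never called.
function isValidJs(code) {
  try {
    new Function(code);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const samples = ["'hello there'", `'say "hi"'`, "'line1\nline2'"];
const strategies = { asIs, naiveRequote, jsonRequote };

for (const literal of samples) {
  for (const [name, toJs] of Object.entries(strategies)) {
    const code = emitPrint(toJs(literal));
    console.log(`${name.padEnd(12)} ${isValidJs(code) ? "valid  " : "INVALID"} ${JSON.stringify(code)}`);
  }
}
```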
