何时使用抽象或具体的语法树? [英] When to use an abstract or concrete syntax tree?

查看:52
本文介绍了何时使用抽象或具体的语法树?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在研究编译器.词法分析器似乎非常直接:取一个句子"并将其分解为单词(或标记).为了确保正确的语法,需要一个解析器.解析器通常会获取标记并构建一个树,从而生成根节点(将单词转换为句子、段落、页面等......).

I've been doing research on compilers. The lexer seems to be very straight forward: Take a "sentence" and break it up into words (or tokens). To ensure correct grammar a parser is needed. The parser generally takes the tokens and builds a tree that results in a root node (words into sentences, paragraphs, pages, etc...).

来自这个问题 似乎解析器会构建一个 AST.AST 只包含执行代码所需的内容,因此不需要括号之类的东西,因为运算符优先级已内置到 AST 中.一个 AST 可能就是编译器所需要的全部.

From this question it would seem a parser would build an AST. The AST only contains what is necessary to execute the code, so things like parentheses would be unnecessary since operator precedence is built into an AST. An AST is probably all a compiler needs.

但是如何将代码从一种语言转换为另一种语言呢?将一种编造的语言(语法)或现有的语法转换成另一种,其中运算符优先级规则可能不同,也可能不同?运算符优先级是否也内置"到 CST 中?

But what about converting code from one language to another? Taking a made-up language (grammar) or an existing grammar and converting it into another where operator precedence rules may or may not be different? Is operator precedence "built in" to CST as well?

举个例子,假设我编写了一种语言并想将它翻译成 PHP 代码.大多数语言中的三元运算符具有从右到左的结合性.PHP 错误地使用了从左到右的关联(在这里查看更多信息).我希望我的语言"从右到左使用,但生成的 PHP 代码必须应用括号才能在 PHP 中获得正确的结果(使用 链接到维基百科,结果需要是train"而不是horse").

As an example lets say I made up a language and wanted to translate it into PHP code. The ternary operator on most languages has right-to-left associativity. PHP incorrectly uses left-to-right associativity (see more about this here). I want "my language" to use right-to-left but the resulting PHP code has to apply parenthesis to get the correct result in PHP (with the link to Wikipedia, result needs to be "train" instead of "horse").

那么对于语言翻译来说,CST 会更好吗?运算符优先级通常内置于 CST 中吗?中间有什么吗?是否有任何示例将两棵树与简单的代数方程进行比较?任何说明三元运算符的示例?

So for language translation would a CST be better? Is operator precedence usually built into a CST? Is there anything in-between? Are there any examples comparing both trees with a simple algebra equation? Any examples illustrating a ternary operator?

(转码"是编程语言翻译"的正确术语吗?Google 搜索会显示转换媒体.)

(Is "transcode" the correct term for "programming language translation"? A Google search brings up converting media.)

我想弄清楚的是:什么时候使用一个比另一个更合适?

What I'm trying to figure out is: When is it more appropriate to use one over the other?

推荐答案

对源语言的所有语义细节进行建模的 AST 就是您所需要的.根据定义,如果它确实对语义进行了正确建模,并且您的语言包含一个三元运算符,那么它也会正确建模应用运算符的特定顺序(例如,优先取模覆盖的结果,例如括号).

An AST that models all the semantic details of the source language is all you need. By definition, if it does model the semantics correctly, and your langauge includes a ternary operator, then it will model the specific order in which the operators are applied (e.g, the results of the predecence modulo overrides such as parentheses) correctly too.

所以你的问题不在于 AST.它使用优先级不同的类似(三元)运算符生成另一种语言.

So your problem isn't in the AST. It is generating to another language using similar (ternary) operators whose precedence is different.

这是代码生成中的一个古老问题:目标的运算符与源的运算符不完全匹配,因此输出不能是一对一的.在您的情况下,您应该能够通过生成带有括号的 PHP 三元运算符来控制顺序以实现原始语义来解决问题,因此这不是一个大问题.

This is an age-old problem in code generation: the operators of the target don't quite match the operators of the source, and so the output can't be one-to-one. In your case, you should be able to solve the problem by generating PHP ternary operators with parentheses around them to control the order to achieve what the original semantics are, so this isn't a big problem.

一般来说,生成实现预期结果的代码序列可能非常复杂,而且有很多方法可以做到.这就是编译器书籍厚而不薄的原因.您似乎已经隐含地选择了获取 AST,走 AST,吐出代码";这几乎是一个即时代码生成器.如果您不在乎生成的代码是否特别好,并且目标语言与源语言非常接近,那么这种方法就足够了.

In general, generating sequences of code that achieve a desired result can be pretty complicated, and there's lots of ways to do it. That's why the compiler books are thick rather than thin. You seem to have implicitly settled on "get AST, walk AST, spit code"; this is almost an on-the-fly code generator. And this works adequately if you don't care if the generated code is particularly good, and the target language is pretty close to the source language.

如果代码生成问题更复杂,通常会使用 AST 来生成计算的数据流模型,该模型由产生结果的运算符组成,并消耗先前运算符的结果,以获取变量值和常量的运算符".然后遍历数据流表示生成代码;这样做的好处是您可以在数据流表示中选择一个运算符,在目标语言中找到匹配的代码序列,然后生成它,然后再考虑如何收集操作数.更好的方案将数据流子图(表示等效的复合目标语言结构)与生成的数据流图相匹配;这可以产生明显更好的代码.通常,可以在原始代码生成后应用特定于目标语言的优化来生成更好的代码.在这两种情况下,您都必须担心管理操作员结果;它们可以直接提供给下一个目标语言运算符,还是必须进入某种临时存储(对于机器代码,这可以是另一个寄存器或内存位置).做到这一切并不容易.同样,这就是编译器书籍不薄的原因.

If the code generation problem is more complex, what normally happens is the AST is used to generate what amounts to a data flow model of the computation, consisting of operators that produce results, and consume results from previous operators, grounding out in "operators" that fetch variable values and constants. Then the data flow representation is traversed to generate the code; this has the advantage that you can choose an operator in the data flow representation, find a matching code sequence in the target language, generate that, and then worry about how the operands are collected. Better schemes match data flow subgraphs (representing equivalent compound target language constructs) to the produced data flow graph; this can produce significantly better code. Often, one can apply target-language specific optimizations after raw code generation to produce even better code. In both cases, you have to worry about managing the operator results; can they be fed directly to the next target language operator, or must they go into some kind of temporary storage (for machine code, this can be another register or a memory location). Doing all this isn't easy; again, that's why the compiler books aren't thin.

这个想法的一个变体是源到源程序转换.这将源代码中的构造直接"映射到目标代码中的构造,尽管这通常是通过操作 AST 在幕后完成的,因为未解析的编程语言文本难以匹配.我们的DMS 软件再造工具包就是这种系统的一个例子.使用这样的工具,您可以在源语言中编写模式(隐式匹配解析树),并在目标语言中编写相应的模式(隐式生成目标语言 AST).您可以编写复杂的源或目标构造,以提供上述数据流图匹配的大部分效果.生成后优化包含更多重写规则,将目标代码转换为目标代码.

A variation on this idea is source-to-source program transformations. This maps constructs in source code "directly" to constructs in target code, although this is usually done behind the scenes by operating on ASTs because unparsed programming language text is hard to match. Our DMS Software Reengineering Toolkit is an example of this kind of system. With such a tool, you write patterns in the source language (which implicitly match against a parse tree), and corresponding patterns in the target langauge (implicitly producing target language ASTs). You can write complex source or target constructs giving much of the effect of the data flow graph matching above. Post-generation optimization consists of more rewrite rules that transform the target code to target code.

底线:除非您的翻译非常简单,否则拥有 AST 是不够的.您可以在这个 SO 答案中阅读更多关于您需要什么的信息:https://stackoverflow.com/a/3460977/120163

Bottom line: Having an AST isn't really enough unless your translation is truly trivial. You can read more about what you do need at this SO answer: https://stackoverflow.com/a/3460977/120163

警告:强烈的意见如下.

WARNING: Loud opinion follows.

关于转码器":我更喜欢术语编译"、翻译"或源到源"编译器.近 40 年来,我一直在构建程序分析和操作工具.在遇到这个 SO 问题之前,我从来没有听说过转码器"这个词:将遗留 Cobol/PL1 迁移到 Java 的经验和一个描述恕我直言一个真正糟糕的代码翻译方案的回应,称为 NACA.我听说这个词越来越受欢迎;我不明白为什么当我们有完全足够的术语时我们必须发明另一个术语.通常这是某人发明了大祭司的标志;让我们发明一个闪亮的新术语,这样人们就不会真正理解我们在做什么".我很高兴把这个词留给如此糟糕的翻译.

Re "Transcoder": I much prefer the term "compilation", "translation", or "source-to-source" compiler. I've been building program analysis and manipulation tools for almost 40 years. I'd never heard the term "transcoder" until I encountered this SO question: Experience migrating legacy Cobol/PL1 to Java and a response describing IMHO a truly awful code translation scheme called NACA. I've heard since that this term is gaining traction; I don't see why we had to invent another term when we have perfectly adequate ones. Usually this is a sign of somebody inventing a high priesthood; "let's invent a shiny new term so people don't really understand what we're doing". I'm happy to leave that term to such truly awful translations.

这篇关于何时使用抽象或具体的语法树?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆