什么时候使用抽象或具体的语法树? [英] When to use an abstract or concrete syntax tree?

查看:118
本文介绍了什么时候使用抽象或具体的语法树?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在研究编译器。这个词法分析器似乎很简单:采用句子并将其分解为单词(或标记)。为了确保正确的语法,需要一个解析器。解析器通常会获取令牌并构建一个树,该树会生成一个根节点(单词变成句子,段落,页面等)。

I've been doing research on compilers. The lexer seems to be very straight forward: Take a "sentence" and break it up into words (or tokens). To ensure correct grammar a parser is needed. The parser generally takes the tokens and builds a tree that results in a root node (words into sentences, paragraphs, pages, etc...).

来自这个问题它似乎解析器会建立一个AST。 AST仅包含执行代码所必需的内容,因此不需要括号之类的东西,因为AST中内置了运算符优先级。 AST可能是编译器所需的全部。

From this question it would seem a parser would build an AST. The AST only contains what is necessary to execute the code, so things like parentheses would be unnecessary since operator precedence is built into an AST. An AST is probably all a compiler needs.

但是如何将代码从一种语言转换为另一种语言呢?采用一种组合语言(语法)或现有语法并将其转换为另一种运算符优先级规则可能不同的地方?

But what about converting code from one language to another? Taking a made-up language (grammar) or an existing grammar and converting it into another where operator precedence rules may or may not be different? Is operator precedence "built in" to CST as well?

例如,说我组成了一种语言,想将其翻译成PHP代码。

As an example lets say I made up a language and wanted to translate it into PHP code. The ternary operator on most languages has right-to-left associativity. PHP incorrectly uses left-to-right associativity (see more about this here). I want "my language" to use right-to-left but the resulting PHP code has to apply parenthesis to get the correct result in PHP (with the link to Wikipedia, result needs to be "train" instead of "horse").

大多数语言上的三元运算符具有从右到左的关联性。 PHP错误地使用了从左到右的关联性(在此处了解更多信息)。我希望我的语言从右到左使用,但是生成的PHP代码必须应用括号才能在PHP中获得正确的结果(使用链接至Wikipedia ,结果必须是火车而不是马。)

So for language translation would a CST be better? Is operator precedence usually built into a CST? Is there anything in-between? Are there any examples comparing both trees with a simple algebra equation? Any examples illustrating a ternary operator?

因此语言翻译CST会更好吗?通常在CST中内置运算符优先级吗?之间有什么吗?有没有将两个树与一个简单的代数方程进行比较的示例?有任何说明三元运算符的示例吗?

(Is "transcode" the correct term for "programming language translation"? A Google search brings up converting media.)

(代码转换是否是编程语言翻译的正确术语?Google搜索带来了转换媒体。)

What I'm trying to figure out is: When is it more appropriate to use one over the other?

我要弄清楚的是:什么时候更合适地使用一个?

解决方案

推荐答案

仅需要一个AST,它可以对源语言的所有语义细节进行建模。根据定义,如果它确实正确地对语义建模,并且您的语言中包含三元运算符,那么它也将正确地对运算符的应用顺序进行建模(例如,优先级模替代的结果,例如括号)。 / p>

所以您的问题不在AST中。它正在使用优先级不同的相似(三元)运算符生成另一种语言。

So your problem isn't in the AST. It is generating to another language using similar (ternary) operators whose precedence is different.

这是代码生成中的一个古老问题:目标运算符不会不能完全匹配源运算符,因此输出不能是一对一的。在您的情况下,您应该能够通过生成带有括号的PHP三元运算符来控制顺序,以实现原始语义,从而解决该问题,因此,这不是一个大问题。

This is an age-old problem in code generation: the operators of the target don't quite match the operators of the source, and so the output can't be one-to-one. In your case, you should be able to solve the problem by generating PHP ternary operators with parentheses around them to control the order to achieve what the original semantics are, so this isn't a big problem.

通常,生成实现所需结果的代码序列可能非常复杂,并且有很多方法可以实现。这就是为什么编译器书籍要厚而不是薄。您似乎已经隐式地选择了获取AST,浏览AST,输入代码;这几乎是即时代码生成器。如果您不关心所生成的代码是否特别好,并且目标语言与源语言非常接近,则此方法就足够了。

In general, generating sequences of code that achieve a desired result can be pretty complicated, and there's lots of ways to do it. That's why the compiler books are thick rather than thin. You seem to have implicitly settled on "get AST, walk AST, spit code"; this is almost an on-the-fly code generator. And this works adequately if you don't care if the generated code is particularly good, and the target language is pretty close to the source language.

如果代码生成有问题更复杂的是,通常会发生什么事,AST是用来生成相当于计算的数据流模型的,由产生结果的运算符和消耗先前运算符的结果组成,扎根于获取变量值和常数。然后遍历数据流表示以生成代码。这样做的好处是,您可以在数据流表示中选择一个运算符,以目标语言查找匹配的代码序列,生成该序列,然后担心如何收集操作数。更好的方案将数据流子图(表示等效的复合目标语言构造)匹配到生成的数据流图;这样可以产生更好的代码。通常,可以在生成原始代码后应用特定于目标语言的优化来生成更好的代码。在这两种情况下,您都不必担心管理操作员结果。可以将它们直接提供给下一个目标语言运算符,还是必须将它们放入某种临时存储中(对于机器代码,这可以是另一个寄存器或存储位置)。要做到这一切并不容易;再次,这就是为什么编译器书籍并不薄的原因。

If the code generation problem is more complex, what normally happens is the AST is used to generate what amounts to a data flow model of the computation, consisting of operators that produce results, and consume results from previous operators, grounding out in "operators" that fetch variable values and constants. Then the data flow representation is traversed to generate the code; this has the advantage that you can choose an operator in the data flow representation, find a matching code sequence in the target language, generate that, and then worry about how the operands are collected. Better schemes match data flow subgraphs (representing equivalent compound target language constructs) to the produced data flow graph; this can produce significantly better code. Often, one can apply target-language specific optimizations after raw code generation to produce even better code. In both cases, you have to worry about managing the operator results; can they be fed directly to the next target language operator, or must they go into some kind of temporary storage (for machine code, this can be another register or a memory location). Doing all this isn't easy; again, that's why the compiler books aren't thin.

这种想法的一种变体是源代码到源代码的程序转换。尽管通常通过在AST上进行操作来完成此操作,因为未解析的编程语言文本很难匹配,所以此操作将源代码中的结构直接映射到目标代码中的结构。我们的 DMS软件再造工具包是这种系统的示例。使用这种工具,您可以用源语言(与分析树隐式匹配)编写模式,并在目标语言中隐式地编写相应的模式(隐式生成目标语言AST)。您可以编写复杂的源或目标构造,从而获得上述数据流图匹配的大部分效果。后代优化由更多的重写规则组成,这些规则将目标代码转换为目标代码。

A variation on this idea is source-to-source program transformations. This maps constructs in source code "directly" to constructs in target code, although this is usually done behind the scenes by operating on ASTs because unparsed programming language text is hard to match. Our DMS Software Reengineering Toolkit is an example of this kind of system. With such a tool, you write patterns in the source language (which implicitly match against a parse tree), and corresponding patterns in the target langauge (implicitly producing target language ASTs). You can write complex source or target constructs giving much of the effect of the data flow graph matching above. Post-generation optimization consists of more rewrite rules that transform the target code to target code.

底线:仅拥有AST还是不够的,除非您的翻译非常琐碎。
您可以在以下SO答案中了解有关您需要做什么的更多信息: https://stackoverflow.com/a/3460977/ 120163

Bottom line: Having an AST isn't really enough unless your translation is truly trivial. You can read more about what you do need at this SO answer: https://stackoverflow.com/a/3460977/120163

警告:大声疾呼。

关于转码器:我更喜欢术语编译,翻译或源到源编译器。我从事程序分析和操作工具的开发已有近40年了。直到遇到这样的问题,我才听过代码转换器一词:将旧版Cobol / PL1迁移到Java的经验描述IMHO的回应是一种称为NACA的真正糟糕的代码转换方案。从那以后,我听说这个词越来越受到关注;我不知道为什么在我们有足够的术语时为什么还要发明另一个术语。通常,这表明有人发明了圣职。 让我们发明一个闪亮的新名词,这样人们就不会真正了解我们在做什么。很高兴将这个词留给如此糟糕的翻译。

Re "Transcoder": I much prefer the term "compilation", "translation", or "source-to-source" compiler. I've been building program analysis and manipulation tools for almost 40 years. I'd never heard the term "transcoder" until I encountered this SO question: Experience migrating legacy Cobol/PL1 to Java and a response describing IMHO a truly awful code translation scheme called NACA. I've heard since that this term is gaining traction; I don't see why we had to invent another term when we have perfectly adequate ones. Usually this is a sign of somebody inventing a high priesthood; "let's invent a shiny new term so people don't really understand what we're doing". I'm happy to leave that term to such truly awful translations.

这篇关于什么时候使用抽象或具体的语法树?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆