What kinds of patterns could I enforce on the code to make it easier to translate to another programming language?

Question

I am setting out to do a side project that has the goal of translating code from one programming language to another. The languages I am starting with are PHP and Python (Python to PHP should be easier to start with), but ideally I would be able to add other languages with (relative) ease. The plan is:

  • This is geared towards web development. The original and target code will be sitting on top of frameworks (which I will also have to write). These frameworks will embrace an MVC design pattern and follow strict coding conventions. This should make translation somewhat easier.

  • I am also looking at IoC and dependency injection, as they might make the translation process easier and less error-prone.

  • I'll make use of Python's parser module, which lets me fiddle with the Abstract Syntax Tree. Apparently the closest I can get with PHP is token_get_all(), which is a start.

  • From then on I can build the AST, symbol tables and control flow.

Then I believe I can start outputting code. I don't need a perfect translation. I'll still have to review the generated code and fix problems. Ideally the translator should flag problematic translations.
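The plan above (parse to an AST, emit target code, flag what could not be translated) can be sketched for a very small subset of Python. This is only an illustration under stated assumptions: it uses the standard `ast` module rather than the `parser` module mentioned above, it assumes Python 3.8+ (where literals are `ast.Constant` nodes), and the class name `PhpEmitter` and its `flags` list are hypothetical names invented for this example.

```python
import ast


class PhpEmitter(ast.NodeVisitor):
    """Translate a tiny subset of Python (simple assignments, + - *) to PHP text.

    Hypothetical sketch: anything outside the subset is flagged, not guessed.
    """

    # Only valid for numeric operands; the AST alone cannot tell us the types.
    BINOPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}

    def __init__(self):
        self.flags = []  # (line number, node type) for untranslated constructs

    def translate(self, source):
        tree = ast.parse(source)
        return "\n".join(self.visit(stmt) for stmt in tree.body)

    def visit_Assign(self, node):
        # Handle only `name = expr`; tuple unpacking etc. falls through to a flag.
        if len(node.targets) != 1 or not isinstance(node.targets[0], ast.Name):
            return self.generic_visit(node)
        return f"${node.targets[0].id} = {self.visit(node.value)};"

    def visit_BinOp(self, node):
        op = self.BINOPS.get(type(node.op))
        if op is None:
            return self.generic_visit(node)
        return f"({self.visit(node.left)} {op} {self.visit(node.right)})"

    def visit_Name(self, node):
        return f"${node.id}"

    def visit_Constant(self, node):
        # Only plain numbers; strings, booleans and None are flagged here.
        if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
            return self.generic_visit(node)
        return repr(node.value)

    def generic_visit(self, node):
        # Fallback for every node type without a handler: record it and
        # leave a visible marker in the generated code for manual review.
        self.flags.append((getattr(node, "lineno", None), type(node).__name__))
        return f"/* TODO: untranslated {type(node).__name__} */"


emitter = PhpEmitter()
print(emitter.translate("x = 1 + y * 2\nz = [1, 2]"))
print(emitter.flags)
```

The list literal on the second input line has no handler, so it ends up as a `/* TODO ... */` marker in the output and as an entry in `emitter.flags`, which is the kind of "flag problematic translations" behaviour described above. Note that mapping Python's `+` to PHP's `+` is already wrong for strings, which is one reason an AST alone is not enough.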

Before you ask "What the hell is the point of this?", the answer is: it'll be an interesting learning experience. If you have any insights on how to make this less daunting, please let me know.


EDIT:

I am more interested in knowing what kinds of patterns I could enforce on the code to make it easier to translate (e.g., IoC, SOA?) than in how to do the translation.

Solution

I've been building tools (DMS Software Reengineering Toolkit) to do general purpose program manipulation (with language translation being a special case) since 1995, supported by a strong team of computer scientists. DMS provides generic parsing, AST building, symbol tables, control and data flow analysis, application of translation rules, regeneration of source text with comments, etc., all parameterized by explicit definitions of computer languages.

The amount of machinery you need to do this well is vast (especially if you want to be able to do this for multiple languages in a general way), and then you need reliable parsers for languages with unreliable definitions (PHP is a perfect example of this).

There's nothing wrong with you thinking about building a language-to-language translator or attempting it, but I think you'll find this a much bigger task for real languages than you expect. We have some 100 man-years invested in just DMS, and another 6-12 months in each "reliable" language definition (including the one we painfully built for PHP), much more for nasty languages such as C++. It will be a "hell of a learning experience"; it has been for us. (You might find the technical Papers section at the above website interesting to jump start that learning).

People often attempt to build some kind of generalized machinery by starting with some piece of technology with which they are familiar and that does a part of the job. (Python ASTs are a great example.) The good news is that part of the job is done. The bad news is that the machinery has a zillion assumptions built into it, most of which you won't discover until you try to wrestle it into doing something else. At that point you find out the machinery is wired to do what it originally does, and will really, really resist your attempt to make it do something else. (I suspect trying to get the Python AST to model PHP is going to be a lot of fun.)

The reason I started to build DMS originally was to build foundations that had very few such assumptions built in. It has some that give us headaches. So far, no black holes. (The hardest part of my job over the last 15 years has been trying to prevent such assumptions from creeping in.)

Lots of folks also make the mistake of assuming that if they can parse (and perhaps get an AST), they are well on the way to doing something complicated. One of the hard lessons is that you need symbol tables and flow analysis to do good program analysis or transformation. ASTs are necessary but not sufficient. This is the reason that Aho&Ullman's compiler book doesn't stop at chapter 2. (The OP has this right in that he is planning to build additional machinery beyond the AST). For more on this topic, see Life After Parsing.
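A small example of why the AST is not sufficient: Python's `+` must become `.` in PHP for strings and `+` for numbers, and the AST node for `a + b` looks the same in both cases. The sketch below builds a deliberately crude symbol table to make that decision. It is an assumption-laden illustration, not how DMS or any real translator does it: the helper names (`build_symbol_table`, `literal_kind`, `php_operator_for_add`) are invented here, only literal assignments are understood, and the analysis ignores control flow entirely.

```python
import ast


def literal_kind(expr):
    """Classify an expression as 'str', 'num', or 'unknown' (literals only)."""
    if isinstance(expr, ast.Constant):
        if isinstance(expr.value, str):
            return "str"
        if isinstance(expr.value, (int, float)) and not isinstance(expr.value, bool):
            return "num"
    return "unknown"


def build_symbol_table(tree):
    """Map each simply-assigned name to a kind; conflicting kinds become 'unknown'."""
    table = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and len(node.targets) == 1:
            target = node.targets[0]
            if isinstance(target, ast.Name):
                kind = literal_kind(node.value)
                if table.get(target.id, kind) != kind:
                    kind = "unknown"  # reassigned with a different kind
                table[target.id] = kind
    return table


def php_operator_for_add(binop, table):
    """Pick '.' or '+' for a Python '+', or None when the types are not known."""
    kinds = set()
    for operand in (binop.left, binop.right):
        if isinstance(operand, ast.Name):
            kinds.add(table.get(operand.id, "unknown"))
        else:
            kinds.add(literal_kind(operand))
    if kinds == {"str"}:
        return "."
    if kinds == {"num"}:
        return "+"
    return None  # the translator should flag this for manual review


source = (
    "greeting = 'hi'\n"
    "count = 3\n"
    "a = greeting + greeting\n"
    "b = count + 1\n"
    "c = greeting + name\n"
)
tree = ast.parse(source)
table = build_symbol_table(tree)
for stmt in tree.body:
    if isinstance(stmt.value, ast.BinOp):
        print(stmt.targets[0].id, php_operator_for_add(stmt.value, table))
```

Even this toy version cannot decide the third addition, because `name` is never assigned in the snippet. Handling function parameters, attributes, and values that change along different control-flow paths is where real symbol tables and flow analysis come in.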

The remark about "I don't need a perfect translation" is troublesome. What weak translators do is convert the "easy" 80% of the code, leaving the hard 20% to do by hand. If the application you intend to convert is pretty small, and you only intend to convert it once, then that 20% is OK. If you want to convert many applications (or even the same one with minor changes over time), this is not nice. If you attempt to convert 100K SLOC, then 20% is 20,000 original lines of code that are hard to translate, understand and modify in the context of another 80,000 lines of translated program you already don't understand. That takes a huge amount of effort. At the million-line level, this is simply impossible in practice. (Amazingly, there are people who distrust automated tools and insist on translating million-line systems by hand; that's even harder, and they normally find this out painfully, with long delays, high costs and often outright failure.)

To translate large-scale systems, you have to shoot for conversion rates in the high nineties (percent); otherwise it is likely that you won't be able to complete the manual part of the translation activity.

Another key consideration is the size of the code to be translated. It takes a lot of energy to build a working, robust translator, even with good tools. While it seems sexy and cool to build a translator instead of simply doing a manual conversion, for small code bases (e.g., up to about 100K SLOC in our experience) the economics simply don't justify it. Nobody likes this answer, but if you really have to translate just 10K SLOC of code, you are probably better off just biting the bullet and doing it by hand. And yes, that's painful.

I consider our tools to be extremely good (but then, I'm pretty biased). And it is still very hard to build a good translator; it takes us about 1.5-2 man-years and we know how to use our tools. The difference is that with this much machinery, we succeed considerably more often than we fail.
