How can one create a data flow graph (DFG/SDFG) for any application from its source code

Question

I have done a lot of research to figure out how a DFG can be created for an application from its source code. There are DFGs available online for certain applications, such as an MP3 decoder, JPEG compression and an H.263 decoder.

I haven't been able to figure out how to create a DFG from the source code of an application such as HEVC. Are there any tools that can automatically generate data flow graphs for such elaborate applications, or does it have to be done manually?

Please advise me regarding this matter.

EDIT: I used Doxygen on HEVC, and I could see how different functions interact with each other. However, every function had many entry and exit points, and the Doxygen output became too confusing to follow after a while.

I also looked at StreamIt: http://camlunity.ru/swap/Library/Conflux/Stream%20Programming/streamit-cc_stream_graph_programming_language.pdf

It seemed handy, but the graphs it generated even for simpler applications (like the MP3 decoder) were too complex. In order to generate a coherent DFG, will I have to rewrite the entire source code?

Solution

You want to extract data flow graphs from arbitrary languages. You imply you want a single way to do it. This isn't practical by hand... you need a tool.

Such a tool is singularly hard to build.

To do this, for each language you must be able to:

  • Define the language to the tool, in the form you find it in practice (not just the language reference manual version). C++ in the wild is bent in a lot of funny ways compared to the standard.
  • Parse programs in the language as found in the field, perhaps as one file, perhaps as tens of thousands; some programs aren't small.
  • Build structures representing the language elements and their relations to one another (this is often done as an abstract syntax tree)
  • Determine for each literal what its actual value is; "a\xbc" has very different values depending on whether the language thinks it is ASCII or Unicode text with escape sequences
  • Find all the identifiers in the code, and determine for each one which definitional/type information is associated with it according to the language's scoping rules
  • Determine the sources of data (literal values, inputs from the outside world, results of expressions) and track where those data values are used in other parts of the program across various control flow constructs
  • Presumably draw some picture of the resulting data flow.
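To make the steps above concrete on a very small scale, here is a minimal sketch in Python. It assumes the input is itself Python (not C/C++ like the HEVC reference code) and consists only of straight-line `name = expr` assignments. Python's standard `ast` module does the parsing and tree-building; the sketch then records the latest definition of each variable and links it to every later read, which is the def-use data flow the last two bullets describe. The function names `data_flow_edges` and `to_dot` and the `name@line` node labels are hypothetical choices for illustration. Branches, loops, aliasing, pointers and calls are all ignored, and those are precisely the hard parts the answer warns about.

```python
import ast


def data_flow_edges(source: str) -> list[tuple[str, str]]:
    """Return (producer, consumer) edges for straight-line `name = expr` code.

    Nodes are labeled "name@line". An edge ("a@1", "b@2") means the value
    assigned to `a` on line 1 is read when computing `b` on line 2.
    Variables read before any assignment are labeled "name@input".
    """
    tree = ast.parse(source)          # parse and build the AST
    last_def: dict[str, str] = {}     # variable -> label of its latest definition
    edges: list[tuple[str, str]] = []
    for stmt in tree.body:
        # Sketch limitation: only simple single-target assignments are handled;
        # if/while/for, attributes, subscripts, etc. are skipped entirely.
        if not (isinstance(stmt, ast.Assign)
                and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            continue
        var = stmt.targets[0].id
        target = f"{var}@{stmt.lineno}"
        # Collect the variables read on the right-hand side, without duplicates.
        reads = dict.fromkeys(
            node.id for node in ast.walk(stmt.value)
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
        )
        for name in reads:
            edges.append((last_def.get(name, f"{name}@input"), target))
        # Record the new definition only after the reads, so `x = x + 1`
        # links the *old* x to the new one.
        last_def[var] = target
    return edges


def to_dot(edges: list[tuple[str, str]]) -> str:
    """Render the edges as a Graphviz DOT digraph."""
    body = [f'  "{src}" -> "{dst}";' for src, dst in edges]
    return "\n".join(["digraph dfg {", *body, "}"])


if __name__ == "__main__":
    code = "x = 3\ny = x * 2\nz = x + y\nw = z - n\n"
    print(to_dot(data_flow_edges(code)))
```

For the four lines in the `__main__` example, this yields the edges x@1 -> y@2, x@1 -> z@3, y@2 -> z@3, z@3 -> w@4 and n@input -> w@4, and `to_dot` renders them in Graphviz DOT format. Scaling this to HEVC would require a complete C/C++ front end plus analysis across branches, loops, pointers and function calls.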

Each of these tasks is hard by itself, because the languages tend to be complex. Most language tools that can do this at all (mostly compilers) do it only for one dialect of the language.

To do it for more than one language/dialect, you need a tool that can be configured with all the details of each language, and you have to configure it for every language of interest. [Realistically you cannot "do them all"; there are thousands of computer languages in use right now.]

Even limiting yourself to the "everyday" common programming languages, this is an enormous amount of work; it can take a few years to do all this well for a single mainstream language. You won't succeed in doing this by yourself.

My company builds a single, unified tool that is intended to be capable of doing this: the DMS Software Reengineering Toolkit (http://www.semanticdesigns.com/Products/DMS/DMSToolkit.html?site=StackOverflow). The simple "secret" is to realize that the machinery needed to accomplish the above tasks is actually very similar across languages, and can be designed to be configured for a particular language with relatively modest (that doesn't mean "small") effort.
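The idea that the analysis machinery can be shared across languages can be illustrated on a small scale: if each language's front end lowers programs into a common representation, one data flow algorithm serves all of them. The following is a minimal sketch, assuming a hypothetical language-neutral IR (the `Block` class and its statement encoding are inventions for illustration, not DMS's internal representation). It runs textbook iterative reaching-definitions analysis to a fixed point and then derives def-use edges, including the case where two definitions reach one use after a branch.

```python
from dataclasses import dataclass, field
from typing import Optional

# A definition site: (block name, statement index, variable defined there).
Def = tuple[str, int, str]


@dataclass
class Block:
    """A basic block in a hypothetical language-neutral IR.

    Each statement is (defined_variable_or_None, [variables_read]).
    """
    name: str
    stmts: list[tuple[Optional[str], list[str]]]
    succs: list[str] = field(default_factory=list)


def transfer(block: Block, reaching: frozenset) -> frozenset:
    """Push a set of reaching definitions through one block (kill + gen)."""
    cur = set(reaching)
    for i, (target, _reads) in enumerate(block.stmts):
        if target is not None:
            cur = {d for d in cur if d[2] != target}  # kill older defs of target
            cur.add((block.name, i, target))           # gen this definition
    return frozenset(cur)


def reaching_definitions(blocks: dict[str, Block]) -> dict[str, frozenset]:
    """Iterative reaching-definitions analysis; returns IN[b] for each block."""
    preds: dict[str, list[str]] = {name: [] for name in blocks}
    for b in blocks.values():
        for s in b.succs:
            preds[s].append(b.name)
    in_sets = {name: frozenset() for name in blocks}
    out_sets = {name: frozenset() for name in blocks}
    changed = True
    while changed:  # repeat until no IN/OUT set changes (a fixed point)
        changed = False
        for name, b in blocks.items():
            new_in = frozenset().union(*(out_sets[p] for p in preds[name]))
            new_out = transfer(b, new_in)
            if new_in != in_sets[name] or new_out != out_sets[name]:
                in_sets[name], out_sets[name] = new_in, new_out
                changed = True
    return in_sets


def def_use_edges(blocks: dict[str, Block]) -> set[tuple[Def, tuple[str, int]]]:
    """Link every definition to each statement that may read it."""
    in_sets = reaching_definitions(blocks)
    edges = set()
    for name, b in blocks.items():
        cur = set(in_sets[name])
        for i, (target, reads) in enumerate(b.stmts):
            for var in reads:
                edges.update((d, (name, i)) for d in cur if d[2] == var)
            if target is not None:
                cur = {d for d in cur if d[2] != target}
                cur.add((name, i, target))
    return edges


# Example CFG for:  x = input(); if x: x = x + 1; use(x)
cfg = {
    "entry": Block("entry", [("x", []), (None, ["x"])], succs=["then", "join"]),
    "then": Block("then", [("x", ["x"])], succs=["join"]),
    "join": Block("join", [(None, ["x"])]),
}
edges = def_use_edges(cfg)
```

In the example, the use of `x` in block `join` is reached by two definitions, one from each path through the `if`, which is the "across various control flow constructs" part of the task list. Nothing in the analysis mentions a source language; what remains language-specific is the front end that would produce these blocks, and that is the expensive part the answer describes.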

After 20 linear years of engineering with a team of PhD-level engineers, we have parsers (even this is hard) for a surprising variety of languages, with full-up data flow analyzers of the type you are talking about for C++ (check this link for examples), C, COBOL and almost Java 8.

I don't know of any other unified tools that are this far down the path toward your ideal. Check my bio before you decide I'm clueless about this. (Rascal/MPL has some ambitions but is a research tool at this point; they don't do C or C++ at all.) We're only part way there, with many languages and scale battles still to fight.

[The goal of DMS isn't data flow analysis; that's just a stepping stone. The goal is automated code transformation, which needs data flow analysis in order to be carried out safely and correctly.]

Of course, you could just hope to find a separate tool for each language. You wouldn't get consistent quality or consistent style/granularity of data flow graphs from separate tools from different authors, if you can indeed get a full set of such tools at all.
