Static analysis for partial C++ programs


Question

I'm thinking about doing a static analysis project over C++ code samples, as opposed to entire programs. In general, static analysis requires some simpler intermediate representation, but such a representation cannot be accurately created without the entire program code.

Still, I know there is such a tool for Java: it basically "guesses" the missing information and thus allows static analysis to take place, even though the analysis is no longer sound or complete.

Is there anything similar that can be used to convert partial C++ code into some intermediate form (e.g. LLVM bytecode)?

Recommended answer

As a general rule, if you guess, you guess wrong; any complaints from a static analyzer based on such guesses are false positives and will tend to cause a high rate of rejection.

If you insist on guessing, you'll need a tool that can parse arbitrary C++ fragments. ("Guess a static analysis of this method....") Most C++ parsers will only parse complete source files, not fragments.

You'll also need a way to build up partial symbol tables. ("I is listed as an argument to FOO, but has no type information, and it is not the same I as the one declared in the statement following the call to FOO.")

Our DMS Software Reengineering Toolkit with its C++ Front End can provide parsing of fragments, and might be used as a springboard for partial symbol tables.

DMS provides general parsing/analysis/transformation on code, as determined by an explicit language definition provided to DMS. The C++ Front End provides a full, robust C++ front end enabling DMS to parse C++, build ASTs, and build up symbol tables for such ASTs using an Attribute Grammar (AG) in which the C++ lookup rules are encoded. The AG is a functional-style computation encoded over AST nodes; the C++ symbol table builder is in essence a big functional program whose parts are attached to BNF grammar rules for C++.

As part of its generic parsing machinery, given a language definition (such as the C++ front end), DMS can parse arbitrary (non)terminals of that language using its built-in pattern language. So DMS can parse expressions, methods, declarations, etc., or any other well-formed code fragment, and build ASTs. Where a fragment is not well-formed, one currently gets a syntax error on the fragment parse; it would be possible to extend DMS's error recovery to generate a plausible AST fix and thus parse arbitrary elements.

The partial symbol table is harder, since much of the symbol table building machinery depends on other parts of the symbol table being built. However, since this is all coded as an AG, one could run just the part of the AG relevant to the parsed fragment, e.g., the symbol table building logic for a method. The AG would need to be modified, probably extensively, to allow it to operate with "assumptions" about missing symbol definitions; these would in effect become constraints. Of course, a missing symbol might be any of several things, and you might end up with several configurations of possible symbol tables. Consider:

{ int X;
  T*X;
}

Not knowing what T is, the type of the phrase (and even its syntactic category) can't be uniquely determined. (DMS will parse the T*X; and report an ambiguous parse, since there are multiple possible matching interpretations; see "Why C++ cannot be parsed with a LR(1) parser?")
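As a rough illustration of how an assumption about a missing symbol acts as a constraint, the sketch below enumerates the readings of a statement shaped like `T*X;` and keeps only those consistent with what a partial symbol table knows. Everything here (`SymbolKind`, `Reading`, `PartialSymbols`, `readingsOfStarStatement`) is a hypothetical name invented for this example; it is not DMS's API and it does no real parsing.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical classification of what an identifier might denote.
enum class SymbolKind { Type, Value };

// One candidate reading of a statement shaped like `T*X;`.
struct Reading {
    SymbolKind assumedKindOfLhs;  // the assumption about T this reading relies on
    std::string description;      // human-readable summary of the reading
};

// Partial symbol table: holds only the names whose kind is actually known.
using PartialSymbols = std::map<std::string, SymbolKind>;

// Return every reading of `lhs * rhs;` that is consistent with the known symbols.
// If nothing is known about lhs, both readings survive (the ambiguous case).
std::vector<Reading> readingsOfStarStatement(const PartialSymbols& known,
                                             const std::string& lhs,
                                             const std::string& rhs) {
    assert(!lhs.empty() && !rhs.empty());
    const std::vector<Reading> all = {
        {SymbolKind::Type, "declaration: " + rhs + " is a pointer to " + lhs},
        {SymbolKind::Value, "expression: " + lhs + " multiplied by " + rhs},
    };
    const auto it = known.find(lhs);
    if (it == known.end()) {
        return all;  // no information: keep every configuration
    }
    std::vector<Reading> consistent;
    for (const auto& reading : all) {
        if (reading.assumedKindOfLhs == it->second) {
            consistent.push_back(reading);
        }
    }
    return consistent;
}
```

Calling `readingsOfStarStatement` with an empty table for `T` and `X` yields both readings; once `T` is recorded as a type, only the declaration reading remains. A real analyzer would have to carry such alternatives through every later lookup, which is where the cost grows.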

We've already done some work on this kind of partial parsing and partial symbol tables, in which we used DMS experimentally to capture code containing preprocessor conditionals where the status of some conditions is undefined. This causes us to build conditional symbol table entries. Consider:

#if foo
   int X;
#else
   void X(int a) {...}
#endif
...
#if foo
   X++;
#else
   X(7);
#endif

With conditional symbols, this code can type check. The symbol table entry for X says something like, "X ==> int if foo else ==> void(int)".
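A conditional symbol table entry of that kind could be modeled roughly as follows. This is a minimal sketch under stated assumptions, not how DMS stores its entries: `ConditionalEntry`, `ConditionalSymbolTable`, `declare`, and `lookup` are hypothetical names, conditions are reduced to a single named boolean, and types are plain strings.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One guarded alternative: the symbol has `type` when `condition` evaluates to `whenTrue`.
struct ConditionalEntry {
    std::string condition;  // preprocessor condition name, e.g. "foo"
    bool whenTrue;          // truth value of the condition under which this entry applies
    std::string type;       // type of the symbol in that configuration, e.g. "int"
};

// Hypothetical symbol table whose entries are guarded by preprocessor conditions.
class ConditionalSymbolTable {
public:
    void declare(const std::string& name, ConditionalEntry entry) {
        assert(!name.empty());
        entries_[name].push_back(std::move(entry));
    }

    // Types `name` may have, given the condition values that are known.
    // Alternatives guarded by an unknown condition are all kept.
    std::vector<std::string> lookup(const std::string& name,
                                    const std::map<std::string, bool>& known) const {
        std::vector<std::string> types;
        const auto it = entries_.find(name);
        if (it == entries_.end()) {
            return types;  // undeclared symbol: no candidate types
        }
        for (const auto& entry : it->second) {
            const auto k = known.find(entry.condition);
            if (k == known.end() || k->second == entry.whenTrue) {
                types.push_back(entry.type);
            }
        }
        return types;
    }

private:
    std::map<std::string, std::vector<ConditionalEntry>> entries_;
};
```

Declaring `X` once as `{"foo", true, "int"}` and once as `{"foo", false, "void(int)"}` mirrors the entry quoted above: a lookup with `foo` unknown returns both types, while fixing `foo` selects exactly one, so `X++` and `X(7)` can each be checked under the branch that contains them.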

I think the idea of reasoning about large program fragments with constraints is great, but I suspect it is really hard, and you'll forever be trying to resolve enough information about a constraint in order to do static analysis.
