Removing useless lines from a C++ file

Question

Often, as I am debugging or reusing some code, the file starts to acquire lines that don't do anything, though they may have done something at one point.

Things like vectors that get filled and then go unused, classes/structs that are defined but never used, and functions that are declared but never used.

I understand that in many cases, some of these things are not superfluous, as they might be visible from other files, but in my case, there are no other files, just extraneous code in my file.

While I understand that technically speaking, invoking push_back does something, and therefore the vector is not unused per se, in my case, its result goes unused.

So: Is there a way to do this, either using a compiler (clang, gcc, VS, etc) or an external tool?

Example:

#include<vector>
using namespace std;
void test() {
    vector<int> a;
    a.push_back(1);
}
int main() {
    test();
    return 0;
}

This should become: int main() { return 0; }

Recommended answer

Our DMS Software Reengineering Toolkit with its C++11 front end could be used to do this; it presently does not do this off the shelf. DMS is designed to provide custom tool construction for arbitrary source languages, and contains full parsers, name resolvers, and various flow analyzers to support analysis, as well as the ability to apply source-to-source transformations on the code based on analysis results.

In general, you want a static analysis that determines whether the result of every computation is used (a computation may have several results; consider just "x++", which yields a value and also increments x). For each unused computation, in effect you want to remove it and repeat the analysis. For efficiency reasons, you want to do an analysis that determines all the (points of) usage of the result(s) just once; this is essentially a data flow analysis. When the usage set of a computation result goes empty, that computation result can be deleted (note that deleting the value result of "x++" may leave "x++" behind, because the increment is still needed!), and the usage sets of the computations on which it depends can be adjusted to remove the references from the deleted one, possibly causing more removals.
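
As a rough illustration of the usage-set idea, here is a minimal sketch over a toy program representation. The Computation struct, its fields, and findDeadComputations are all hypothetical names made up for this example; they are not DMS's data structures. The sketch assumes that side effects and externally observed results have already been determined by some other analysis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One computation in a toy program representation (hypothetical, not DMS's).
struct Computation {
    std::vector<std::size_t> operands;  // indices of computations whose results this one reads
    bool hasSideEffect = false;         // e.g. I/O, or the increment part of "x++"
    bool resultObserved = false;        // result escapes (returned, printed, ...)
};

// Returns one flag per computation: true if it can be deleted.
std::vector<bool> findDeadComputations(const std::vector<Computation>& prog) {
    const std::size_t n = prog.size();
    std::vector<std::size_t> useCount(n, 0);

    // Compute the size of every usage set once, up front.
    for (const Computation& c : prog) {
        for (std::size_t op : c.operands) {
            assert(op < n);
            ++useCount[op];
        }
    }

    auto removable = [&](std::size_t i) {
        return useCount[i] == 0 && !prog[i].hasSideEffect && !prog[i].resultObserved;
    };

    std::vector<bool> dead(n, false);
    std::vector<std::size_t> worklist;
    for (std::size_t i = 0; i < n; ++i) {
        if (removable(i)) {
            dead[i] = true;
            worklist.push_back(i);
        }
    }

    // Deleting a computation removes its references to its operands,
    // which may empty their usage sets in turn.
    while (!worklist.empty()) {
        const std::size_t i = worklist.back();
        worklist.pop_back();
        for (std::size_t op : prog[i].operands) {
            --useCount[op];
            if (!dead[op] && removable(op)) {
                dead[op] = true;
                worklist.push_back(op);
            }
        }
    }
    return dead;
}
```

Because every usage count is adjusted incrementally, the analysis does not have to be rerun from scratch after each deletion. Computations that only reference each other in a cycle are never reported here, which errs on the conservative side.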

To do this analysis for any language, you have to be able to trace results. For C (and C++) this can be pretty ugly; there are "obvious" uses where a computation result is used in an expression, and where it is assigned to a local/global variable (which is used somewhere else), and there are indirect assignments through pointers, object field updates, through arbitrary casts, etc. To know these effects, your dead code analysis tool has to be able to read the entire software system, and compute dataflows across it.
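
Two small cases of such indirect uses, as an illustrative sketch (the function names are invented for this example): in both, a tool that only looks for mentions of the variable's name after the assignment would wrongly conclude that the assignment is dead.

```cpp
// The assignment to 'value' is never read by name, only through 'alias'.
int readThroughAlias() {
    int value = 0;
    int* alias = &value;  // a second, indirect name for 'value'
    value = 42;           // no later statement mentions 'value' by name ...
    return *alias;        // ... yet this read observes the assignment
}

// The assignment to 'result' happens inside another function.
void store(int* target, int v) { *target = v; }

int writeThroughPointer() {
    int result = 0;
    store(&result, 7);    // 'result' is updated via the pointer parameter
    return result;
}
```

Deciding that "value = 42" is live requires knowing what "alias" may point to, which is exactly the points-to information the analysis has to compute.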

To be safe, you want that analysis to be conservative, e.g., if the tool does not have proof that a result is not used, then it must assume the result is used; you often have to do this with pointers (or array indexes which are just pointers in disguise) because in general you can't determine precisely where a pointer "points". One can obviously build a "safe" tool by assuming all results are used :-} You will also end up with sometimes very conservative but necessary assumptions for library routines for which you don't have the source. In this case, it is helpful to have a set of precomputed summaries of the library side effects (e.g., "strcmp" has none, "sprintf" overwrites a specific operand, "push_back" modifies its object...). Since libraries can be pretty big, this list can be pretty big.
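
A table of precomputed summaries might look roughly like the following sketch. The SideEffect categories and both function names are assumptions made for illustration; only the three library entries come from the examples above. Anything missing from the table falls back to the conservative answer.

```cpp
#include <string>
#include <unordered_map>

// What a call may do to state visible to the caller (hypothetical categories).
enum class SideEffect {
    None,            // pure: result depends only on the arguments
    WritesArgument,  // overwrites memory reachable from a specific argument
    ModifiesObject,  // mutates the object the member function is called on
    Unknown          // no summary available: assume anything may be read or written
};

// Looks up the precomputed summary for a library routine by name.
SideEffect summarize(const std::string& callee) {
    static const std::unordered_map<std::string, SideEffect> summaries = {
        {"strcmp", SideEffect::None},
        {"sprintf", SideEffect::WritesArgument},
        {"push_back", SideEffect::ModifiesObject},
    };
    const auto it = summaries.find(callee);
    return it == summaries.end() ? SideEffect::Unknown : it->second;
}

// A call whose result is unused can be deleted outright only if it is known
// to be pure. Calls that write memory also need the written target to be dead.
bool unusedCallIsRemovable(const std::string& callee) {
    return summarize(callee) == SideEffect::None;
}
```

With this table, an ignored call to strcmp is removable, while a push_back is kept unless a separate analysis shows that the vector itself is never used.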

DMS in general can parse an entire source code base, build symbol tables (so it knows which identifiers are local/global and their precise type), do control and local dataflow analysis, build a local "side effects" summary per function, build a call graph and global side effects, and do a global points-to analysis, providing this "computation used" information with appropriate conservatism.

DMS has been used to do this computation on C code systems of 26 million lines of code (and yes, that's a really big computation; it takes 100Gb VM to run). We did not implement the dead code elimination part (the project had another purpose) but that is straightforward once you have this data. DMS has done the dead code elimination on large Java codes with a more conservative analysis (e.g., "no use mentions of an identifier" which means assignments to the identifier are dead) which causes a surprising amount of code removal in many real codes.
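
The "no use mentions of an identifier" check can be sketched as below. This is only an illustration of the idea, not the Java tool's implementation; the Mention record and MentionKind classification are hypothetical and assume that some front end has already labelled every mention of an identifier as a declaration, a write, or a read.

```cpp
#include <set>
#include <string>
#include <vector>

enum class MentionKind { Declaration, Write, Read };

// One occurrence of an identifier in the source (hypothetical record).
struct Mention {
    std::string identifier;
    MentionKind kind;
};

// Returns the identifiers that are declared or assigned but never read.
// Assignments to these identifiers are candidates for removal.
std::set<std::string> identifiersWithoutUses(const std::vector<Mention>& mentions) {
    std::set<std::string> seen;
    std::set<std::string> read;
    for (const Mention& m : mentions) {
        seen.insert(m.identifier);
        if (m.kind == MentionKind::Read) {
            read.insert(m.identifier);
        }
    }

    std::set<std::string> unused;
    for (const std::string& id : seen) {
        if (read.count(id) == 0) {
            unused.insert(id);
        }
    }
    return unused;
}
```

Note that the right-hand side of a removed assignment may still have side effects of its own, so only the store itself is dead under this check.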

DMS's C++ parser presently builds symbol tables and can do control flow analysis for C++98 with C++11 being close at hand. We still need local data flow analysis, which is some effort, but the global analyses already pre-exist in DMS and are available to be used for this effect. (The "no uses of an identifier" is easily available from the symbol table data, if you don't mind a more conservative analysis).

In practice, you don't want the tool to just silently rip things out; some might actually be computations you wish to preserve anyway. What the Java tool does is produce two results: a list of dead computations which you can inspect to decide if you believe it, and a dead-code-removed version of the source code. If you believe the dead code report, you keep the dead-code-removed version; if you see a "dead" computation you think shouldn't be dead, you modify the code to make it not dead and run the tool again. With a big code base, inspecting the dead code report itself can be trying; how do "you" know if some apparently dead code isn't valued by "somebody else" on your team? (Version control can be used to recover if you goof!)

A really tricky issue that we do not handle (and no tool I know of does) is "dead code" in the presence of conditional compilation. (Java does not have this problem; C has it in spades, C++ systems much less.) This can be truly nasty. Imagine a conditional in which one arm has certain side effects and the other arm has different side effects, or another case in which one arm is interpreted by GCC's C++ compiler and the other arm by MS's, and the compilers disagree on what the constructs do (yes, the C++ compilers do disagree in dark corners). At best we can be very conservative here.
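
A small made-up example of the problem: USE_SEEDED_CHECKSUM is a hypothetical configuration macro, as are both function names. In a build where the macro is undefined, initialSeed() has no callers and looks dead, but deleting it breaks the build where the macro is defined.

```cpp
// Called only when USE_SEEDED_CHECKSUM is defined.
int initialSeed() { return 17; }

int checksum(const unsigned char* data, int length) {
    int sum = 0;
#ifdef USE_SEEDED_CHECKSUM
    sum = initialSeed();  // this arm is the only use of initialSeed()
#endif
    for (int i = 0; i < length; ++i) {
        sum += data[i];
    }
    return sum;
}
```

A tool that analyzes only one preprocessed configuration cannot see the other arm, so a safe answer requires either analyzing every configuration or treating anything mentioned in any arm as used.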

Clang has some ability to do flow analysis and some ability to do source transformations, so it might be coerced into doing this. I don't know if it can do any global flow/points-to analysis. It seems to have a bias towards single compilation units, since its principal use is compiling a single compilation unit.
