Performance differences between debug and release builds


Problem Description


I must admit that usually I haven't bothered switching between the Debug and Release configurations in my program, and I have usually opted for the Debug configuration, even when the programs are actually deployed at the customer's place.

As far as I know, the only difference between these configurations, if you don't change it manually, is that Debug has the DEBUG constant defined and Release has the Optimize code option checked.

So my question is actually twofold:

  1. Is there much performance difference between these two configurations? Is there any specific type of code that will cause big differences in performance here, or is it actually not that important?

  2. Is there any type of code that will run fine under the Debug configuration but might fail under the Release configuration, or can you be certain that code that has been tested and works fine under the Debug configuration will also work fine under the Release configuration?

Solution

The C# compiler itself doesn't alter the emitted IL a great deal in the Release build. Notably, it no longer emits the NOP opcodes that allow you to set a breakpoint on a curly brace. The big one is the optimizer that's built into the JIT compiler. I know it performs the following optimizations:

  • Method inlining. A method call is replaced by injecting the code of the method. This is a big one; it makes property accessors essentially free.

  • CPU register allocation. Local variables and method arguments can stay stored in a CPU register without ever (or less frequently) being stored back to the stack frame. This is a big one, notable for making debugging optimized code so difficult. And giving the volatile keyword a meaning.

  • Array index checking elimination. An important optimization when working with arrays (all .NET collection classes use an array internally). When the JIT compiler can verify that a loop never indexes an array out of bounds then it will eliminate the index check. Big one.

  • Loop unrolling. Short loops (up to 4 iterations) with small bodies are eliminated by repeating the code in the loop body. Avoids the branch misprediction penalty.

  • Dead code elimination. A statement like if (false) { /*...*/ } gets completely eliminated. This can occur due to constant folding and inlining. Other cases are where the JIT compiler can determine that the code has no possible side effect. This optimization is what makes profiling code so tricky.

  • Code hoisting. Code inside a loop that is not affected by the loop can be moved out of the loop.

  • Common sub-expression elimination. x = y + 4; z = y + 4; becomes z = x;

  • Constant folding. x = 1 + 2; becomes x = 3; This simple example is caught early by the compiler, but happens at JIT time when other optimizations make this possible.

  • Copy propagation. x = a; y = x; becomes y = a; This helps the register allocator make better decisions. It is a big deal in the x86 jitter because it has so few registers to work with. Having it select the right ones is critical to perf.
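Several of the optimizations listed above can be seen at work in one small program. The sketch below is illustrative only (the class and method names are mine, not from the answer); compiling it in a Release configuration lets the JIT apply inlining, dead code elimination, constant folding, and bounds-check elimination:

```csharp
using System;

class Point
{
    // In a Release build the JIT will typically inline this accessor,
    // making a property read as cheap as a direct field access.
    public int X { get; set; }
}

class Program
{
    // A compile-time constant lets the JIT prove the branch below is
    // dead (dead code elimination).
    const bool Trace = false;

    static int Sum(int[] values)
    {
        int sum = 0;
        // Looping from 0 to values.Length lets the JIT prove the index
        // never goes out of bounds, so the per-element range check can
        // be eliminated in the optimized build.
        for (int i = 0; i < values.Length; i++)
            sum += values[i];
        return sum;
    }

    static void Main()
    {
        if (Trace) Console.WriteLine("never emitted"); // eliminated as dead code
        int x = 1 + 2;                                 // constant folding: becomes x = 3
        var p = new Point { X = x };
        Console.WriteLine(Sum(new[] { p.X, 4, 5 }));   // prints 12
    }
}
```

None of these transformations change the program's observable output; you can only see them by inspecting the generated machine code or by measuring.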

These are very important optimizations that can make a great deal of difference when, for example, you profile the Debug build of your app and compare it to the Release build. That only really matters though when the code is on your critical path, the 5 to 10% of the code you write that actually affects the perf of your program. The JIT optimizer isn't smart enough to know up front what is critical, it can only apply the "turn it to eleven" dial for all the code.

The effective result of these optimizations on your program's execution time is often overshadowed by code that runs elsewhere: reading a file, executing a database query, and so on make the work the JIT optimizer does completely invisible. It doesn't mind though :)

The JIT optimizer is pretty reliable code, mostly because it has been put to the test millions of times. It is extremely rare to have problems in the Release build version of your program. It does happen, however. Both the x64 and the x86 jitters have had problems with structs. The x86 jitter has trouble with floating point consistency, producing subtly different results when the intermediates of a floating point calculation are kept in an FPU register at 80-bit precision instead of getting truncated when flushed to memory.
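The floating point issue typically surfaces as an exact-equality comparison whose outcome depends on whether an intermediate was kept at extended precision. A hypothetical sketch of the fragile pattern and the usual defensive fix (comparing against a tolerance instead of for exact equality):

```csharp
using System;

class FloatConsistency
{
    static void Main()
    {
        double a = 0.1, b = 0.2;

        // On the 32-bit x86 jitter the intermediate (a + b) may live in an
        // x87 register at 80-bit precision. Whether it is rounded to 64 bits
        // before the comparison depends on register allocation, which can
        // differ between Debug and Release builds.
        double sum = a + b;
        Console.WriteLine(sum == 0.3);                 // result is build/jitter sensitive

        // Comparing within a tolerance sidesteps the inconsistency:
        Console.WriteLine(Math.Abs(sum - 0.3) < 1e-9); // prints True
    }
}
```

This is why code that "works in Debug" can occasionally behave differently in Release: the logic is the same, but the precision of intermediates is not guaranteed on x86.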

