Performance differences between debug and release builds


Problem description

I must admit that I usually haven't bothered switching between the Debug and Release configurations in my programs, and I have usually opted for the Debug configuration, even when the programs are actually deployed at the customer's place.

As far as I know, the only difference between these configurations, if you don't change them manually, is that Debug has the DEBUG constant defined and Release has the Optimize code option checked.

So my question is actually twofold:

  1. Is there much performance difference between these two configurations? Is there any specific type of code that will cause big differences in performance here, or is it actually not that important?

  2. Is there any type of code that will run fine under the Debug configuration but might fail under the Release configuration, or can you be certain that code that is tested and working fine under the Debug configuration will also work fine under the Release configuration?

Solution

The C# compiler itself doesn't alter the emitted IL a great deal in the Release build. Notable is that it no longer emits the NOP opcodes that allow you to set a breakpoint on a curly brace. The big one is the optimizer that's built into the JIT compiler. I know it makes the following optimizations:

  • Method inlining. A method call is replaced by injecting the code of the method. This is a big one, it makes property accessors essentially free.

  • CPU register allocation. Local variables and method arguments can stay stored in a CPU register without ever (or less frequently) being stored back to the stack frame. This is a big one, notable for making debugging optimized code so difficult. And giving the volatile keyword a meaning.

  • Array index checking elimination. An important optimization when working with arrays (all .NET collection classes use an array internally). When the JIT compiler can verify that a loop never indexes an array out of bounds then it will eliminate the index check. Big one.

  • Loop unrolling. Loops with small bodies are improved by repeating the code up to 4 times in the body and looping less. Reduces the branch cost and improves the processor's super-scalar execution options.

  • Dead code elimination. A statement like if (false) { /*...*/ } gets completely eliminated. This can occur due to constant folding and inlining. Other cases are where the JIT compiler can determine that the code has no possible side effect. This optimization is what makes profiling code so tricky.

  • Code hoisting. Code inside a loop that is not affected by the loop can be moved out of the loop. The optimizer of a C compiler will spend a lot more time on finding opportunities to hoist. It is however an expensive optimization due to the required data flow analysis and the jitter can't afford the time so only hoists obvious cases. Forcing .NET programmers to write better source code and hoist themselves.

  • Common sub-expression elimination. x = y + 4; z = y + 4; becomes z = x; Pretty common in statements like dest[ix+1] = src[ix+1]; written for readability without introducing a helper variable. No need to compromise readability.

  • Constant folding. x = 1 + 2; becomes x = 3; This simple example is caught early by the compiler, but happens at JIT time when other optimizations make this possible.

  • Copy propagation. x = a; y = x; becomes y = a; This helps the register allocator make better decisions. It is a big deal in the x86 jitter because it has few registers to work with. Having it select the right ones is critical to perf.
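To make the bounds check elimination and hoisting points concrete, here is a small C# sketch. The method and variable names are mine, and whether a given jitter version actually elides the check or hoists the call is implementation-dependent, so treat this as illustrative rather than guaranteed:

```csharp
using System;

class JitFriendlyLoops
{
    // Writing the loop condition directly against arr.Length is the
    // pattern the jitter recognizes for dropping the per-access check.
    static int SumEliminated(int[] arr)
    {
        int sum = 0;
        for (int i = 0; i < arr.Length; i++)   // bounds check can be elided
            sum += arr[i];
        return sum;
    }

    // Looping against a separate count defeats the pattern: the jitter
    // cannot prove n <= arr.Length, so the check may stay in.
    static int SumChecked(int[] arr, int n)
    {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += arr[i];
        return sum;
    }

    // Manual hoisting: move the loop-invariant call out yourself, since
    // the jitter only hoists the obvious cases.
    static double[] Scale(double[] data, double angle)
    {
        double factor = Math.Cos(angle);       // hoisted by hand
        var result = new double[data.Length];
        for (int i = 0; i < data.Length; i++)
            result[i] = data[i] * factor;
        return result;
    }

    static void Main()
    {
        int[] a = { 1, 2, 3, 4 };
        Console.WriteLine(SumEliminated(a));            // 10
        Console.WriteLine(SumChecked(a, a.Length));     // 10
        Console.WriteLine(Scale(new[] { 1.0, 2.0 }, 0.0)[1]);
    }
}
```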
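The common sub-expression elimination and copy propagation bullets can be shown as a source-level before/after sketch. This is my illustration of the transformation; the jitter of course performs it on the IL, not on C# source:

```csharp
using System;

class CseSketch
{
    // Before: the same sub-expression computed twice.
    static int Before(int y)
    {
        int x = y + 4;
        int z = y + 4;      // common sub-expression
        return x * z;
    }

    // After: what the jitter effectively produces. CSE turns the second
    // computation into z = x; copy propagation then lets later code use
    // x directly, freeing a register.
    static int After(int y)
    {
        int x = y + 4;
        int z = x;
        return x * z;
    }

    static void Main()
    {
        Console.WriteLine(Before(3));  // 49
        Console.WriteLine(After(3));   // 49
    }
}
```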

These are very important optimizations that can make a great deal of difference when, for example, you profile the Debug build of your app and compare it to the Release build. That only really matters though when the code is on your critical path, the 5 to 10% of the code you write that actually affects the perf of your program. The JIT optimizer isn't smart enough to know up front what is critical, it can only apply the "turn it to eleven" dial for all the code.
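The dead code elimination point is exactly why naive micro-benchmarks of a Release build mislead. A hedged sketch (constants and names are mine; the reported timings depend entirely on your machine and jitter version, which is the point):

```csharp
using System;
using System.Diagnostics;

class BenchmarkPitfall
{
    // The result is consumed, so the loop body survives optimization.
    static long WorkSum(int n)
    {
        long sink = 0;
        for (int i = 0; i < n; i++)
            sink += i * 2 + 1;
        return sink;
    }

    static void Main()
    {
        const int N = 10_000_000;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < N; i++)
        {
            int unused = i * 2 + 1;  // never read: in Release the optimizer
                                     // may delete the whole loop, so the
                                     // stopwatch measures nothing at all
        }
        sw.Stop();
        Console.WriteLine($"unused result:   {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        long sum = WorkSum(N);
        sw.Stop();
        Console.WriteLine($"consumed result: {sw.ElapsedMilliseconds} ms (sum={sum})");
    }
}
```

The sum of the first n odd numbers is n², so `WorkSum(10)` returns 100, which makes the "consumed" variant easy to sanity-check.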

The effective result of these optimizations on your program's execution time is often affected by code that runs elsewhere. Reading a file, executing a dbase query, etc. Making the work the JIT optimizer does completely invisible. It doesn't mind though :)

The JIT optimizer is pretty reliable code, mostly because it has been put to the test millions of times. It is extremely rare to have problems in the Release build version of your program. It does happen however. Both the x64 and the x86 jitters have had problems with structs. The x86 jitter has trouble with floating point consistency, producing subtly different results when the intermediates of a floating point calculation are kept in a FPU register at 80-bit precision instead of getting truncated when flushed to memory.
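The 80-bit intermediate problem is hard to reproduce deterministically in a small snippet, but the defensive lesson it teaches applies generally: never compare computed floating point values with ==. A minimal sketch (the tolerance value is my own arbitrary choice):

```csharp
using System;

class FloatEquality
{
    static void Main()
    {
        double a = 0.1 + 0.2;
        Console.WriteLine(a == 0.3);                  // False: binary rounding
        // Since the precision of intermediates can differ between jitters
        // (80-bit FPU registers on x86 vs 64-bit truncation), exact
        // equality of computed values is fragile; compare with a tolerance.
        Console.WriteLine(Math.Abs(a - 0.3) < 1e-9);  // True
    }
}
```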
