Performance differences between debug and release builds

Question

I must admit that I usually haven't bothered switching between the Debug and Release configurations in my programs, and I have usually opted for the Debug configuration, even when the programs are actually deployed at the customer's site.

As far as I know, the only difference between these configurations, if you don't change them manually, is that Debug has the DEBUG constant defined and Release has the Optimize code option checked.

So my question is actually twofold:

  1. Is there much of a performance difference between these two configurations? Is there any specific type of code that will cause big differences in performance here, or is it actually not that important?

  2. Is there any type of code that will run fine under the Debug configuration but might fail under the Release configuration, or can you be certain that code that is tested and working fine under the Debug configuration will also work fine under the Release configuration?

Answer

The C# compiler itself doesn't alter the emitted IL a great deal in the Release build. Notably, it no longer emits the NOP opcodes that allow you to set a breakpoint on a curly brace. The big one is the optimizer that's built into the JIT compiler. I know it makes the following optimizations:

Method inlining. A method call is replaced by the code of the called method. This is a big one; it makes property accessors essentially free.
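
As a hedged illustration (in C, since the answer itself shows no code), here is a minimal sketch of the source-level effect of inlining. `struct point` and `point_get_x` are hypothetical stand-ins for a C# type and its property getter; the JIT performs the equivalent substitution on machine code, not on source.

```c
/* Hypothetical type: the getter plays the role of a C# property accessor. */
struct point {
    int x;
    int y;
};

static int point_get_x(const struct point *p)
{
    return p->x;
}

/* As written: one call to the getter per element. */
long sum_x_calls(const struct point *pts, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += point_get_x(&pts[i]);
    return total;
}

/* After inlining: the call is replaced by the getter's body, so the
   "accessor" costs the same as a plain field read. */
long sum_x_inlined(const struct point *pts, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += pts[i].x;
    return total;
}
```

Both functions return the same sum; the second is simply what the first looks like once the call has been replaced by the getter's body.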

CPU register allocation. Local variables and method arguments can stay in a CPU register without ever (or less frequently) being stored back to the stack frame. This is a big one, and it is notable for making debugging optimized code so difficult. It is also what gives the volatile keyword its meaning.
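
A minimal C sketch, using a hypothetical `spin_until_set` helper, of why register allocation gives `volatile` a meaning: an optimizer that loads `*flag` once and keeps it in a register would never observe another thread's write. Note that C's `volatile` only forbids that caching; unlike C#'s `volatile` it carries no memory-ordering guarantees, so real C code should use C11 `<stdatomic.h>` for cross-thread signalling.

```c
/* Hypothetical helper. Without the volatile qualifier an optimizing compiler
   may read *flag once, keep the value in a register and spin forever.
   volatile forces a fresh read from memory on every iteration.
   Returns the number of iterations spent waiting. */
unsigned long spin_until_set(const volatile int *flag)
{
    unsigned long spins = 0;
    while (*flag == 0)
        spins++;
    return spins;
}
```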

Array index checking elimination. This is an important optimization when working with arrays (most .NET collection classes use an array internally). When the JIT compiler can verify that a loop never indexes an array out of bounds, it eliminates the index check. This is a big one.
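
A hedged C sketch of the idea behind index-check elimination. C does not check array indexes, so the hypothetical `checked_get` helper mimics the range check the CLR performs on every array access; the second function shows the loop once that check is dropped because the condition `i < len` already proves every index is valid.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical checked accessor mimicking a managed array's range check. */
static int checked_get(const int *arr, size_t len, size_t i)
{
    if (i >= len) {
        fprintf(stderr, "index %zu out of range\n", i);
        abort();
    }
    return arr[i];
}

/* Unoptimized shape: the range check runs on every iteration. */
long sum_checked(const int *arr, size_t len)
{
    long total = 0;
    for (size_t i = 0; i < len; i++)
        total += checked_get(arr, len, i);
    return total;
}

/* After elimination: i < len already guarantees each index is in range,
   so the per-element check is redundant and is dropped. */
long sum_unchecked(const int *arr, size_t len)
{
    long total = 0;
    for (size_t i = 0; i < len; i++)
        total += arr[i];
    return total;
}
```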

Loop unrolling. Loops with small bodies are improved by repeating the code up to 4 times in the body and looping fewer times. This reduces the branch cost and improves the processor's superscalar execution options.
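
Loop unrolling by a factor of four, sketched by hand in C as an illustration of the transformation (the JIT does this on machine code; the function names are hypothetical). The separate accumulators are an extra assumption of this sketch that makes the independent additions easier for a superscalar CPU to overlap.

```c
#include <stddef.h>

/* Rolled loop: one add and one compare-and-branch per element. */
long sum_rolled(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Hypothetical hand-unrolled version: four adds per compare-and-branch,
   plus a cleanup loop for the 0 to 3 leftover elements. */
long sum_unrolled4(const int *a, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)   /* remainder */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```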

Dead code elimination. A statement like if (false) { /* ... */ } gets completely eliminated. This can occur due to constant folding and inlining. Other cases are where the JIT compiler can determine that the code has no possible side effects. This optimization is what makes profiling code so tricky.
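
A minimal C sketch of both sides of dead code elimination, with hypothetical names: a constant-false branch that an optimizer removes, and the profiling trap where a loop whose result is never used can be deleted entirely. The `volatile` sink is one common way to keep benchmarked work observable in C; it is an assumption of this sketch, not a .NET API.

```c
/* Hypothetical compile-time switch. Because it is a constant, the test
   below is effectively if (0) and an optimizer removes the whole block. */
enum { TRACE_ENABLED = 0 };

static long trace_calls = 0;

long square(long x)
{
    if (TRACE_ENABLED)      /* constant false: dead code */
        trace_calls++;
    return x * x;
}

/* Profiling trap: if this loop's result were never used, the loop would
   have no observable side effect and could be deleted outright, so the
   "benchmark" would measure nothing. Writing the result to a volatile
   sink (hypothetical name) keeps the work observable. */
volatile long sink;

void bench_squares(long n)
{
    long acc = 0;
    for (long i = 0; i < n; i++)
        acc += square(i);
    sink = acc;
}
```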

Code hoisting. Code inside a loop that is not affected by the loop can be moved out of the loop. The optimizer of a C compiler will spend a lot more time finding opportunities to hoist. It is, however, an expensive optimization due to the required data flow analysis, and the jitter can't afford the time, so it only hoists obvious cases. This forces .NET programmers to write better source code and hoist code themselves.
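
Hoisting by hand, sketched in C: `strlen(s)` in the loop condition is loop-invariant, so computing it once before the loop gives the same result without relying on the optimizer to spot it. The function names are hypothetical; in C# the analogous case would be an expensive call sitting in a loop condition.

```c
#include <string.h>

/* As written: strlen(s) is re-evaluated on every iteration, turning a
   linear scan into a quadratic one unless the optimizer can prove that s
   is unchanged and hoists the call itself. */
size_t count_char_naive(const char *s, char c)
{
    size_t count = 0;
    for (size_t i = 0; i < strlen(s); i++)
        if (s[i] == c)
            count++;
    return count;
}

/* Hoisted by hand: the loop-invariant length is computed once, so the cost
   no longer depends on how clever the optimizer is. */
size_t count_char_hoisted(const char *s, char c)
{
    size_t count = 0;
    const size_t len = strlen(s);
    for (size_t i = 0; i < len; i++)
        if (s[i] == c)
            count++;
    return count;
}
```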

Common sub-expression elimination. x = y + 4; z = y + 4; becomes x = y + 4; z = x; This is pretty common in statements like dest[ix+1] = src[ix+1];, written for readability without introducing a helper variable. There is no need to compromise readability.
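
A short C sketch of common sub-expression elimination using the answer's own two examples; the function names are hypothetical. The `_cse` and `_after` versions show what the optimizer effectively produces from the readable versions, so the source can stay readable.

```c
#include <stddef.h>

/* x = y + 4; z = y + 4;  as written */
int cse_before(int y)
{
    int x = y + 4;
    int z = y + 4;
    return x + z;
}

/* ...becomes x = y + 4; z = x; */
int cse_after(int y)
{
    int x = y + 4;
    int z = x;
    return x + z;
}

/* Written for readability: ix + 1 appears twice. */
void copy_next_readable(int *dest, const int *src, size_t ix)
{
    dest[ix + 1] = src[ix + 1];
}

/* What the optimizer effectively produces: the shared index is computed
   once and reused. */
void copy_next_cse(int *dest, const int *src, size_t ix)
{
    const size_t j = ix + 1;
    dest[j] = src[j];
}
```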

Constant folding. x = 1 + 2; becomes x = 3; This simple example is caught early by the C# compiler, but folding also happens at JIT time when other optimizations, such as inlining, make it possible.
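
A hedged C sketch of constant folding at the two stages described. `area` and `tile_area` are hypothetical: `tile_area` only becomes foldable once `area` has been inlined into it, while `1 + 2` is already folded by the source compiler.

```c
/* Hypothetical helper: on its own, area() must multiply at run time. */
static int area(int width, int height)
{
    return width * height;
}

/* Once area() is inlined here, the body becomes 8 * 4, which an optimizer
   folds to the constant 32: no call and no multiply remain. */
int tile_area(void)
{
    return area(8, 4);
}

/* The simple case: the compiler folds 1 + 2 to 3 before any
   optimization pass runs. */
int three(void)
{
    int x = 1 + 2;
    return x;
}
```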

Copy propagation. x = a; y = x; becomes y = a; This helps the register allocator make better decisions. It is a big deal in the x86 jitter because it has few registers to work with, so having it select the right ones is critical to performance.
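
Copy propagation sketched in C with hypothetical function names: in the first version `x` and `y` are mere copies of `a`; after propagation every use reads `a` directly, which leaves a single value for the register allocator to keep live.

```c
/* As written: x and y are only copies of a. */
int scaled_before(int a)
{
    int x = a;
    int y = x;          /* copy of a copy */
    return y * 3 + x;
}

/* After copy propagation: every use reads a directly, so only one value
   has to occupy a register instead of three. */
int scaled_after(int a)
{
    return a * 3 + a;
}
```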

These are very important optimizations that can make a great deal of difference when, for example, you profile the Debug build of your app and compare it to the Release build. That only really matters, though, when the code is on your critical path: the 5 to 10% of the code you write that actually affects the performance of your program. The JIT optimizer isn't smart enough to know up front what is critical; it can only apply the "turn it to eleven" dial to all the code.

The effective result of these optimizations on your program's execution time is often affected by code that runs elsewhere: reading a file, executing a database query, and so on. Such work can make what the JIT optimizer does completely invisible. It doesn't mind, though :)

The JIT optimizer is pretty reliable code, mostly because it has been put to the test millions of times. It is extremely rare to have problems in the Release build of your program, but it does happen. Both the x64 and the x86 jitters have had problems with structs. The x86 jitter has trouble with floating-point consistency: it produces subtly different results when the intermediates of a floating-point calculation are kept in an FPU register at 80-bit precision instead of being truncated when flushed to memory.
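
A minimal C sketch of the x87 consistency problem, assuming a platform where `long double` is wider than `double` (true for GCC on x86 Linux, where it is the 80-bit extended format; under MSVC `long double` is the same as `double`). The difference is easiest to see with an intermediate that overflows `double` but fits in the extended range; in practice the discrepancies are usually only in the last bits. The function names are hypothetical.

```c
#include <float.h>
#include <math.h>

/* Intermediate rounded to a 64-bit double, as when it is flushed to memory
   (or when the arithmetic is done with SSE2). For a = 1e308 the product
   overflows to +inf, and inf / 10 stays inf. */
double scale_round_trip_double(double a)
{
    double t = a * 10.0;
    return t / 10.0;
}

/* Intermediate kept in a wider type, analogous to a value held in an x87
   FPU register: 1e309 fits in the extended range, so dividing by 10 brings
   it back to the original value. */
double scale_round_trip_extended(double a)
{
    long double t = (long double)a * 10.0L;
    return (double)(t / 10.0L);
}

/* Nonzero when long double really has a wider exponent range than double. */
int long_double_is_wider(void)
{
    return LDBL_MAX_EXP > DBL_MAX_EXP;
}
```

On a platform where `long_double_is_wider()` returns 0, both functions overflow and agree, which is exactly why such bugs only show up with particular code generators.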
