C# Compiler Optimizations


Question

I'm wondering if someone can explain what exactly the compiler might be doing to cause such extreme differences in performance for a simple method.

public static uint CalculateCheckSum(string str) {
    char[] charArray = str.ToCharArray();
    uint checkSum = 0;
    foreach (char c in charArray) {
        checkSum += c;
    }
    return checkSum % 256;
}



I'm working with a colleague on some benchmarking and optimization for a message processing application. Running 10 million iterations of this function with the same input string took about 25 seconds in Visual Studio 2012. When the project was built with the "Optimize Code" option turned on, the same code ran the same 10 million iterations in 7 seconds.

I'm very interested to understand what the compiler is doing behind the scenes that gives a greater than 3x performance increase for a seemingly innocent block of code such as this.

As requested, here is a complete console application that demonstrates what I am seeing.

using System;
using System.Diagnostics;

class Program
{
    public static uint CalculateCheckSum(string str)
    {
        char[] charArray = str.ToCharArray();
        uint checkSum = 0;
        foreach (char c in charArray)
        {
            checkSum += c;
        }
        return checkSum % 256;
    }

    static void Main(string[] args)
    {
        string stringToCount = "8=FIX.4.29=15135=D49=SFS56=TOMW34=11752=20101201-03:03:03.2321=DEMO=DG00121=155=IBM54=138=10040=160=20101201-03:03:03.23244=10.059=0100=ARCA10=246";
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < 10000000; i++)
        {
            CalculateCheckSum(stringToCount);
        }
        stopwatch.Stop();
        Console.WriteLine(stopwatch.Elapsed);
    }
}

Running in Debug with optimization off I see 13 seconds; with it on I get 2 seconds.

Running in Release with optimization off takes 3.1 seconds; with it on, 2.3 seconds.

Recommended answer

To look at what the C# compiler does for you, you need to look at the IL. If you want to see how that affects the JITted code, you'll need to look at the native code as described by Scott Chamberlain. Be aware that the JITted code will vary based on processor architecture, CLR version, how the process was launched, and possibly other things.

I would usually start with the IL, and then potentially look at the JITted code.
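Before comparing IL, it can help to confirm which build and launch conditions you are actually measuring. The following is a minimal sketch, not part of the original answer: the class name BuildInfo and its methods are hypothetical, and it assumes .NET 4.5 or later for the GetCustomAttribute extension method. It reads the DebuggableAttribute that the compiler emits on the assembly and reports whether a debugger is attached, both of which can influence what the JIT produces.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

// Hypothetical helper for illustration; not from the original answer.
public static class BuildInfo
{
    // True when the assembly carries a DebuggableAttribute that asks the
    // JIT to disable optimizations (typical of a non-optimized build).
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        var attr = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attr != null && attr.IsJITOptimizerDisabled;
    }

    // Summarizes the two conditions worth noting next to any timing result.
    public static string Describe(Assembly assembly)
    {
        return string.Format(
            "JIT optimizer disabled: {0}, debugger attached: {1}",
            IsJitOptimizerDisabled(assembly),
            Debugger.IsAttached);
    }
}
```

Printing BuildInfo.Describe(typeof(Program).Assembly) at the start of Main would record these conditions alongside the stopwatch output, which makes the Debug and Release numbers easier to interpret.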

Comparing the IL using ildasm can be slightly tricky, as it includes a label for each instruction. Here are two versions of your method compiled with and without optimization (using the C# 5 compiler), with extraneous labels (and nop instructions) removed to make them as easy to compare as possible:

Optimized

  .method public hidebysig static uint32 
          CalculateCheckSum(string str) cil managed
  {
    // Code size       46 (0x2e)
    .maxstack  2
    .locals init (char[] V_0,
             uint32 V_1,
             char V_2,
             char[] V_3,
             int32 V_4)
    ldarg.0
    callvirt   instance char[] [mscorlib]System.String::ToCharArray()
    stloc.0
    ldc.i4.0
    stloc.1
    ldloc.0
    stloc.3
    ldc.i4.0
    stloc.s    V_4
    br.s       loopcheck
  loopstart:
    ldloc.3
    ldloc.s    V_4
    ldelem.u2
    stloc.2
    ldloc.1
    ldloc.2
    add
    stloc.1
    ldloc.s    V_4
    ldc.i4.1
    add
    stloc.s    V_4
  loopcheck:
    ldloc.s    V_4
    ldloc.3
    ldlen
    conv.i4
    blt.s      loopstart
    ldloc.1
    ldc.i4     0x100
    rem.un
    ret
  } // end of method Program::CalculateCheckSum

Unoptimized

  .method public hidebysig static uint32 
          CalculateCheckSum(string str) cil managed
  {
    // Code size       63 (0x3f)
    .maxstack  2
    .locals init (char[] V_0,
             uint32 V_1,
             char V_2,
             uint32 V_3,
             char[] V_4,
             int32 V_5,
             bool V_6)
    ldarg.0
    callvirt   instance char[] [mscorlib]System.String::ToCharArray()
    stloc.0
    ldc.i4.0
    stloc.1
    ldloc.0
    stloc.s    V_4
    ldc.i4.0
    stloc.s    V_5
    br.s       loopcheck

  loopstart:
    ldloc.s    V_4
    ldloc.s    V_5
    ldelem.u2
    stloc.2
    ldloc.1
    ldloc.2
    add
    stloc.1
    ldloc.s    V_5
    ldc.i4.1
    add
    stloc.s    V_5
  loopcheck:
    ldloc.s    V_5
    ldloc.s    V_4
    ldlen
    conv.i4
    clt
    stloc.s    V_6
    ldloc.s    V_6
    brtrue.s   loopstart

    ldloc.1
    ldc.i4     0x100
    rem.un
    stloc.3
    br.s       methodend

  methodend:
    ldloc.3
    ret
  }
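
The label and nop clean-up applied to the listings above was done by hand, but part of it can be automated. The sketch below is an assumption-laden illustration rather than the answerer's tooling: IlCleaner is a hypothetical name, and it assumes ildasm-style lines that begin with a label of the form IL_xxxx: (four hex digits). It strips those leading labels and drops nop lines; branch operands still refer to the original labels, so renaming them to something like loopstart remains a manual step.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; assumes ildasm-style "IL_xxxx:" labels.
public static class IlCleaner
{
    // Matches a label such as "IL_000a:" at the start of a line,
    // together with the whitespace around it.
    private static readonly Regex LeadingLabel =
        new Regex(@"^\s*IL_[0-9a-fA-F]{4}:\s*");

    // Returns the lines with leading labels removed and nop instructions dropped.
    public static List<string> Clean(IEnumerable<string> lines)
    {
        var result = new List<string>();
        foreach (string line in lines)
        {
            string stripped = LeadingLabel.Replace(line, "").Trim();
            if (stripped == "nop")
            {
                continue; // nop only exists to support debugging
            }
            result.Add(stripped);
        }
        return result;
    }
}
```

Running both disassemblies through Clean and then through an ordinary text diff tool gets most of the way to the side-by-side form shown above.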

Points to note:

  • The optimized version uses fewer locals. This may allow the JIT to use registers more effectively.
  • The optimized version uses blt.s rather than clt followed by brtrue.s when checking whether or not to go round the loop again (this is the reason for one of the extra locals).
  • The unoptimized version uses an additional local to store the return value before returning, presumably to make debugging easier.
  • The unoptimized version has an unconditional branch just before it returns.
  • The optimized version is shorter, but I doubt that it's short enough to be inlined, so I suspect that's irrelevant.
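
To make the differences listed above more concrete, here is a rough C# rendering of the two IL shapes. This is only a sketch: the class and method names are hypothetical, the compiler does not literally rewrite your source this way, and how the JIT treats either form depends on the factors mentioned earlier. UnoptimizedShape spells out the extra locals (the stored comparison result and the stored return value) that the unoptimized IL uses, while OptimizedShape compares and branches in one step and returns the value directly.

```csharp
using System;

// Hypothetical illustration of the two IL shapes; not compiler output.
public static class ChecksumShapes
{
    // Roughly the optimized IL: the loop test is a single
    // compare-and-branch (blt.s) and the result is returned directly.
    public static uint OptimizedShape(string str)
    {
        char[] chars = str.ToCharArray();
        uint checkSum = 0;
        for (int i = 0; i < chars.Length; i++)
        {
            checkSum += chars[i];
        }
        return checkSum % 256;
    }

    // Roughly the unoptimized IL: the comparison result and the return
    // value are each stored in an extra local before being used.
    public static uint UnoptimizedShape(string str)
    {
        char[] chars = str.ToCharArray();
        uint checkSum = 0;
        char c;
        bool keepGoing;   // corresponds to the extra bool local (V_6)
        uint result;      // corresponds to the extra uint local (V_3)
        int i = 0;
        goto loopCheck;
    loopStart:
        c = chars[i];
        checkSum += c;
        i = i + 1;
    loopCheck:
        keepGoing = i < chars.Length;   // clt, then store to a local
        if (keepGoing) goto loopStart;  // reload the local, then brtrue.s
        result = checkSum % 256;        // store the return value first
        return result;                  // then load it again and return
    }
}
```

Both methods compute the same checksum; the only difference is how much intermediate state is named, which is what gives a debugger more to inspect and the JIT more to track.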
