Why is a lambda faster than an IL-injected dynamic method?

Question

I just built a dynamic method (see below; thanks to fellow SO users). It appears that the Func created as a dynamic method with IL injection is 2x slower than the lambda.

Does anyone know exactly why?

(EDIT: this was built as Release x64 in VS2010. Please run it from the console, not from inside Visual Studio with F5.)

using System;
using System.Diagnostics;
using System.Reflection.Emit;

class Program
{
    static void Main(string[] args)
    {
        var mul1 = IL_EmbedConst(5);
        var res = mul1(4);

        Console.WriteLine(res);

        var mul2 = EmbedConstFunc(5);
        res = mul2(4);

        Console.WriteLine(res);

        double d, acc = 0;

        Stopwatch sw = new Stopwatch();

        for (int k = 0; k < 10; k++)
        {
            long time1;

            sw.Restart();

            for (int i = 0; i < 10000000; i++)
            {
                d = mul2(i);
                acc += d;
            }

            sw.Stop();

            time1 = sw.ElapsedMilliseconds;

            sw.Restart();

            for (int i = 0; i < 10000000; i++)
            {
                d = mul1(i);
                acc += d;
            }

            sw.Stop();

            Console.WriteLine("{0,6} {1,6}", time1, sw.ElapsedMilliseconds);
        }

        Console.WriteLine("\n{0}...\n", acc);
        Console.ReadLine();
    }

    static Func<int, int> IL_EmbedConst(int b)
    {
        var method = new DynamicMethod("EmbedConst", typeof(int), new[] { typeof(int) } );

        var il = method.GetILGenerator();

        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4, b);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Ret);

        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
    }

    static Func<int, int> EmbedConstFunc(int b)
    {
        return a => a * b;
    }
}

Here is the output (on an i7 920):

20
20

25     51
25     51
24     51
24     51
24     51
25     51
25     51
25     51
24     51
24     51

4.9999995E+15...

============================================================================

EDIT EDIT EDIT EDIT

Here is proof that dhtorpe was right: a more complex lambda loses its advantage. The code below demonstrates this (the lambda performs exactly the same as the IL-injected method):

using System;
using System.Diagnostics;
using System.Reflection.Emit;

class Program
{
    static void Main(string[] args)
    {
        var mul1 = IL_EmbedConst(5);
        double res = mul1(4,6);

        Console.WriteLine(res);

        var mul2 = EmbedConstFunc(5);
        res = mul2(4,6);

        Console.WriteLine(res);

        double d, acc = 0;

        Stopwatch sw = new Stopwatch();

        for (int k = 0; k < 10; k++)
        {
            long time1;

            sw.Restart();

            for (int i = 0; i < 10000000; i++)
            {
                d = mul2(i, i+1);
                acc += d;
            }

            sw.Stop();

            time1 = sw.ElapsedMilliseconds;

            sw.Restart();

            for (int i = 0; i < 10000000; i++)
            {
                d = mul1(i, i + 1);
                acc += d;
            }

            sw.Stop();

            Console.WriteLine("{0,6} {1,6}", time1, sw.ElapsedMilliseconds);
        }

        Console.WriteLine("\n{0}...\n", acc);
        Console.ReadLine();
    }

    static Func<int, int, double> IL_EmbedConst(int b)
    {
        var method = new DynamicMethod("EmbedConstIL", typeof(double), new[] { typeof(int), typeof(int) });

        var log = typeof(Math).GetMethod("Log", new Type[] { typeof(double) });

        var il = method.GetILGenerator();

        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I4, b);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Conv_R8);

        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Ldc_I4, b);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Conv_R8);

        il.Emit(OpCodes.Call, log);

        il.Emit(OpCodes.Sub);

        il.Emit(OpCodes.Ret);

        return (Func<int, int, double>)method.CreateDelegate(typeof(Func<int, int, double>));
    }

    static Func<int, int, double> EmbedConstFunc(int b)
    {
        return (a, z) => a * b - Math.Log(z * b);
    }
} 

Solution

Given that the performance difference exists only when running in release mode without a debugger attached, the only explanation I can think of is that the JIT compiler is able to make native code optimizations for the lambda expression that it is not able to perform for the emitted IL dynamic function.

When compiled in release mode (optimizations on) and run without the debugger attached, the lambda is consistently 2x faster than the generated IL dynamic method.

Running the same optimized release build with a debugger attached to the process drops the lambda's performance to roughly that of the generated IL dynamic method, or worse.

The only difference between these two runs is the behavior of the JIT. When a process is being debugged, the JIT compiler suppresses a number of native code generation optimizations. It does this to preserve the mappings from native instructions to IL instructions to source line numbers. It also preserves other correlations that aggressive native optimizations would destroy.
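
A benchmark like the one in the question can guard against this pitfall. Before timing anything, it can check whether a debugger is attached or whether the entry assembly was compiled without optimizations.

The following is a minimal sketch. `BenchmarkGuard` and `Warnings` are hypothetical names. The check only inspects the entry assembly's `DebuggableAttribute`; it cannot confirm that the JIT actually optimized any particular method.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Reflection;

// Hypothetical helper, not part of the original question or answer.
public static class BenchmarkGuard
{
    // Collects reasons why timings taken in this process may not reflect
    // optimized JIT output. An empty array means nothing was detected.
    public static string[] Warnings()
    {
        var warnings = new List<string>();

        // A debugger attached to the process can make the JIT skip optimizations.
        if (Debugger.IsAttached)
            warnings.Add("A debugger is attached; JIT optimizations may be suppressed.");

        // Non-optimized (typically Debug) builds carry a DebuggableAttribute
        // whose IsJITOptimizerDisabled property is true.
        Assembly entry = Assembly.GetEntryAssembly();
        if (entry != null)
        {
            var debuggable = (DebuggableAttribute)Attribute.GetCustomAttribute(
                entry, typeof(DebuggableAttribute));
            if (debuggable != null && debuggable.IsJITOptimizerDisabled)
                warnings.Add("The entry assembly was compiled with JIT optimizations disabled.");
        }

        return warnings.ToArray();
    }
}
```

For example, `Main` could print each string returned by `BenchmarkGuard.Warnings()` before entering the timing loops.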

A compiler can only apply special case optimizations when the input expression graph (in this case, IL code) matches certain very specific patterns and conditions. The JIT compiler clearly has special knowledge of the lambda expression IL code pattern and is emitting different code for lambdas than for "normal" IL code.

It is quite possible that your IL instructions do not exactly match the pattern that causes the JIT compiler to optimize the lambda expression. For example, your IL encodes the value of b as an inline constant. The analogous lambda expression instead loads a field from the compiler-generated object that holds the captured variable. Even if your generated IL were to mimic that captured-field pattern, it still might not be "close enough" to receive the same JIT treatment as the lambda expression.
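
One way to test this hypothesis is to emit IL that follows the closure pattern instead of embedding `b` as a constant. Put `b` in a field of an object, then bind the dynamic method to that object as a closed delegate.

The sketch below assumes a hand-written `MulClosure` class standing in for the compiler-generated closure type. The real closure type has a compiler-chosen name and cannot be referenced directly. `ClosureEmit` and `IL_EmbedField` are likewise hypothetical names.

As the answer cautions, matching the pattern may still not earn the same JIT treatment. Treat this as something to benchmark, not a guaranteed fix.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical stand-in for the compiler-generated closure of `a => a * b`.
public sealed class MulClosure
{
    public int B;
}

public static class ClosureEmit
{
    // Emits int EmbedField(MulClosure closure, int a) { return a * closure.B; }
    // and binds `closure` as the delegate target.
    public static Func<int, int> IL_EmbedField(int b)
    {
        var method = new DynamicMethod(
            "EmbedField",
            typeof(int),
            new[] { typeof(MulClosure), typeof(int) },
            typeof(MulClosure).Module);

        FieldInfo field = typeof(MulClosure).GetField("B");
        ILGenerator il = method.GetILGenerator();

        il.Emit(OpCodes.Ldarg_1);       // push a
        il.Emit(OpCodes.Ldarg_0);       // push the closure instance
        il.Emit(OpCodes.Ldfld, field);  // replace it with closure.B
        il.Emit(OpCodes.Mul);           // a * closure.B
        il.Emit(OpCodes.Ret);

        // Supplying a target closes the delegate over the first parameter,
        // similar in shape to the delegate created for a capturing lambda.
        var closure = new MulClosure { B = b };
        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>), closure);
    }
}
```

Because it returns the same `Func<int, int>` type, it could replace `IL_EmbedConst(5)` in the question's benchmark.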

As mentioned in the comments, this may well be due to inlining of the lambda to eliminate the call/return overhead. If this is the case, I would expect to see this difference in performance disappear in more complex lambda expressions, since inlining is usually reserved for only the simplest of expressions.
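
To get a rough feel for how much call/return overhead alone costs in a tight loop, one could time the same tiny body with and without inlining allowed. This is a sketch under the following assumptions:

- `CallOverhead`, `Measure`, `MulInline` and `MulNoInline` are hypothetical names.
- `MethodImplOptions.NoInlining` is used only to force a real call.
- Whether the JIT actually inlines `MulInline` depends on the runtime and build settings.

It compares direct calls, not delegate invocations, so it does not by itself show what the JIT does with the lambda.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

// Hypothetical micro-benchmark, not part of the original question or answer.
public static class CallOverhead
{
    // Tiny body: an optimizing JIT is free to inline this at the call site.
    static int MulInline(int a, int b)
    {
        return a * b;
    }

    // Identical body, but inlining is forbidden, so every call pays
    // for a real call and return.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static int MulNoInline(int a, int b)
    {
        return a * b;
    }

    // Times both loops and returns the sum of all results as a checksum,
    // so the loop results are used and cannot be discarded as dead code.
    public static long Measure(int iterations, out long inlineMs, out long noInlineMs)
    {
        long acc = 0;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            acc += MulInline(i, 5);
        inlineMs = sw.ElapsedMilliseconds;

        sw.Restart();
        for (int i = 0; i < iterations; i++)
            acc += MulNoInline(i, 5);
        noInlineMs = sw.ElapsedMilliseconds;

        return acc;
    }
}
```

Run it from an optimized build without a debugger attached, for the reasons given above, and compare the two timings. No particular ratio is implied here.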
