Generic vs non-generic performance in C#


Question

I've written two equivalent methods:

static bool F<T>(T a, T b) where T : class
{
    return a == b;
}

static bool F2(A a, A b)
{
    return a == b;
}



Time difference:
00:00:00.0380022
00:00:00.0170009

Code used for testing:

var a = new A();
var dt = DateTime.Now; // start time for the generic loop
for (int i = 0; i < 100000000; i++)
    F<A>(a, a);
Console.WriteLine(DateTime.Now - dt);

dt = DateTime.Now;
for (int i = 0; i < 100000000; i++)
    F2(a, a);
Console.WriteLine(DateTime.Now - dt);



Does anyone know why?

In the comments below, dtb showed the CIL:

IL for F2: ldarg.0, ldarg.1, ceq, ret. IL for F<T>: ldarg.0, box !!T, ldarg.1, box !!T, ceq, ret.



I think this answers my question, but what magic can I use to avoid the boxing?
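
A note on the `box !!T` instructions above: per the CLI specification, `box` does nothing when the type token is a reference type, so with the `where T : class` constraint no object is allocated. What the generic `==` does do is compare references only, ignoring any overloaded operator. The sketch below illustrates that; `BoxingDemo` and `GenericEquals` are hypothetical names, not part of the original code.

```csharp
using System;

static class BoxingDemo
{
    // Same shape as F<T> from the question: the C# compiler emits
    // box !!T / box !!T / ceq, i.e. a plain reference comparison.
    public static bool GenericEquals<T>(T a, T b) where T : class
    {
        return a == b;
    }

    public static void Main()
    {
        string s1 = "abc";
        string s2 = new string(s1.ToCharArray()); // equal contents, distinct object

        Console.WriteLine(s1 == s2);              // True: string's overloaded == compares contents
        Console.WriteLine(GenericEquals(s1, s2)); // False: different references
        Console.WriteLine(GenericEquals(s1, s1)); // True: same reference
    }
}
```

For a class such as A that does not overload ==, the generic and non-generic methods therefore have the same semantics.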

Next, I used Psilon's code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace ConsoleApplication58
{
    internal class Program
    {
        private class A
        {

        }

        private static bool F<T>(T a, T b) where T : class
        {
            return a == b;
        }

        private static bool F2(A a, A b)
        {
            return a == b;
        }

        private static void Main()
        {
            const int rounds = 100, n = 10000000;
            var a = new A();
            var fList = new List<TimeSpan>();
            var f2List = new List<TimeSpan>();
            for (int i = 0; i < rounds; i++)
            {
                // Test generic
                GCClear();
                bool res;
                var sw = new Stopwatch();
                sw.Start();
                for (int j = 0; j < n; j++)
                {
                    res = F(a, a);
                }
                sw.Stop();
                fList.Add(sw.Elapsed);

                // Test not-generic
                GCClear();
                bool res2;
                var sw2 = new Stopwatch();
                sw2.Start();
                for (int j = 0; j < n; j++)
                {
                    res2 = F2(a, a);
                }
                sw2.Stop();
                f2List.Add(sw2.Elapsed);
            }
            double f1AverageTicks = fList.Average(ts => ts.Ticks);
            Console.WriteLine("Elapsed for F = {0} \t ticks = {1}", fList.Average(ts => ts.TotalMilliseconds),
                              f1AverageTicks);
            double f2AverageTicks = f2List.Average(ts => ts.Ticks);
            Console.WriteLine("Elapsed for F2 = {0} \t ticks = {1}", f2List.Average(ts => ts.TotalMilliseconds),
                  f2AverageTicks);
            Console.WriteLine("Not-generic method is {0} times faster, or on {1}%", f1AverageTicks/f2AverageTicks,
                              (f1AverageTicks/f2AverageTicks - 1)*100);
            Console.ReadKey();
        }

        private static void GCClear()
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
        }
    }
}



Windows 7, .NET 4.5, Visual Studio 2012, release, optimized, without attaching.

x64

Elapsed for F = 23.68157         ticks = 236815.7
Elapsed for F2 = 1.701638        ticks = 17016.38
Not-generic method is 13.916925926666 times faster, or on 1291.6925926666%

x86

Elapsed for F = 6.713223         ticks = 67132.23
Elapsed for F2 = 6.729897        ticks = 67298.97
Not-generic method is 0.997522398931217 times faster, or on -0.247760106878314%

And I've got new magic: x64 is three times faster...

PS: My target platform is x64.

Recommended answer

I did make some changes to your code to measure perf correctly.


  1. Use Stopwatch.
  2. Run in Release mode.
  3. Prevent inlining.
  4. Use GetHashCode() to do some real work.
  5. Look at the generated assembly code.

Here is the code:

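using System;
using System.Diagnostics;               // Stopwatch
using System.Runtime.CompilerServices;  // MethodImpl, MethodImplOptions

// The methods below are members of the program's class (wrapper omitted).
class A
{
}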

[MethodImpl(MethodImplOptions.NoInlining)]
static bool F<T>(T a, T b) where T : class
{
    return a.GetHashCode() == b.GetHashCode();
}

[MethodImpl(MethodImplOptions.NoInlining)]
static bool F2(A a, A b)
{
    return a.GetHashCode() == b.GetHashCode();
}

static int Main(string[] args)
{
    const int Runs = 100 * 1000 * 1000;
    var a = new A();
    bool lret = F<A>(a, a);
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < Runs; i++)
    {
        F<A>(a, a);
    }
    sw.Stop();
    Console.WriteLine("Generic: {0:F2}s", sw.Elapsed.TotalSeconds);

    lret = F2(a, a);
    sw = Stopwatch.StartNew();
    for (int i = 0; i < Runs; i++)
    {
        F2(a, a);
    }
    sw.Stop();
    Console.WriteLine("Non Generic: {0:F2}s", sw.Elapsed.TotalSeconds);

    return lret ? 1 : 0;
}



In my tests the non-generic version was slightly faster (.NET 4.5, 32-bit, Windows 7), but there is practically no measurable difference in speed. I would say they are equal. For completeness, here is the generated assembly code. I got it from the debugger in Release mode with JIT optimizations enabled. By default, JIT optimizations are disabled while debugging to make setting breakpoints and inspecting variables easier.

Generic

static bool F<T>(T a, T b) where T : class
{
        return a.GetHashCode() == b.GetHashCode();
}

push        ebp 
mov         ebp,esp 
push        ebx 
sub         esp,8 // reserve stack for two locals 
mov         dword ptr [ebp-8],ecx // store first arg on stack
mov         dword ptr [ebp-0Ch],edx // store second arg on stack
mov         ecx,dword ptr [ebp-8] // get first arg from stack --> stupid!
mov         eax,dword ptr [ecx]   // load MT pointer from a instance
mov         eax,dword ptr [eax+28h] // Locate method table start
call        dword ptr [eax+8] //GetHashCode // call GetHashCode function pointer which is the second method starting from the method table
mov         ebx,eax           // store result in ebx
mov         ecx,dword ptr [ebp-0Ch] // get second arg
mov         eax,dword ptr [ecx]     // call method as usual ...
mov         eax,dword ptr [eax+28h] 
call        dword ptr [eax+8] //GetHashCode
cmp         ebx,eax 
sete        al 
movzx       eax,al 
lea         esp,[ebp-4] 
pop         ebx 
pop         ebp 
ret         4 

Non-generic

static bool F2(A a, A b)
{
  return a.GetHashCode() == b.GetHashCode();
}

push        ebp 
mov         ebp,esp 
push        esi 
push        ebx 
mov         esi,edx 
mov         eax,dword ptr [ecx] 
mov         eax,dword ptr [eax+28h] 
call        dword ptr [eax+8] //GetHashCode
mov         ebx,eax 
mov         ecx,esi 
mov         eax,dword ptr [ecx] 
mov         eax,dword ptr [eax+28h] 
call        dword ptr [eax+8] //GetHashCode
cmp         ebx,eax 
sete        al 
movzx       eax,al 
pop         ebx 
pop         esi 
pop         ebp 
ret 

As you can see, the generic version looks slightly less efficient because it does extra stack memory operations: it spills both arguments to the stack and reloads them, while the non-generic version keeps them in registers. In practice the difference is not measurable, because everything fits into the processor's L1 cache, which makes those memory operations almost as cheap as the pure register operations of the non-generic version. I would expect the non-generic version to perform a little better in the real world only if you had to pay for real memory accesses that are not served from any CPU cache.

For all practical purposes the two methods are identical. You should look elsewhere for real-world performance gains. I would first look at the data access patterns and the data structures used. Algorithmic changes tend to bring much bigger performance gains than such low-level details.
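
To illustrate the point about data structures, here is a minimal sketch comparing membership lookups in a List<int> (linear scan) with a HashSet<int> (hash lookup). The class name `LookupDemo`, the method `CountHits`, and the collection sizes are arbitrary choices for illustration and are not taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class LookupDemo
{
    // Counts how many needles are contained in the haystack.
    public static int CountHits(ICollection<int> haystack, int[] needles)
    {
        int hits = 0;
        foreach (int n in needles)
        {
            if (haystack.Contains(n)) hits++;
        }
        return hits;
    }

    public static void Main()
    {
        int[] data = Enumerable.Range(0, 10000).ToArray();
        int[] needles = Enumerable.Range(5000, 10000).ToArray(); // half present, half absent

        var list = new List<int>(data);   // Contains scans the elements one by one
        var set = new HashSet<int>(data); // Contains uses a hash lookup

        var sw = Stopwatch.StartNew();
        int listHits = CountHits(list, needles);
        sw.Stop();
        Console.WriteLine("List<int>:    {0} hits in {1} ms", listHits, sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        int setHits = CountHits(set, needles);
        sw.Stop();
        Console.WriteLine("HashSet<int>: {0} hits in {1} ms", setHits, sw.ElapsedMilliseconds);
    }
}
```

Both calls return the same count. Only the cost per lookup differs, and that difference grows with the size of the collection, unlike the constant few instructions discussed above.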

EDIT1: If you use == instead, you will find that both versions compile to:

00000000  push        ebp 
00000001  mov         ebp,esp 
00000003  cmp         ecx,edx // Check for reference equality 
00000005  sete        al 
00000008  movzx       eax,al 
0000000b  pop         ebp 
0000000c  ret         4 

Both methods produce exactly the same machine code. Any difference you measured is measurement error.
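
One way to reduce that kind of measurement error is to warm each method up before timing it and to repeat the measurement several times. The sketch below does this and keeps the fastest round. Taking the minimum is a choice made here for illustration, not something the answer prescribes, and the names `TimingDemo` and `BestOf` are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

static class TimingDemo
{
    sealed class A { }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static bool F<T>(T a, T b) where T : class { return a == b; }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static bool F2(A a, A b) { return a == b; }

    // Runs body once untimed (so JIT compilation is not measured),
    // then returns the fastest of the timed rounds.
    public static TimeSpan BestOf(int rounds, Action body)
    {
        body();
        TimeSpan best = TimeSpan.MaxValue;
        for (int r = 0; r < rounds; r++)
        {
            var sw = Stopwatch.StartNew();
            body();
            sw.Stop();
            if (sw.Elapsed < best) best = sw.Elapsed;
        }
        return best;
    }

    public static void Main()
    {
        const int n = 10000000;
        var a = new A();
        bool sink = false; // consume the results so the calls are not dead code

        TimeSpan generic = BestOf(5, () => { for (int i = 0; i < n; i++) sink ^= F(a, a); });
        TimeSpan plain = BestOf(5, () => { for (int i = 0; i < n; i++) sink ^= F2(a, a); });

        Console.WriteLine("Generic:     {0:F2} ms", generic.TotalMilliseconds);
        Console.WriteLine("Non-generic: {0:F2} ms", plain.TotalMilliseconds);
        Console.WriteLine(sink);
    }
}
```

Run it as a Release build without a debugger attached, as in the question. Otherwise the JIT optimizations discussed above are disabled.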
