Generic vs not-generic performance in C#
Question
I've written two equivalent methods:
static bool F<T>(T a, T b) where T : class
{
return a == b;
}
static bool F2(A a, A b)
{
return a == b;
}
Time difference:
00:00:00.0380022
00:00:00.0170009
Code used for testing:
var a = new A();
var dt = DateTime.Now;
for (int i = 0; i < 100000000; i++)
F<A>(a, a);
Console.WriteLine(DateTime.Now - dt);
dt = DateTime.Now;
for (int i = 0; i < 100000000; i++)
F2(a, a);
Console.WriteLine(DateTime.Now - dt);
Does anyone know why?
In the comments below, dtb showed the CIL:
IL for F2: ldarg.0, ldarg.1, ceq, ret. IL for F<T>: ldarg.0, box !!T, ldarg.1, box !!T, ceq, ret.
I think this answers my question, but what magic can I use to avoid the boxing?
Next, I used Psilon's code:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
namespace ConsoleApplication58
{
internal class Program
{
private class A
{
}
private static bool F<T>(T a, T b) where T : class
{
return a == b;
}
private static bool F2(A a, A b)
{
return a == b;
}
private static void Main()
{
const int rounds = 100, n = 10000000;
var a = new A();
var fList = new List<TimeSpan>();
var f2List = new List<TimeSpan>();
for (int i = 0; i < rounds; i++)
{
// Test generic
GCClear();
bool res;
var sw = new Stopwatch();
sw.Start();
for (int j = 0; j < n; j++)
{
res = F(a, a);
}
sw.Stop();
fList.Add(sw.Elapsed);
// Test not-generic
GCClear();
bool res2;
var sw2 = new Stopwatch();
sw2.Start();
for (int j = 0; j < n; j++)
{
res2 = F2(a, a);
}
sw2.Stop();
f2List.Add(sw2.Elapsed);
}
double f1AverageTicks = fList.Average(ts => ts.Ticks);
Console.WriteLine("Elapsed for F = {0} \t ticks = {1}", fList.Average(ts => ts.TotalMilliseconds),
f1AverageTicks);
double f2AverageTicks = f2List.Average(ts => ts.Ticks);
Console.WriteLine("Elapsed for F2 = {0} \t ticks = {1}", f2List.Average(ts => ts.TotalMilliseconds),
f2AverageTicks);
Console.WriteLine("Not-generic method is {0} times faster, or on {1}%", f1AverageTicks/f2AverageTicks,
(f1AverageTicks/f2AverageTicks - 1)*100);
Console.ReadKey();
}
private static void GCClear()
{
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
}
}
}
Windows 7, .NET 4.5, Visual Studio 2012, release, optimized, without attaching.
x64:
Elapsed for F = 23.68157 ticks = 236815.7
Elapsed for F2 = 1.701638 ticks = 17016.38
Not-generic method is 13.916925926666 times faster, or on 1291.6925926666%
x86:
Elapsed for F = 6.713223 ticks = 67132.23
Elapsed for F2 = 6.729897 ticks = 67298.97
Not-generic method is 0.997522398931217 times faster, or on -0.247760106878314%
And I've got new magic: x64 is three times faster...
PS: My target platform is x64.
Recommended answer
I made some changes to your code to measure performance correctly:
- Use Stopwatch
- Run in Release mode
- Prevent inlining
- Use GetHashCode() to do some real work
- Look at the generated assembly code
Here is the code:
class A
{
}
[MethodImpl(MethodImplOptions.NoInlining)]
static bool F<T>(T a, T b) where T : class
{
return a.GetHashCode() == b.GetHashCode();
}
[MethodImpl(MethodImplOptions.NoInlining)]
static bool F2(A a, A b)
{
return a.GetHashCode() == b.GetHashCode();
}
static int Main(string[] args)
{
const int Runs = 100 * 1000 * 1000;
var a = new A();
bool lret = F<A>(a, a);
var sw = Stopwatch.StartNew();
for (int i = 0; i < Runs; i++)
{
F<A>(a, a);
}
sw.Stop();
Console.WriteLine("Generic: {0:F2}s", sw.Elapsed.TotalSeconds);
lret = F2(a, a);
sw = Stopwatch.StartNew();
for (int i = 0; i < Runs; i++)
{
F2(a, a);
}
sw.Stop();
Console.WriteLine("Non Generic: {0:F2}s", sw.Elapsed.TotalSeconds);
return lret ? 1 : 0;
}
In my tests the non-generic version was slightly faster (.NET 4.5, 32-bit x86, Windows 7), but there is practically no measurable difference in speed. I would say they are both equal.
For completeness, here is the assembly code of both versions. I got it from the debugger in Release mode with JIT optimizations enabled. By default, JIT optimizations are disabled while debugging to make setting breakpoints and inspecting variables easier.
Generic
static bool F<T>(T a, T b) where T : class
{
return a.GetHashCode() == b.GetHashCode();
}
push ebp
mov ebp,esp
push ebx
sub esp,8 // reserve stack for two locals
mov dword ptr [ebp-8],ecx // store first arg on stack
mov dword ptr [ebp-0Ch],edx // store second arg on stack
mov ecx,dword ptr [ebp-8] // get first arg from stack --> stupid!
mov eax,dword ptr [ecx] // load MT pointer from a instance
mov eax,dword ptr [eax+28h] // Locate method table start
call dword ptr [eax+8] //GetHashCode // call GetHashCode function pointer which is the second method starting from the method table
mov ebx,eax // store result in ebx
mov ecx,dword ptr [ebp-0Ch] // get second arg
mov eax,dword ptr [ecx] // call method as usual ...
mov eax,dword ptr [eax+28h]
call dword ptr [eax+8] //GetHashCode
cmp ebx,eax
sete al
movzx eax,al
lea esp,[ebp-4]
pop ebx
pop ebp
ret 4
Non-generic
static bool F2(A a, A b)
{
return a.GetHashCode() == b.GetHashCode();
}
push ebp
mov ebp,esp
push esi
push ebx
mov esi,edx
mov eax,dword ptr [ecx]
mov eax,dword ptr [eax+28h]
call dword ptr [eax+8] //GetHashCode
mov ebx,eax
mov ecx,esi
mov eax,dword ptr [ecx]
mov eax,dword ptr [eax+28h]
call dword ptr [eax+8] //GetHashCode
cmp ebx,eax
sete al
movzx eax,al
pop ebx
pop esi
pop ebp
ret
As you can see, the generic version looks slightly less efficient because of its additional stack memory operations. In practice the difference is not measurable: everything fits into the processor's L1 cache, so those memory operations cost hardly more than the pure register operations of the non-generic version. I would suspect that the non-generic version performs a little better in the real world only if you have to pay for real memory accesses that are not served from any CPU cache.
For all practical purposes both methods are identical. You should look elsewhere for real-world performance gains. I would first look at the data access patterns and the data structures used. Algorithmic changes tend to bring much more performance gain than such low-level details.
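To check this on your own machine, a harness along the following lines may help keep measurement noise down. It is only a sketch, not the code used for the numbers above: the names EqualityBench and MedianMs, the round count and the loop size are my own assumptions. It warms each loop up once so that JIT compilation is not timed, repeats every measurement several times and reports the middle sample instead of a single run.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class EqualityBench
{
    public sealed class A { }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool F<T>(T a, T b) where T : class
    {
        return a == b;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool F2(A a, A b)
    {
        return a == b;
    }

    // Runs body once untimed (warm-up), then times it `rounds` times
    // and returns the middle sample of the sorted timings in milliseconds.
    public static double MedianMs(Action body, int rounds)
    {
        body(); // warm-up: JIT compilation happens here, outside the timing
        var samples = new double[rounds];
        for (int r = 0; r < rounds; r++)
        {
            var sw = Stopwatch.StartNew();
            body();
            sw.Stop();
            samples[r] = sw.Elapsed.TotalMilliseconds;
        }
        Array.Sort(samples);
        return samples[rounds / 2];
    }

    public static void Main()
    {
        const int n = 10000000; // hypothetical loop size
        const int rounds = 15;  // hypothetical number of repetitions
        var a = new A();
        bool sink = false;      // consume the results so the calls are not dead code

        double generic = MedianMs(() => { for (int i = 0; i < n; i++) sink ^= F(a, a); }, rounds);
        double plain = MedianMs(() => { for (int i = 0; i < n; i++) sink ^= F2(a, a); }, rounds);

        Console.WriteLine("Generic:     {0:F3} ms (median)", generic);
        Console.WriteLine("Non-generic: {0:F3} ms (median)", plain);
        Console.WriteLine("sink = {0}", sink);
    }
}
```

Build it in Release mode and run it without a debugger attached, otherwise the JIT optimizations discussed above are not applied.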
Edit 1: If you use == instead, you will find that
00000000 push ebp
00000001 mov ebp,esp
00000003 cmp ecx,edx // Check for reference equality
00000005 sete al
00000008 movzx eax,al
0000000b pop ebp
0000000c ret 4
both methods produce exactly the same machine code. Any difference you measured is measurement error.
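On the question of how to avoid the boxing: as far as I know the box !!T instructions cannot be removed from the IL of the generic method, but they are harmless here. For a reference type, box leaves the reference unchanged, so nothing is allocated or copied, which fits the single cmp ecx,edx shown above. The sketch below contrasts this with a value type, where boxing really does create a new object each time. The names BoxingDemo, SameRef and SameRefBoxed are illustrative and not from the original post.

```csharp
using System;

public static class BoxingDemo
{
    public sealed class A { }

    // Same shape as the generic F from the question.
    public static bool SameRef<T>(T a, T b) where T : class
    {
        return a == b; // per the IL quoted in the question: box !!T, box !!T, ceq
    }

    // Unconstrained variant, so it also accepts value types.
    public static bool SameRefBoxed<T>(T a, T b)
    {
        // Both arguments are converted to object; a value type is boxed
        // into a fresh object for each argument.
        return object.ReferenceEquals(a, b);
    }

    public static void Main()
    {
        var a = new A();
        Console.WriteLine(SameRef(a, a));        // True: box did not wrap the reference
        Console.WriteLine(SameRef(a, new A()));  // False: two different instances
        Console.WriteLine(SameRefBoxed(a, a));   // True: reference type, identity preserved
        Console.WriteLine(SameRefBoxed(42, 42)); // False: each int is boxed separately
    }
}
```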