Performance difference between C# for-loop and Array.Fill
Question
I have implemented the following benchmark using BenchmarkDotNet:
    using System;
    using System.Linq;
    using BenchmarkDotNet.Attributes;

    public class ForVsFillVsEnumerable
    {
        private bool[] data;

        [Params(10, 100, 1000)]
        public int N;

        [GlobalSetup]
        public void Setup()
        {
            data = new bool[N];
        }

        [Benchmark]
        public void Fill()
        {
            Array.Fill(data, true);
        }

        [Benchmark]
        public void For()
        {
            for (int i = 0; i < data.Length; i++)
            {
                data[i] = true;
            }
        }

        [Benchmark]
        public void EnumerableRepeat()
        {
            data = Enumerable.Repeat(true, N).ToArray();
        }
    }
The results are:
BenchmarkDotNet=v0.11.3, OS=Windows 10.0.17763.195 (1809/October2018Update/Redstone5)
Intel Core i7-8700K CPU 3.70GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=2.2.200-preview-009648
[Host] : .NET Core 2.2.0 (CoreCLR 4.6.27110.04, CoreFX 4.6.27110.04), 64bit RyuJIT
Core : .NET Core 2.2.0 (CoreCLR 4.6.27110.04, CoreFX 4.6.27110.04), 64bit RyuJIT
Job=Core Runtime=Core
           Method |    N |       Mean |      Error |      StdDev |     Median | Ratio | Rank |
----------------- |----- |-----------:|-----------:|------------:|-----------:|------:|-----:|
             Fill |   10 |   3.675 ns |  0.2550 ns |   0.7150 ns |   3.331 ns |  1.00 |    1 |
              For |   10 |   6.615 ns |  0.3928 ns |   1.1581 ns |   6.056 ns |  1.00 |    1 |
 EnumerableRepeat |   10 |  25.388 ns |  1.0451 ns |   2.9307 ns |  24.170 ns |  1.00 |    1 |
             Fill |  100 |  50.557 ns |  2.0766 ns |   6.1229 ns |  46.690 ns |  1.00 |    1 |
              For |  100 |  64.330 ns |  4.0058 ns |  11.8111 ns |  59.442 ns |  1.00 |    1 |
 EnumerableRepeat |  100 |  81.784 ns |  4.2407 ns |  12.5039 ns |  75.937 ns |  1.00 |    1 |
             Fill | 1000 | 447.016 ns | 15.4420 ns |  45.5312 ns | 420.239 ns |  1.00 |    1 |
              For | 1000 | 589.243 ns | 51.3450 ns | 151.3917 ns | 495.177 ns |  1.00 |    1 |
 EnumerableRepeat | 1000 | 519.124 ns | 21.3580 ns |  62.9746 ns | 505.573 ns |  1.00 |    1 |
Originally I guessed that Array.Fill does some optimizations that make it perform better than a for loop, but then I checked the .NET Core source code and saw that the Array.Fill implementation is pretty straightforward:
    public static void Fill<T>(T[] array, T value)
    {
        if (array == null)
        {
            ThrowHelper.ThrowArgumentNullException(ExceptionArgument.array);
        }

        for (int i = 0; i < array.Length; i++)
        {
            array[i] = value;
        }
    }
The performance is close enough, but it still seems that Fill is consistently a bit faster than for, even though under the hood it is exactly the same code. Can you explain why? Or am I just reading the results incorrectly?
Recommended answer
I'm surprised by Enumerable.Repeat(): contrary to my first thought, it scales pretty well. Anyway, to answer your question: when you use For() you repeatedly access a class member, while when calling Array.Fill() you obtain its address just once.
I'm even more surprised that the compiler does not detect and optimize this. To read the value of a class member you need ldarg.0 to get the value of this, and then ldfld ForVsFillVsEnumerable.data to obtain its actual address. In ForVsFillVsEnumerable.Fill() this is done just once, to call Array.Fill().
You can check this by writing your own fill function:
    [Benchmark]
    public void For2()
    {
        ForImpl(data);
    }

    private static void ForImpl(bool[] data)
    {
        for (int i = 0; i < data.Length; i++)
        {
            data[i] = true;
        }
    }
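A related way to probe the same hypothesis, offered only as a sketch, is to read the field once into a local variable and loop over that local. The class and method names below (LocalCopyExample, FillViaField, FillViaLocal) are hypothetical and not part of the original benchmark; to measure them, the two methods could be moved into the benchmark class and marked with [Benchmark].

```csharp
using System;

public class LocalCopyExample
{
    // Fixed size chosen only for illustration; the original benchmark uses N.
    private readonly bool[] data = new bool[1000];

    public bool[] Data => data;

    // Same shape as For(): every access in the loop goes through the instance field.
    public void FillViaField()
    {
        for (int i = 0; i < data.Length; i++)
        {
            data[i] = true;
        }
    }

    // Hypothetical variant: the field is read once, then the loop uses the local.
    public void FillViaLocal()
    {
        bool[] local = data;
        for (int i = 0; i < local.Length; i++)
        {
            local[i] = true;
        }
    }
}
```

If the explanation above holds, FillViaLocal should land close to For2 and Array.Fill, but that is something to confirm by measurement rather than assume.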
Note 1: regardless of performance, using a library function is always better because it can potentially benefit from future optimizations (they may decide, for example, to add specific overloads for Array.Fill() and to implement them with native code where, for some architectures, a plain memset() is extremely fast).
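As an illustration of relying on a library routine rather than a hand-written loop, the sketch below fills the array through Span<T>.Fill, which is available from .NET Core 2.1. The helper name FillWithSpan is hypothetical, and whether this call is any faster than Array.Fill on a given runtime is an assumption that would need its own benchmark.

```csharp
using System;

public static class SpanFillExample
{
    // Hypothetical helper: fills the whole array via Span<T>.Fill
    // instead of an explicit for loop.
    public static void FillWithSpan(bool[] data)
    {
        data.AsSpan().Fill(true);
    }
}
```

The calling code stays the same if the runtime later changes how the fill is implemented, which is the benefit Note 1 describes.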
Note 2: if the loop code is this small (and fast), I'd avoid measuring anything with small vectors (10 or 100 items), because it's extremely difficult to set up a proper test environment that reliably measures a difference of a few nanoseconds. I'd consider 1000 (or even 100,000) the very minimum to begin with (and even then many other things will play a relevant role...), unless your real-world use case is 10/100. In that case I'd try to measure a bigger algorithm where this difference is more evident (and if it's not, then you shouldn't care).