When doing performance testing, why are the initial iterations consistently slower than the average?
Question
It seems that every time I run performance tests, there is a "wind-down" period in the first few iterations before the times stabilize.
Here's the performance testing code (in this case, I was testing the difference between Lambda and LINQ):
using System;
using System.Collections.Generic;
using System.Diagnostics;
namespace Sandbox
{
    public class Program
    {
        private static long sum = 0;
        private static int count = 0;
        public class Item
        {
            public string name;
            public int id;
        }
        public static void Main(string[] args)
        {
            // START TESTING PARAMETERS
            List<Item> items = new List<Item>();
            for (int i = 0; i < 1000; i++)
            {
                items.Add(new Item
                {
                    id = i,
                    name = "name_" + i.ToString()
                });
            }
            // END TESTING PARAMETERS
            Stopwatch sw = new Stopwatch();
            sw.Start();
            // 20 batches of 5,000 lookups = 100,000 test runs.
            for (int j = 0; j < 20; j++)
            {
                for (int i = 0; i < 5000; i++)
                {
                    // START TESTING CODE
                    Item itm = items.Find(x => x.name == "name_" + i.ToString());
                    // END TESTING CODE
                }
                sum += sw.ElapsedMilliseconds;
                count++;
                sw.Restart();
                // Running (cumulative) average of all batches so far, in whole milliseconds.
                Console.WriteLine("Average: {0}", sum / count);
            }
        }
    }
}
And here are the printed averages (in milliseconds) from 5 separate runs of 100,000 test iterations each, one run per column:
Batch  Run 1  Run 2  Run 3  Run 4  Run 5
    1   1023   1079   1017   1147   1054
    2   1003    963   1001   1007   1020
    3   1009    926    951    958    966
    4    972    908    927    934    936
    5    946    896    922    919    918
    6    931    889    926    910    907
    7    919    883    916    903    899
    8    911    880    908    898    893
    9    904    877    902    894    899
   10    899    874    909    891    894
   11    895    873    926    889    890
   12    898    871    937    886    887
   13    898    869    944    884    907
   14    894    868    938    882    921
   15    891    868    934    881    923
   16    889    867    929    880    919
   17    887    866    925    884    916
   18    885    866    931    892    912
   19    889    865    927    902    909
   20    891    870    924    907    917
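One detail of the harness is worth keeping in mind when reading these numbers: it prints sum / count, a cumulative average over every batch so far, so a slow first batch keeps inflating the printed figure and can only fade out gradually. A minimal sketch that records each batch's own time instead, assuming .NET 6 or later (top-level statements); it uses a plain list of names in place of the Item class, and the batch sizes are arbitrary choices for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Simplified test data (assumption): just the names from the question's Item list.
var names = new List<string>();
for (int i = 0; i < 1000; i++)
{
    names.Add("name_" + i);
}

const int Batches = 10;           // arbitrary for this sketch
const int LookupsPerBatch = 2000; // arbitrary for this sketch

double[] batchMs = new double[Batches];
var sw = new Stopwatch();

for (int j = 0; j < Batches; j++)
{
    sw.Restart();
    for (int i = 0; i < LookupsPerBatch; i++)
    {
        // Same kind of lookup the question times.
        _ = names.Find(n => n == "name_" + i);
    }
    sw.Stop();

    // Keep this batch's time on its own instead of folding it into a running average.
    // Elapsed.TotalMilliseconds is fractional, so short batches are not truncated to 0.
    batchMs[j] = sw.Elapsed.TotalMilliseconds;
    Console.WriteLine($"Batch {j + 1}: {batchMs[j]:F2} ms");
}
```

With per-batch times, a slow first batch shows up as a single outlier rather than as a slowly decaying average.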
Why is there a wind-down period every time I run a test?
Recommended Answer
Take a look at Eric Lippert's series on performance testing:
Mistake #6: Treat the first run as nothing special when measuring average performance.
In order to get a good result out of a benchmark test in a world with potentially expensive startup costs due to jitting code, loading libraries and calling static constructors, you've got to apply some careful thought about what you're actually measuring.
If, for example, you are benchmarking for the specific purpose of analyzing startup costs then you're going to want to make sure that you measure only the first run. If on the other hand you are benchmarking part of a service that is going to be running millions of times over many days and you wish to know the average time that will be taken in a typical usage then the high cost of the first run is irrelevant and therefore shouldn't be part of the average. Whether you include the first run in your timings or not is up to you; my point is, you need to be cognizant of the fact that the first run has potentially very different costs than the second.
...
Moreover, it's important to note that different jitters give different results on different machines and in different versions of the .NET framework. The time taken to jit can vary greatly, as can the amount of optimization generated in the machine code. The jit compilers on the Windows 32 bit desktop, Windows 64 bit desktop, Silverlight running on a Mac, and the "compact" jitter that runs when you have a C# program in XNA on XBOX 360 all have potentially different performance characteristics.
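Because the same benchmark can behave differently under different runtimes and JIT compilers, one simple precaution is to print which runtime produced the numbers alongside them, so results from different machines or framework versions are not compared blindly. A small sketch, assuming .NET 6 or later (top-level statements); which fields to report is an arbitrary choice:

```csharp
using System;
using System.Runtime.InteropServices;

// Short description of the runtime and machine that produced the timings.
string environmentInfo =
    $"{RuntimeInformation.FrameworkDescription}, " +
    $"{RuntimeInformation.ProcessArchitecture} process, " +
    $"{Environment.ProcessorCount} logical processors, " +
    $"{RuntimeInformation.OSDescription}";

// Print it next to the benchmark results, e.g. as the first line of output.
Console.WriteLine($"Environment: {environmentInfo}");
```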
In short, JIT compilation is expensive. You shouldn't factor it into your tests unless it is what you want to measure, and that depends on typical usage. If your code is going to start up once and stay up for long periods, discard the first iterations; if it is mostly going to be started and stopped, the first run is the one that matters.
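For the long-running case, the first call and the steady state can be timed separately instead of being mixed into one average. A minimal sketch, assuming .NET 6 or later (top-level statements); RunOnce and the iteration count are hypothetical stand-ins for the code under test:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

var names = new List<string>();
for (int i = 0; i < 1000; i++)
{
    names.Add("name_" + i);
}

// Hypothetical unit of work: one lookup, like the question's Find call.
string? RunOnce(int index) => names.Find(n => n == "name_" + index);

// First call on its own: includes one-time costs such as JIT compilation.
var sw = Stopwatch.StartNew();
RunOnce(0);
sw.Stop();
double firstRunMs = sw.Elapsed.TotalMilliseconds;

// Later calls: the code under test has already been compiled.
const int Iterations = 5000; // arbitrary for this sketch
sw.Restart();
for (int i = 1; i <= Iterations; i++)
{
    RunOnce(i);
}
sw.Stop();
double steadyAverageMs = sw.Elapsed.TotalMilliseconds / Iterations;

Console.WriteLine($"First run:            {firstRunMs:F4} ms");
Console.WriteLine($"Steady-state average: {steadyAverageMs:F4} ms");
```

One caveat: with tiered compilation (the default since .NET Core 3.0), frequently called methods can be recompiled with more optimization after their first calls, so a single warm-up call does not guarantee the final steady state; a longer warm-up loop before the timed loop is a common precaution.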