How to get an accurate performance measure?

Question

In our project we're trying to automatically monitor the performance of test runs, to make sure the program's performance doesn't change significantly over time.

The problem is that there seems to be a consistent 5% variability in the measurements we get. That is, on the same machine with the same program (no recompilation) running the same test, we get values that differ by around 5% from run to run. This is way too much for what we want to use the numbers for.

We're already excluding setup costs from the timing - that is, from within the C++ code itself we're grabbing the time immediately before and after running the time-critical portions, rather than timing the whole program at the OS level. We are also doing averaging and outlier exclusion. The problem is that the variability also seems to have long-term trends: we get tight clustering of times for replicates run right after each other, but an hour or two later the times are substantially different. (Unfortunately, spreading the test out over several hours is not feasible.) The tests are also being run on a dedicated machine while "nothing else" is running on it.
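For reference, the kind of in-process timing with averaging and outlier exclusion described above might look roughly like the sketch below. It is not the asker's actual harness: the names `time_once`, `trimmed_mean` and `benchmark` are made up for illustration, and `std::chrono::steady_clock` with a trimmed mean is just one reasonable choice among several.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Time a single call of `work`, in seconds, using a monotonic clock.
template <typename F>
double time_once(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Mean of the samples after discarding the lowest and highest
// `trim_fraction` of them (simple outlier exclusion).
// Returns 0.0 for an empty input.
double trimmed_mean(std::vector<double> samples, double trim_fraction) {
    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    std::size_t drop = static_cast<std::size_t>(static_cast<double>(n) * trim_fraction);
    if (2 * drop >= n) drop = (n - 1) / 2;  // always keep at least one sample
    const auto first = samples.begin() + static_cast<std::ptrdiff_t>(drop);
    const auto last = samples.end() - static_cast<std::ptrdiff_t>(drop);
    return std::accumulate(first, last, 0.0) / static_cast<double>(last - first);
}

// Run `work` `repeats` times back to back and return the trimmed mean.
template <typename F>
double benchmark(F&& work, std::size_t repeats, double trim_fraction = 0.1) {
    std::vector<double> samples;
    samples.reserve(repeats);
    for (std::size_t i = 0; i < repeats; ++i) samples.push_back(time_once(work));
    return trimmed_mean(std::move(samples), trim_fraction);
}
```

Note that a summary like this only suppresses short-term noise between back-to-back replicates. It cannot remove a drift that affects every replicate in a batch equally, which is the hour-scale effect described above.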

We're not quite sure where the timing variation is coming from, but it may have to do with the processor and the system - there are indications that the size of the variability depends on which machine the program is running on.

Does anyone have an idea where this variation is likely to be coming from, and how to remove it? The tests are running on a dedicated machine, so changing the operating system settings would be possible.

(As indicated by the tags, this is a C++ program running on an x86 Linux system, if that helps clarify things.)

Response to comments

Our current timing scheme is to use the clock() function from the C standard library, looking at the difference in the return value before and after the functions we want to test.
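One thing worth keeping in mind with this scheme: clock() reports the processor time consumed by the program, not elapsed wall-clock time. Recording both can help tell whether run-to-run differences come from the code itself taking more CPU time or from time spent waiting. The sketch below uses POSIX `clock_gettime` for this. The `Timings` struct and the `measure` and `elapsed_seconds` helpers are hypothetical names, not part of the setup described above.

```cpp
#include <time.h>

// CPU time consumed by the process and elapsed wall time, both in seconds.
struct Timings {
    double cpu_seconds;
    double wall_seconds;
};

// Difference b - a between two timespec values, in seconds.
static double elapsed_seconds(const timespec& a, const timespec& b) {
    return static_cast<double>(b.tv_sec - a.tv_sec) +
           static_cast<double>(b.tv_nsec - a.tv_nsec) * 1e-9;
}

// Run `work` once and report both process CPU time and wall-clock time.
template <typename F>
Timings measure(F&& work) {
    timespec wall0{}, wall1{}, cpu0{}, cpu1{};
    clock_gettime(CLOCK_MONOTONIC, &wall0);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu0);
    work();
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu1);
    clock_gettime(CLOCK_MONOTONIC, &wall1);
    return Timings{elapsed_seconds(cpu0, cpu1), elapsed_seconds(wall0, wall1)};
}
```

If the wall time varies while the CPU time stays stable, the variation is probably coming from outside the measured code (scheduling, IO waits). If the CPU time itself moves, the cause is more likely in how the processor executes the code.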

The code we're testing should be deterministic, and shouldn't involve heavy IO.

I realize that the situation is a little hazy for a "silver bullet" answer. I guess I'm looking more for a "these are the factors that are important to consider, this is the order you should probably check them in, and here's how you go about checking each of them" type of answer.

Recommended answer

I'm amazed you got down to 5% variation.

Unless you can get rid of all the unnecessary things running on your system, you will get high variation. That is the top-level concern.

Your OS needs to be deterministic. You need to know what other tasks and threads are running and how long they run. For example, there is the clock interrupt. How many other functions are chained to this interrupt? Do these other functions vary?

Is your system isolated? For example, your measurements may vary if your system is connected to a network.

Does your program use external resources, such as a hard drive? If the program writes to the hard drive, the drive will not be deterministic. Files and parts of files may move on the drive. The drive may become fragmented. This fragmentation may cause variance in your measurements.

The operating system's memory may get fragmented. The executable's memory may become fragmented as well. Fragmentation may add to the variance.
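Some of the factors listed above (other tasks preempting the program, disk access, memory paging) can be partly checked from inside the process on Linux. The sketch below uses `getrusage` to count context switches, page faults and block IO around the timed section. The `Disturbance` struct and the `observe` and `snapshot` helpers are hypothetical names, and the counters give only a partial view, so treat this as a diagnostic aid and not as proof of where the variance comes from.

```cpp
#include <sys/resource.h>

// Changes in per-process resource counters across a timed section.
struct Disturbance {
    long involuntary_switches;  // process was preempted by the scheduler
    long voluntary_switches;    // process gave up the CPU (e.g. waiting on IO)
    long major_faults;          // page faults that required IO
    long minor_faults;          // page faults served without IO
    long blocks_in;             // block input operations
    long blocks_out;            // block output operations
};

// Read the current resource usage counters of this process.
static struct rusage snapshot() {
    struct rusage usage{};
    getrusage(RUSAGE_SELF, &usage);
    return usage;
}

// Run `work` once and report how the counters changed while it ran.
template <typename F>
Disturbance observe(F&& work) {
    const struct rusage before = snapshot();
    work();
    const struct rusage after = snapshot();
    return Disturbance{
        after.ru_nivcsw - before.ru_nivcsw,
        after.ru_nvcsw - before.ru_nvcsw,
        after.ru_majflt - before.ru_majflt,
        after.ru_minflt - before.ru_minflt,
        after.ru_inblock - before.ru_inblock,
        after.ru_oublock - before.ru_oublock,
    };
}
```

Logging these counts next to each timing makes it possible to see whether slow runs coincide with, say, more involuntary context switches or unexpected block IO, which gives a starting point for deciding which of the factors to chase first.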
