How to do benchmarking for C/C++ code accurately?

Views: 108 Published: 2020/9/20 18:57:07 Tags: c++ c benchmarking


Question

I'm asking about the answers to this question. In my answer, I simply recorded the time before and after the loops and printed the difference. But according to an update to @cigien's answer, it seems I benchmarked inaccurately because I did not warm up the code.

What is warming up the code? I think what happened is that the string was moved into the cache first, which made the results for the subsequent loops close to each other. In my old answer, the first result was slower than the others, I think because it took more time to move the string into the cache. Am I correct? If not, what does warming up actually do to the code? And more generally, what else should I have done besides warming up to get more accurate results? In other words, how do I benchmark C++ code correctly (and C, if the same applies)?

Recommended answer

To give you an example of warm-up, I recently benchmarked some NVIDIA CUDA kernel calls.

Execution speed seems to increase over time, probably for several reasons, such as the GPU frequency being variable (to save power and limit heat).

Sometimes a slower call has an even worse impact on the next call, so the benchmark can be misleading.

If you want to guard against these effects, I advise you to:

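Reserve all dynamic memory (such as vectors) first.
Use a for loop to do the same work several times before taking a measurement.
This implies initializing the input data (especially random data) only once before the loop, and copying it inside the loop on every iteration, so that each iteration does the same work.
If you deal with complex objects that carry a cache, pack them in a struct and build an array of that struct (using the same construction or cloning technique), so that each iteration starts from the same data.
You can skip the for loop and the data copies if you alternate the two calls very often and assume that the differences in behaviour cancel each other out, for example when simulating continuous data such as positions.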
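The checklist above can be sketched roughly as follows. This is only an illustration under stated assumptions: the workload (`sort_in_place`), the function name `bench_sort`, and the choice of `std::chrono::steady_clock` are not from the answer; they stand in for whatever code is being measured.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical workload, used only for illustration.
void sort_in_place(std::vector<int>& v) { std::sort(v.begin(), v.end()); }

// Runs `warmup` untimed passes followed by `runs` timed passes and
// returns the duration of each timed pass in nanoseconds.
std::vector<long long> bench_sort(std::size_t n, std::size_t warmup, std::size_t runs) {
    // Initialize the random input exactly once, before any loop.
    std::mt19937 rng(42);
    std::vector<int> input(n);
    for (int& x : input) x = static_cast<int>(rng() % 1000000);

    // Reserve all dynamic memory up front so no allocation gets timed.
    std::vector<int> scratch(n);
    std::vector<long long> samples;
    samples.reserve(runs);

    for (std::size_t i = 0; i < warmup + runs; ++i) {
        // Copy the input each pass so every pass sorts the same unsorted data.
        std::copy(input.begin(), input.end(), scratch.begin());

        auto t0 = std::chrono::steady_clock::now();
        sort_in_place(scratch);
        auto t1 = std::chrono::steady_clock::now();

        // Use the result outside the timed region so the work is observable.
        assert(std::is_sorted(scratch.begin(), scratch.end()));

        if (i >= warmup) {  // discard the warm-up passes
            samples.push_back(
                std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count());
        }
    }
    return samples;
}
```

Calling `bench_sort(100000, 5, 20)` from `main` would return 20 samples. Summarizing them with a median or minimum, rather than trusting a single run, is a common choice.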

Concerning measurement tools, I have always had problems with high_resolution_clock on different machines, such as inconsistent durations. By contrast, the Windows QueryPerformanceCounter is very good.
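As a portable sketch, and not something the answer itself proposes, `std::chrono::steady_clock` can be used for interval timing. The C++ standard allows `high_resolution_clock` to be an alias of `system_clock`, which is not guaranteed to be monotonic; that may explain some inconsistent durations, though this is an assumption about the cause. The helper names `time_us` and `hrc_is_steady` are hypothetical.

```cpp
#include <chrono>
#include <utility>

// steady_clock is monotonic by definition, which suits interval timing.
static_assert(std::chrono::steady_clock::is_steady,
              "steady_clock must be monotonic");

// Reports whether high_resolution_clock is monotonic on this implementation.
constexpr bool hrc_is_steady() {
    return std::chrono::high_resolution_clock::is_steady;
}

// Returns how long calling `f` took, in microseconds, using steady_clock.
template <typename F>
double time_us(F&& f) {
    auto t0 = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(t1 - t0).count();
}
```

Checking `hrc_is_steady()` on each target machine shows whether `high_resolution_clock` can go backwards there. On Windows, `QueryPerformanceCounter` remains the option the answer recommends.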

I hope that helps!

Edit

I forgot to add that, as noted in the comments, compiler optimizations can be annoying to deal with. The simplest approach I have found is to increment a variable that depends on some non-trivial operations from both the warm-up and the measured data, in order to force the computation to happen sequentially as much as possible.
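A minimal sketch of that idea, assuming a hypothetical workload `mix` and hypothetical names `bench_mix` and `Result`: a single accumulator is fed by both the warm-up and the measured passes, and each step depends on the previous value, so the compiler cannot drop or freely reorder the work as long as the caller uses the returned checksum.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Result {
    std::uint64_t checksum;  // must be used (e.g. printed) by the caller
    long long nanos;         // duration of the measured passes only
};

// Hypothetical non-trivial operation standing in for the real work.
std::uint64_t mix(std::uint64_t x) {
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return x;
}

Result bench_mix(const std::vector<std::uint64_t>& data,
                 std::size_t warmup, std::size_t runs) {
    std::uint64_t sink = 0;

    // Warm-up passes: untimed, but they still feed the accumulator.
    for (std::size_t w = 0; w < warmup; ++w)
        for (std::uint64_t x : data) sink += mix(x + sink);

    auto t0 = std::chrono::steady_clock::now();
    // Measured passes: each step depends on the previous value of sink.
    for (std::size_t r = 0; r < runs; ++r)
        for (std::uint64_t x : data) sink += mix(x + sink);
    auto t1 = std::chrono::steady_clock::now();

    return {sink,
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count()};
}
```

Printing `checksum` alongside the timing keeps the result observable. Note that the dependency chain also prevents vectorization and instruction-level overlap, so this measures a serialized version of the work.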
