Looking for an accurate way to micro benchmark small code paths written in C++ and running on Linux/OSX

Views: 191 | Published: 2016/10/23 20:42:31 | Tags: c++ linux performance benchmarking


Question

I'm looking to do some very basic micro-benchmarking of small code paths, such as tight loops, that I've written in C++. I'm running on Linux and OSX, and using GCC. What facilities are there for sub-millisecond accuracy? I'm thinking that a simple test that runs the code path many times (several tens of millions?) will give me enough consistency to get a good reading. If anyone knows of preferable methods, please feel free to suggest them.

Recommended answer

You can use the "rdtsc" processor instruction on x86/x86_64. On multicore systems, check for the "constant_tsc" capability (reported via CPUID, and listed in /proc/cpuinfo on Linux). It means the tick counter advances at the same constant rate on all cores, even when the CPU frequency changes dynamically. A related flag, "nonstop_tsc", indicates that the counter also keeps running in deep sleep states.
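As a rough illustration of the check described above, the sketch below scans cpuinfo-style text for a flag and reads the counter through the `__rdtsc()` intrinsic that GCC and Clang provide in `<x86intrin.h>`. The helper names `cpuinfo_has_flag`, `has_constant_tsc` and `read_tsc` are made up for this example, and the flag lookup assumes the Linux /proc/cpuinfo layout (a line starting with "flags" followed by space-separated names).

```cpp
#include <cstdint>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc() on GCC and Clang
#endif

// Hypothetical helper: returns true if a "flags" line in cpuinfo-style text
// lists `flag` as a whole word.
bool cpuinfo_has_flag(std::istream& cpuinfo, const std::string& flag) {
    std::string line;
    while (std::getline(cpuinfo, line)) {
        if (line.compare(0, 5, "flags") != 0) continue;  // not a flags line
        std::istringstream words(line);
        std::string word;
        while (words >> word) {
            if (word == flag) return true;
        }
    }
    return false;
}

// Linux only: elsewhere /proc/cpuinfo does not exist and this returns false.
bool has_constant_tsc() {
    std::ifstream cpuinfo("/proc/cpuinfo");
    return cpuinfo && cpuinfo_has_flag(cpuinfo, "constant_tsc");
}

// Raw time-stamp counter value; returns 0 on non-x86 targets.
std::uint64_t read_tsc() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return 0;
#endif
}
```

A caller would typically check `has_constant_tsc()` once at startup and then take the difference of two `read_tsc()` values around the code being measured.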

If your processor does not support constant_tsc, be sure to bind your program to a single core (the taskset utility on Linux).
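From the shell, pinning usually looks like `taskset -c 0 ./bench` (where `./bench` stands in for your own binary). The same effect can be had from inside the program. The sketch below is one possible way to do it on Linux with `sched_setaffinity`; the function names `pin_to_cpu` and `current_cpu` are invented for this example, and on other systems, OSX included, the functions simply report failure.

```cpp
#ifdef __linux__
#include <sched.h>  // sched_setaffinity, sched_getcpu, CPU_* macros (glibc)
#endif

// Pins the calling thread to one CPU. Returns true on success.
// Linux only; on other platforms this does nothing and returns false.
bool pin_to_cpu(int cpu) {
#ifdef __linux__
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;  // pid 0 = caller
#else
    (void)cpu;
    return false;
#endif
}

// Returns the CPU the caller is currently running on, or -1 if unknown.
int current_cpu() {
#ifdef __linux__
    return sched_getcpu();
#else
    return -1;
#endif
}
```

Calling `pin_to_cpu(current_cpu())` before the timed section keeps both counter reads on the same core. The call can fail if the process is not allowed to run on the requested CPU, so the return value is worth checking.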

When using rdtsc on out-of-order CPUs (everything except Intel Atom and perhaps some other low-end CPUs), add an "ordering" instruction before it, e.g. "cpuid". It temporarily prevents instruction reordering.

Also, Mac OS X has "Shark", which can measure some hardware events in your code.

On RDTSC and out-of-order CPUs, see Section 18 of Agner Fog's excellent manual (its main site is http://www.agner.org/optimize/):

http://www.scribd.com/doc/1548519/optimizing-assembly

On all processors with out-of-order execution, you have to insert XOR EAX,EAX / CPUID before and after each read of the counter in order to prevent it from executing in parallel with anything else. CPUID is a serializing instruction, which means that it flushes the pipeline and waits for all pending operations to finish before proceeding. This is very useful for testing purposes.
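A minimal sketch of the XOR EAX,EAX / CPUID / RDTSC sequence, written as GCC/Clang extended inline assembly, might look like the following. It is an illustration rather than the manual's own code: the names `serialized_tsc` and `min_ticks` are invented here, and taking the minimum over several runs and subtracting the cost of an empty measurement is an assumption about how one might reduce noise, not something the quoted passage prescribes.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Serialize with CPUID (EAX = 0), then read the time-stamp counter.
// Returns 0 on non-x86 targets.
inline std::uint64_t serialized_tsc() {
#if defined(__x86_64__) || defined(__i386__)
    std::uint32_t lo, hi;
    asm volatile(
        "xor %%eax, %%eax\n\t"  // select CPUID leaf 0
        "cpuid\n\t"             // serializing instruction
        "rdtsc"                 // counter low half in EAX, high half in EDX
        : "=a"(lo), "=d"(hi)
        :
        : "ebx", "ecx", "memory");  // CPUID also overwrites EBX and ECX
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
    return 0;
#endif
}

// Hypothetical harness: runs `fn` `runs` times and returns the smallest tick
// count seen, minus the smallest cost of an empty measurement.
template <typename Fn>
std::uint64_t min_ticks(Fn&& fn, int runs) {
    const std::uint64_t kMax = std::numeric_limits<std::uint64_t>::max();
    std::uint64_t overhead = kMax;
    std::uint64_t best = kMax;
    for (int i = 0; i < runs; ++i) {
        std::uint64_t t0 = serialized_tsc();
        std::uint64_t t1 = serialized_tsc();
        overhead = std::min(overhead, t1 - t0);  // cost of measuring nothing
    }
    for (int i = 0; i < runs; ++i) {
        std::uint64_t t0 = serialized_tsc();
        fn();
        std::uint64_t t1 = serialized_tsc();
        best = std::min(best, t1 - t0);
    }
    return best > overhead ? best - overhead : 0;
}
```

The code under test goes in a lambda, for example `min_ticks([&] { work(); }, 1000)` with `work` standing in for your own function. The result is in TSC ticks, which equal core clock cycles only when the core runs at the TSC frequency. CPUID can be very slow inside virtual machines, so the measured overhead there may be large.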
