Looking for an accurate way to micro-benchmark small code paths written in C++ and running on Linux/OSX


Question

I'm looking to do some very basic micro-benchmarking of small code paths, such as tight loops, that I've written in C++. I'm running on Linux and OSX, and using GCC. What facilities are there for sub-millisecond accuracy? I am thinking that a simple test that runs the code path many times (several tens of millions?) will give me enough consistency to get a good reading. If anyone knows of preferable methods, please feel free to suggest them.

Recommended answer

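You can use the "rdtsc" processor instruction on x86/x86_64. On multicore systems, check for the "constant_tsc" capability (reported via CPUID, and listed in /proc/cpuinfo on Linux): it means the counter ticks at a constant rate on all cores, even when the frequency changes dynamically. The related "nonstop_tsc" flag indicates that the counter also keeps running in deep sleep states.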
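The flag check can also be done from code. The following is a minimal sketch, assuming a Linux x86 system where /proc/cpuinfo contains lines beginning with "flags"; the function names has_cpu_flag and cpu_has_constant_tsc are made up for this illustration and are not part of any library.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `flag` appears as a whole word on a
// line starting with "flags" in a cpuinfo-style text stream.
bool has_cpu_flag(std::istream& cpuinfo, const std::string& flag) {
    std::string line;
    while (std::getline(cpuinfo, line)) {
        if (line.compare(0, 5, "flags") != 0) continue;  // not a flags line
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::istringstream words(line.substr(colon + 1));
        std::string word;
        while (words >> word) {
            if (word == flag) return true;
        }
    }
    return false;
}

// Hypothetical convenience wrapper; assumes the Linux /proc/cpuinfo layout.
// Returns false if the file cannot be opened (for example on OSX).
bool cpu_has_constant_tsc() {
    std::ifstream cpuinfo("/proc/cpuinfo");
    return cpuinfo.is_open() && has_cpu_flag(cpuinfo, "constant_tsc");
}
```

Taking a stream rather than a file name keeps the parsing testable with an in-memory string. On OSX there is no /proc/cpuinfo, so cpu_has_constant_tsc() simply returns false there.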

If your processor does not support constant_tsc, be sure to bind your program to a single core (for example with the taskset utility on Linux).
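Pinning can also be done from inside the program instead of with taskset. This is a rough sketch, assuming Linux with glibc, where sched_getaffinity and sched_setaffinity are available; the function name pin_to_first_allowed_cpu is invented for this example, and on other systems (including OSX) the sketch just reports failure.

```cpp
#ifdef __linux__
#include <sched.h>  // sched_getaffinity, sched_setaffinity, CPU_* macros
#endif

// Hypothetical helper: pins the calling thread to the first CPU it is
// currently allowed to run on. Returns that CPU index, or -1 on failure
// or on platforms where this sketch has no implementation.
int pin_to_first_allowed_cpu() {
#ifdef __linux__
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (!CPU_ISSET(cpu, &allowed)) continue;
        cpu_set_t only_this_cpu;
        CPU_ZERO(&only_this_cpu);
        CPU_SET(cpu, &only_this_cpu);
        // A pid of 0 means "the calling thread".
        if (sched_setaffinity(0, sizeof(only_this_cpu), &only_this_cpu) != 0) {
            return -1;
        }
        return cpu;
    }
    return -1;
#else
    return -1;  // assumption: no portable equivalent is attempted here
#endif
}
```

Calling this once at the start of the benchmark keeps every later counter read on the same core. Starting from the current affinity mask, rather than hard-coding CPU 0, avoids failing when the process is restricted to a subset of cores.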

When using rdtsc on out-of-order CPUs (all except in-order designs such as the early Intel Atom and possibly some other low-end CPUs), add a serializing instruction before it, e.g. "cpuid", which temporarily prevents instruction reordering.

Also, Mac OS X has "Shark", which can measure some hardware events in your code.

On RDTSC and out-of-order CPUs, see section 18 of Agner Fog's excellent manual (its main site is http://www.agner.org/optimize/):

http://www.scribd.com/doc/1548519/optimizing-assembly


On all processors with out-of-order execution, you have to insert XOR EAX,EAX / CPUID before and after each read of the counter in order to prevent it from executing in parallel with anything else. CPUID is a serializing instruction, which means that it flushes the pipeline and waits for all pending operations to finish before proceeding. This is very useful for testing purposes.
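The pattern quoted above can be sketched with GCC inline assembly. This is only an illustration under stated assumptions: it targets x86_64 with GCC or a compatible compiler, the names serialized_timestamp and min_ticks are made up here, and on any other architecture the sketch falls back to std::chrono::steady_clock, which is a substitute clock and not the technique from the manual.

```cpp
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical helper: reads the time stamp counter with XOR EAX,EAX / CPUID
// issued both before and after RDTSC, as the quoted passage describes.
inline std::uint64_t serialized_timestamp() {
#if defined(__x86_64__) && defined(__GNUC__)
    std::uint32_t lo, hi;
    asm volatile(
        "xorl %%eax, %%eax\n\t"
        "cpuid\n\t"              // serialize before reading the counter
        "rdtsc\n\t"              // counter value arrives in EDX:EAX
        "movl %%eax, %0\n\t"     // save it, since CPUID overwrites EAX/EDX
        "movl %%edx, %1\n\t"
        "xorl %%eax, %%eax\n\t"
        "cpuid\n\t"              // serialize after reading the counter
        : "=&r"(lo), "=&r"(hi)
        :
        : "rax", "rbx", "rcx", "rdx", "memory");
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
    // Assumption: off x86_64, use a monotonic clock instead of the TSC.
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical harness: runs `work` `repetitions` times and returns the
// smallest tick difference observed between the two timestamps.
template <typename F>
std::uint64_t min_ticks(F&& work, int repetitions) {
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int i = 0; i < repetitions; ++i) {
        const std::uint64_t start = serialized_timestamp();
        work();
        const std::uint64_t stop = serialized_timestamp();
        if (stop - start < best) best = stop - start;
    }
    return best;
}
```

Taking the minimum over many repetitions is one common way to filter out interrupts and cache warm-up. The result still includes the cost of the serializing instructions themselves, so timing an empty callable first gives a baseline to subtract. The code under test should write to something observable, such as a volatile variable, so the compiler cannot remove it.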
