How to get the CPU cycle count in x86_64 from C++?


Question

I saw this post on SO which contains C code to get the latest CPU Cycle count:

CPU cycle count based profiling in C/C++ Linux x86_64

Is there a way I can use this code in C++ (windows and linux solutions welcome)? Although written in C (and C being a subset of C++) I am not too certain if this code would work in a C++ project and if not, how to translate it?

I am using x86-64

Edit 2:

Found this function but cannot get VS2010 to recognise the assembler. Do I need to include anything? (I believe I have to swap uint64_t to long long for windows....?)

static inline uint64_t get_cycles()
{
  uint64_t t;
  __asm volatile ("rdtsc" : "=A"(t));
  return t;
}

Edit 3:

From the above code I get the error:

"error C2400: inline assembler syntax error in 'opcode'; found 'data type'"

Could anybody please help?

Recommended answer

As of GCC 4.5, the __rdtsc() intrinsic is supported by both MSVC and GCC.

But the include that's needed is different:

#ifdef _WIN32
#include <intrin.h>
#else
#include <x86intrin.h>
#endif
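
As a minimal sketch of how the intrinsic might be used once the right header is included: the helper names read_tsc and ticks_elapsed below are hypothetical and not part of the original answer. The sketch assumes an x86 or x86-64 target built with MSVC, GCC 4.5+ or Clang. Note that on most modern CPUs the TSC counts at a fixed reference frequency rather than at the current core clock, so the result is better read as "TSC ticks" than as exact core cycles.

```cpp
#include <cstdint>

#ifdef _WIN32
#include <intrin.h>      // MSVC: declares __rdtsc()
#else
#include <x86intrin.h>   // GCC/Clang: declares __rdtsc()
#endif

// Hypothetical wrapper: returns the current time-stamp counter value.
inline std::uint64_t read_tsc() {
    return __rdtsc();
}

// Hypothetical helper: returns how many TSC ticks elapsed while f() ran.
// rdtsc is not a serializing instruction, so very short regions can be
// reordered around the two reads; treat small results as approximate.
template <typename F>
std::uint64_t ticks_elapsed(F&& f) {
    const std::uint64_t start = read_tsc();
    f();
    const std::uint64_t stop = read_tsc();
    return stop - start;
}
```

A caller could pass any callable, for example ticks_elapsed([&]{ do_work(); }), where do_work stands for whatever code is being measured.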

Here's the original answer before GCC 4.5.

Pulled directly out of one of my projects:

#include <stdint.h>

//  Windows
#ifdef _WIN32

#include <intrin.h>
uint64_t rdtsc(){
    return __rdtsc();
}

//  Linux/GCC
#else

uint64_t rdtsc(){
    unsigned int lo,hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}

#endif

This GNU C Extended asm tells the compiler:

  • volatile: the outputs aren't a pure function of the inputs (so it has to re-run every time, not reuse an old result).
  • "=a"(lo) and "=d"(hi) : the output operands are fixed registers: EAX and EDX. (x86 machine constraints). The x86 rdtsc instruction puts its 64-bit result in EDX:EAX, so letting the compiler pick an output with "=r" wouldn't work: there's no way to ask the CPU for the result to go anywhere else.
  • ((uint64_t)hi << 32) | lo - zero-extend both 32-bit halves to 64-bit (because lo and hi are unsigned), then logically shift and OR them together into a single 64-bit C variable. In 32-bit code, this is just a reinterpretation; the values simply stay in a pair of 32-bit registers. In 64-bit code you typically get actual shift + OR asm instructions, unless the high half optimizes away.
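
The merge step in the last bullet can be checked in isolation, without touching the TSC at all. The following is only an illustrative sketch; combine_halves is a hypothetical name, and the static_asserts use made-up input values to show that the high half lands in bits 32..63 and the low half in bits 0..31.

```cpp
#include <cstdint>

// Hypothetical helper mirroring ((uint64_t)hi << 32) | lo from the answer.
// hi is widened to 64 bits before the shift so no bits are lost.
constexpr std::uint64_t combine_halves(std::uint32_t hi, std::uint32_t lo) {
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Compile-time checks on illustrative values.
static_assert(combine_halves(0x1u, 0x2u) == 0x0000000100000002ULL,
              "hi goes to the upper 32 bits, lo to the lower 32 bits");
static_assert(combine_halves(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFFFFFFFFFFULL,
              "all-ones halves give an all-ones 64-bit value");
```

Without the cast, hi << 32 would shift a 32-bit value by its full width, which is undefined behavior in C and C++.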

(editor's note: this could probably be more efficient if you used unsigned long instead of unsigned int. Then the compiler would know that lo was already zero-extended into RAX. It wouldn't know that the upper half was zero, so | and + are equivalent if it wanted to merge a different way. The intrinsic should in theory give you the best of both worlds as far as letting the optimizer do a good job.)
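
One way the variant described in the editor's note might look is sketched below. It is an assumption-laden illustration, not code from the original answer: rdtsc_wide is a hypothetical name, the block only builds with GCC or Clang targeting x86-64, and it relies on unsigned long being 64 bits wide (true on x86-64 Linux, not on 64-bit Windows), which the static_assert enforces.

```cpp
#include <cstdint>

// Assumption: LP64 target, so unsigned long fills a whole 64-bit register.
static_assert(sizeof(unsigned long) == 8,
              "this sketch assumes a 64-bit unsigned long (x86-64, LP64)");

// Hypothetical variant of the answer's rdtsc() using 64-bit outputs.
// In 64-bit mode rdtsc writes EAX and EDX, which zeroes the upper halves
// of RAX and RDX, so reading the full registers is safe.
static inline std::uint64_t rdtsc_wide() {
    unsigned long lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return (hi << 32) | lo;
}
```

Whether this actually produces better machine code than the unsigned int version depends on the compiler and version; the __rdtsc() intrinsic remains the simpler choice where it is available.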

Avoid inline asm if you can: see https://gcc.gnu.org/wiki/DontUseInlineAsm. But hopefully this section is useful if you need to understand old code that uses inline asm so you can rewrite it with intrinsics. See also https://stackoverflow.com/tags/inline-assembly/info
