How to get the CPU cycle count in x86_64 from C++?


Question

I saw this post on SO which contains C code to get the latest CPU Cycle count:

CPU Cycle count based profiling in C/C++ Linux x86_64

Is there a way I can use this code in C++ (Windows and Linux solutions welcome)? Although it is written in C (and C being a subset of C++), I am not too certain whether this code would work in a C++ project and, if not, how to translate it.

I am using x86-64

EDIT2:

Found this function but cannot get VS2010 to recognise the assembler. Do I need to include anything? (I believe I have to swap uint64_t to long long for Windows...?)

static inline uint64_t get_cycles()
{
  uint64_t t;
  __asm volatile ("rdtsc" : "=A"(t));
  return t;
}

EDIT3:

From above code I get the error:

"error C2400: inline assembler syntax error in 'opcode'; found 'data type'"

Could someone please help?

Solution

As of GCC 4.5, the __rdtsc() intrinsic is supported by both MSVC and GCC.

But the include that's needed is different:

#ifdef _WIN32
#include <intrin.h>
#else
#include <x86intrin.h>
#endif
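
As a rough illustration of how the intrinsic might be used for timing, the sketch below wraps __rdtsc() and measures the tick difference around a call. The names read_tsc, busy_sum, ticks_for, and demo are made up for this example and are not from the original answer. It assumes an x86 or x86-64 target. Note that the result is in TSC ticks, which on many modern CPUs advance at a constant rate regardless of the current core frequency.

```cpp
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#include <intrin.h>      // MSVC: declares __rdtsc()
#else
#include <x86intrin.h>   // GCC/Clang: declares __rdtsc()
#endif

// Hypothetical helper: read the time-stamp counter via the intrinsic.
inline std::uint64_t read_tsc() {
    return __rdtsc();
}

// Hypothetical workload: sum 0..n-1. The volatile accumulator keeps the
// compiler from optimizing the loop away.
inline std::uint64_t busy_sum(std::uint64_t n) {
    volatile std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        acc = acc + i;
    }
    return acc;
}

// Hypothetical helper: TSC ticks that elapsed while running f().
template <typename F>
std::uint64_t ticks_for(F&& f) {
    const std::uint64_t start = read_tsc();
    f();
    const std::uint64_t stop = read_tsc();
    return stop - start;
}

// Usage sketch: time a small workload and return the tick count.
inline std::uint64_t demo() {
    std::uint64_t result = 0;
    const std::uint64_t ticks = ticks_for([&] { result = busy_sum(1000); });
    assert(result == 499500);  // 0 + 1 + ... + 999
    return ticks;
}
```

Individual readings are noisy (interrupts, migration between cores, out-of-order execution), so in practice one would probably repeat the measurement and look at the minimum or median.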


Here's the original answer, written before GCC 4.5.

Pulled directly out of one of my projects:

#include <stdint.h>

//  Windows
#ifdef _WIN32

#include <intrin.h>
uint64_t rdtsc(){
    return __rdtsc();
}

//  Linux/GCC
#else

uint64_t rdtsc(){
    unsigned int lo,hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}

#endif

This GNU C Extended asm tells the compiler:

  • volatile: the outputs aren't a pure function of the inputs (so it has to re-run every time, not reuse an old result).
  • "=a"(lo) and "=d"(hi) : the output operands are fixed registers: EAX and EDX. (x86 machine constraints). The x86 rdtsc instruction puts its 64-bit result in EDX:EAX, so letting the compiler pick an output with "=r" wouldn't work: there's no way to ask the CPU for the result to go anywhere else.
  • ((uint64_t)hi << 32) | lo - zero-extend both 32-bit halves to 64-bit (because lo and hi are unsigned), and logically shift + OR them together into a single 64-bit C variable. In 32-bit code, this is just a reinterpretation; the values still just stay in a pair of 32-bit registers. In 64-bit code you typically get actual shift + OR asm instructions, unless the high half optimizes away.
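
To make the last bullet concrete, here is a small sketch of the merge step on its own, with the two halves passed in as plain integers instead of coming from rdtsc. The names combine_halves, or_equals_add, and run_examples, as well as the sample values, are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring `((uint64_t)hi << 32) | lo` from the answer.
// hi plays the role of EDX, lo the role of EAX.
constexpr std::uint64_t combine_halves(std::uint32_t hi, std::uint32_t lo) {
    // The cast zero-extends hi to 64 bits *before* the shift; shifting a
    // 32-bit value by 32 would be undefined behavior.
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Worked example: EDX = 0x00000001, EAX = 0x00000002.
static_assert(combine_halves(0x1u, 0x2u) == 0x0000000100000002ULL,
              "high half lands in bits 63-32, low half in bits 31-0");

// The shifted high half has zeros in its low 32 bits, so the two operands
// never overlap and + gives the same result as |.
inline bool or_equals_add(std::uint32_t hi, std::uint32_t lo) {
    const std::uint64_t shifted = static_cast<std::uint64_t>(hi) << 32;
    return (shifted | lo) == (shifted + lo);
}

// Usage sketch: a few spot checks on the helpers above.
inline void run_examples() {
    assert(combine_halves(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFFFFFFFFFFULL);
    assert(or_equals_add(0xDEADBEEFu, 0xCAFEBABEu));
}
```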

(editor's note: this could probably be more efficient if you used unsigned long instead of unsigned int. Then the compiler would know that lo was already zero-extended into RAX. It wouldn't know that the upper half was zero, so | and + are equivalent if it wanted to merge a different way. The intrinsic should in theory give you the best of both worlds as far as letting the optimizer do a good job.)
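
A sketch of what the editor's note seems to describe, assuming GCC or Clang targeting x86-64 with a 64-bit unsigned long (as on Linux). The function name rdtsc_wide is made up here. It relies on rdtsc clearing the upper 32 bits of RAX and RDX in 64-bit mode, so each 64-bit output already holds a zero-extended half. Whether this produces better machine code than the intrinsic depends on the compiler, so treat it as an experiment rather than a recommendation.

```cpp
#include <cassert>
#include <cstdint>

#if !(defined(__GNUC__) && defined(__x86_64__))
#error "This sketch assumes GCC/Clang extended asm on x86-64."
#endif

static_assert(sizeof(unsigned long) == 8,
              "assumes an LP64 target where unsigned long is 64 bits");

// Hypothetical variant of the answer's rdtsc() using 64-bit outputs.
inline std::uint64_t rdtsc_wide() {
    unsigned long lo, hi;
    // "=a" and "=d" now name the full RAX and RDX registers.
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    // Sanity check of the assumption that the CPU zeroed the upper halves.
    assert((lo >> 32) == 0 && (hi >> 32) == 0);
    // No widening cast is needed: hi is already a 64-bit value.
    return (hi << 32) | lo;
}
```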

See https://gcc.gnu.org/wiki/DontUseInlineAsm for why you should avoid inline asm if you can. But hopefully this section is useful if you need to understand old code that uses inline asm so you can rewrite it with intrinsics. See also https://stackoverflow.com/tags/inline-assembly/info
