How to estimate the overhead of a for loop using rdtsc in C programming


Question

I want to calculate the overhead of a function's parameters as the number of parameters increases over a range of 0 to 7. How can I estimate the hardware overhead and the software overhead?

Recommended answer

Your question isn't really well posed. However, the most reliable way to execute the rdtsc instruction is to call it directly with inline assembly, which GCC and Clang support as an extension (it is not part of standard C, and not every compiler offers it). Any timing function prescribed by a C standard will vary by implementation. Intel has a really good white paper on the best way to implement rdtsc-based timing here. The major concern is out-of-order execution, which may be out of the scope of your question.
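As a minimal sketch of the same idea without hand-written assembly: GCC and Clang expose these instructions as intrinsics in `<x86intrin.h>`. The helper names `tsc_begin` and `tsc_end` are hypothetical, and using LFENCE as the ordering fence is an assumption of this sketch (it is a different choice from the CPUID-based fencing used in the macros in this answer). It assumes an x86-64 target whose processor supports RDTSCP.

```c
#include <stdint.h>
#include <x86intrin.h> /* GCC/Clang on x86: __rdtsc, __rdtscp, _mm_lfence */

/* Hypothetical helper: read the time-stamp counter at the start of a timed region. */
static inline uint64_t tsc_begin(void)
{
    _mm_lfence();                 /* let earlier instructions complete first */
    uint64_t t = __rdtsc();
    _mm_lfence();                 /* keep the timed code from starting before the read */
    return t;
}

/* Hypothetical helper: read the time-stamp counter at the end of a timed region. */
static inline uint64_t tsc_end(void)
{
    unsigned int aux;             /* receives the IA32_TSC_AUX value; unused here */
    uint64_t t = __rdtscp(&aux);  /* RDTSCP waits for earlier instructions to execute */
    _mm_lfence();                 /* keep later instructions from starting before the read */
    return t;
}
```

The difference `tsc_end() - tsc_begin()` around an empty region gives the baseline cost of the measurement itself on a given machine.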

The best implementation I've found is in this repo, which I've adapted for my own use. This basic set of macros, assuming you have a compatible processor (one that supports RDTSCP), will give you ~32 clock ticks of overhead on each call (you'll need to test this on your own processor):

#include <cpuid.h>
#include <stdint.h>

/*** Low level interface ***/

/* CPUID serializes execution before RDTSC reads the counter.
   The clobber list may be broader than strictly necessary. */
#define _setClockStart(HIs,LOs) {                                           \
asm volatile ("CPUID \n\t"                                                  \
              "RDTSC \n\t"                                                  \
              "mov %%edx, %0 \n\t"                                          \
              "mov %%eax, %1 \n\t":                                         \
              "=r" (HIs), "=r" (LOs)::                                      \
              "%rax", "%rbx", "%rcx", "%rdx");                              \
}

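/* RDTSCP waits for earlier instructions to finish before reading the counter;
   the trailing CPUID keeps later instructions from starting before the read. */
#define _setClockEnd(HIe,LOe) {                                             \
asm volatile ("RDTSCP \n\t"                                                 \
              "mov %%edx, %0 \n\t"                                          \
              "mov %%eax, %1 \n\t"                                          \
              "CPUID \n\t": "=r" (HIe), "=r" (LOe)::                        \
              "%rax", "%rbx", "%rcx", "%rdx");                              \
}
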
#define _setClockBit(HIs,LOs,s,HIe,LOe,e) {                                 \
  s=LOs | ((uint64_t)HIs << 32);                                            \
  e=LOe | ((uint64_t)HIe << 32);                                            \
}


/*** High level interface ***/

typedef struct {
  volatile uint32_t hiStart;
  volatile uint32_t loStart;
  volatile uint32_t hiEnd;
  volatile uint32_t loEnd;
  volatile uint64_t tStart;
  volatile uint64_t tEnd;

  /* tEnd - tStart, in TSC ticks */
  uint64_t tDur;
} timer_st;

#define startTimer(ts)                                                      \
{                                                                           \
  _setClockStart(ts.hiStart,ts.loStart);                                    \
} 


#define endTimer(ts)                                                        \
{                                                                           \
  _setClockEnd(ts.hiEnd,ts.loEnd);                                          \
  _setClockBit(ts.hiStart,ts.loStart,ts.tStart,                             \
      ts.hiEnd,ts.loEnd,ts.tEnd);                                           \
  ts.tDur=ts.tEnd-ts.tStart;                                                \
}                                                                             

#define lapTimer(ts)                                                        \
{                                                                           \
  ts.hiStart=ts.hiEnd;                                                      \
  ts.loStart=ts.loEnd;                                                      \
}


Then call it with something like this:

#include <stdio.h>
#include "macros.h" /* Macros for calling rdtsc above */

#define SAMPLE_SIZE 100000

int main(void)
{
  timer_st ts;
  register double mean=0;
  int i;

  /* "Warmup": let caches and branch predictors settle before sampling */
  for(i=0;i<SAMPLE_SIZE;i++)
  {
    startTimer(ts);
    endTimer(ts);
  }

  /* Data collection: exactly SAMPLE_SIZE samples, matching the divisor below */
  for(i=0;i<SAMPLE_SIZE;i++)
  {
    startTimer(ts);
    endTimer(ts);
    mean+=ts.tDur;
  }

  mean/=SAMPLE_SIZE;

  fprintf(stdout,"SampleSize: %d\nMeanOverhead: %f\n", SAMPLE_SIZE,mean);

  return 0;
}

On my Broadwell chip I got this output:

SampleSize: 100000
MeanOverhead: 28.946490

A clock resolution of about 29 clock ticks is pretty good. Any library function that people typically use (like gettimeofday) will not have clock-level accuracy, and will have an overhead of roughly 200-300 ticks.
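A mean on its own hides how noisy the samples are, since interrupts and cache misses produce occasional large outliers. The following is a hedged sketch of accumulating the mean, sample variance, and minimum in a single pass using Welford's algorithm. The `stats_st` type and the `stats_*` function names are hypothetical and are not part of the macros above.

```c
#include <stdint.h>

/* Hypothetical running-statistics accumulator for tick counts. */
typedef struct {
    uint64_t n;     /* number of samples seen so far */
    double   mean;  /* running mean */
    double   m2;    /* sum of squared deviations from the running mean */
    uint64_t min;   /* smallest sample seen so far */
} stats_st;

static void stats_init(stats_st *s)
{
    s->n = 0;
    s->mean = 0.0;
    s->m2 = 0.0;
    s->min = UINT64_MAX;
}

/* Welford's update: numerically stable, no need to store the samples. */
static void stats_add(stats_st *s, uint64_t ticks)
{
    double x = (double)ticks;
    double delta = x - s->mean;
    s->n++;
    s->mean += delta / (double)s->n;
    s->m2 += delta * (x - s->mean);
    if (ticks < s->min)
        s->min = ticks;
}

/* Unbiased sample variance; defined as 0 for fewer than two samples. */
static double stats_variance(const stats_st *s)
{
    return s->n > 1 ? s->m2 / (double)(s->n - 1) : 0.0;
}
```

In the data-collection loop, calling `stats_add(&st, ts.tDur)` after `endTimer(ts)` would take the place of `mean+=ts.tDur`. The minimum is often a useful figure for the timer's intrinsic cost, because it excludes samples disturbed by other activity on the machine.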

I'm not sure what you mean by "hardware overhead" vs "software overhead", but for the implementation above there are no function calls to do the timing, nor any intermediate code between the rdtsc calls. So I suppose the software overhead would be zero.
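To connect this back to the for loop in the question, one hedged approach is to time a region containing the loop and subtract the cost of an empty timed region. The sketch below assumes GCC or Clang on x86-64 with RDTSCP support, and uses the `__rdtsc`/`__rdtscp` intrinsics with LFENCE instead of the macros above. The names `read_start`, `read_end`, `timer_baseline`, `time_loop`, and `loop_ticks_per_iteration` are all hypothetical, and `samples` is assumed to be at least 1.

```c
#include <stdint.h>
#include <x86intrin.h> /* GCC/Clang on x86: __rdtsc, __rdtscp, _mm_lfence */

static inline uint64_t read_start(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

static inline uint64_t read_end(void)
{
    unsigned int aux;            /* IA32_TSC_AUX value; unused here */
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();
    return t;
}

/* Smallest observed cost, in ticks, of an empty timed region. */
static uint64_t timer_baseline(int samples)
{
    uint64_t best = UINT64_MAX;
    for (int s = 0; s < samples; s++) {
        uint64_t t0 = read_start();
        uint64_t t1 = read_end();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Smallest observed cost, in ticks, of a timed region holding a loop. */
static uint64_t time_loop(int iters, int samples)
{
    volatile int sink = 0;       /* volatile store keeps the loop from being removed */
    uint64_t best = UINT64_MAX;
    for (int s = 0; s < samples; s++) {
        uint64_t t0 = read_start();
        for (int i = 0; i < iters; i++)
            sink = i;
        uint64_t t1 = read_end();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Loop cost net of the timer baseline, divided by the iteration count. */
static double loop_ticks_per_iteration(int iters, int samples)
{
    uint64_t base = timer_baseline(samples);
    uint64_t total = time_loop(iters, samples);
    uint64_t net = total > base ? total - base : 0;
    return (double)net / (double)iters;
}
```

Note that the result includes the cost of the volatile store in the loop body, not just the loop control. The same pattern could be applied to function calls with 0 to 7 parameters by replacing the loop body with the call under test.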
