How to estimate the overhead of a for loop using rdtsc in C programming
Problem description
I want to calculate the overhead of a function's parameters as the number of parameters increases from 0 to 7. How can I estimate the hardware overhead and the software overhead?
Recommended answer
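Your question isn't really well posed. However, the most reliable way to execute the rdtsc instruction is to call it directly with inline assembly. Inline assembly is a compiler extension rather than part of ISO C, but major compilers such as GCC and Clang support it (the macros below use GCC-style extended asm for x86-64). Any timing function specified by the C standard will vary by implementation. Intel has a really good white paper on the best way to use rdtsc for benchmarking. The major concern is out-of-order execution: the CPU may execute rdtsc before earlier instructions have finished, or start later instructions before it, which skews the measurement. That may be out of the scope of your question.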
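If you would rather not write inline assembly, GCC and Clang also expose these instructions as intrinsics in `<x86intrin.h>` (`__rdtsc()`, `__rdtscp()`, `_mm_lfence()`). The sketch below is a swapped-in variant, not the macros from this answer: it fences with LFENCE instead of CPUID, because CPUID is relatively slow and traps to the hypervisor inside a VM. It assumes x86-64 with GCC or Clang; `tsc_begin` and `tsc_end` are hypothetical helper names, and on some AMD CPUs LFENCE only orders execution this way when the OS enables it.

```c
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc, __rdtscp, _mm_lfence (GCC/Clang, x86-64) */

/* Read the TSC at the start of a timed region.
 * The LFENCE before RDTSC waits for earlier instructions to complete locally;
 * the LFENCE after it keeps the timed code from starting early. */
static inline uint64_t tsc_begin(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* Read the TSC at the end of a timed region.
 * RDTSCP waits for all earlier instructions to execute before reading the
 * counter; the trailing LFENCE keeps later code from starting before it. */
static inline uint64_t tsc_end(void)
{
    unsigned int aux; /* receives IA32_TSC_AUX (usually identifies the CPU) */
    uint64_t t = __rdtscp(&aux);
    _mm_lfence();
    return t;
}
```

Usage is `uint64_t t0 = tsc_begin(); /* code under test */ uint64_t ticks = tsc_end() - t0;`, mirroring `startTimer`/`endTimer` below.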
The best implementation I've found comes from a repository that I've adapted for my own use. Assuming you have a compatible processor (x86-64 with support for RDTSCP), this basic set of macros gives about 32 clock ticks of overhead per call (you'll need to test on your own processor):
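#include <stdint.h>

/*** Low level interface (x86-64, GCC/Clang extended asm) ***/
/* CPUID writes EAX-EDX and RDTSCP writes EDX:EAX and ECX, so all four
 * registers are listed as clobbers. */

/* CPUID serializes execution so earlier instructions finish before RDTSC
 * reads the time-stamp counter (high half in EDX, low half in EAX). */
#define _setClockStart(HIs,LOs) do { \
    __asm__ __volatile__ ("CPUID\n\t" \
                          "RDTSC\n\t" \
                          "mov %%edx, %0\n\t" \
                          "mov %%eax, %1\n\t" \
                          : "=r" (HIs), "=r" (LOs) \
                          : \
                          : "%rax", "%rbx", "%rcx", "%rdx"); \
} while (0)

/* RDTSCP waits for earlier instructions to execute before reading the
 * counter; the trailing CPUID keeps later instructions from starting early. */
#define _setClockEnd(HIe,LOe) do { \
    __asm__ __volatile__ ("RDTSCP\n\t" \
                          "mov %%edx, %0\n\t" \
                          "mov %%eax, %1\n\t" \
                          "CPUID\n\t" \
                          : "=r" (HIe), "=r" (LOe) \
                          : \
                          : "%rax", "%rbx", "%rcx", "%rdx"); \
} while (0)

/* Combine the 32-bit halves into 64-bit start (s) and end (e) counts. */
#define _setClockBit(HIs,LOs,s,HIe,LOe,e) do { \
    (s) = (uint64_t)(LOs) | ((uint64_t)(HIs) << 32); \
    (e) = (uint64_t)(LOe) | ((uint64_t)(HIe) << 32); \
} while (0)

/*** High level interface ***/
typedef struct {
    volatile uint32_t hiStart;
    volatile uint32_t loStart;
    volatile uint32_t hiEnd;
    volatile uint32_t loEnd;
    volatile uint64_t tStart;
    volatile uint64_t tEnd;
    uint64_t tDur;              /* tEnd - tStart, in TSC ticks */
} timer_st;

#define startTimer(ts) do { \
    _setClockStart((ts).hiStart, (ts).loStart); \
} while (0)

#define endTimer(ts) do { \
    _setClockEnd((ts).hiEnd, (ts).loEnd); \
    _setClockBit((ts).hiStart, (ts).loStart, (ts).tStart, \
                 (ts).hiEnd, (ts).loEnd, (ts).tEnd); \
    (ts).tDur = (ts).tEnd - (ts).tStart; \
} while (0)

/* Start the next lap where the previous one ended. */
#define lapTimer(ts) do { \
    (ts).hiStart = (ts).hiEnd; \
    (ts).loStart = (ts).loEnd; \
} while (0)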
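As a small standalone sketch of what `_setClockBit` and the `tDur` subtraction do, the hypothetical helpers `tsc_from_halves` and `tsc_elapsed` below glue EDX (high half) and EAX (low half) back into one 64-bit count. The cast to `uint64_t` before shifting matters: shifting a 32-bit value by 32 bits is undefined behavior in C.

```c
#include <stdint.h>

/* RDTSC/RDTSCP return the 64-bit counter split across EDX (high half)
 * and EAX (low half); reassemble it into a single value. */
static inline uint64_t tsc_from_halves(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Elapsed ticks between two readings. Unsigned subtraction is defined
 * modulo 2^64, so the result is correct even if the counter wrapped. */
static inline uint64_t tsc_elapsed(uint64_t start, uint64_t end)
{
    return end - start;
}
```

For example, a start reading of hi=0, lo=0xFFFFFFF0 and an end reading of hi=1, lo=0x10 are 0x20 ticks apart, even though the low half went down.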
Then call the macros with something like this:
#include <stdio.h>
#include "macros.h" /* the rdtsc timing macros above */

#define SAMPLE_SIZE 100000

int main(void)
{
    timer_st ts;
    double mean = 0;
    int i;

    /* "Warmup": warm the caches and branch predictor before measuring */
    for (i = 0; i < SAMPLE_SIZE; i++)
    {
        startTimer(ts);
        endTimer(ts);
    }

    /* Data collection: time an empty region, i.e. the timer's own overhead */
    for (i = 0; i < SAMPLE_SIZE; i++)
    {
        startTimer(ts);
        endTimer(ts);
        mean += ts.tDur;
    }
    mean /= SAMPLE_SIZE;

    fprintf(stdout, "SampleSize: %d\nMeanOverhead: %f\n", SAMPLE_SIZE, mean);
    return 0;
}
On my Broadwell chip I got this output
SampleSize: 100000
MeanOverhead: 28.946490
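The program reports only the mean, but the spread of the samples tells you how much to trust it. A minimal sketch, assuming you want to keep the single pass without storing all samples, is Welford's online algorithm; `run_stats`, `stats_add` and `stats_variance` are hypothetical names, not part of the macros above.

```c
#include <stddef.h>

/* Running statistics over timing samples (Welford's online algorithm). */
typedef struct {
    size_t n;     /* samples seen so far */
    double mean;  /* running mean */
    double m2;    /* running sum of squared deviations from the mean */
} run_stats;

static void stats_add(run_stats *s, double x)
{
    double delta = x - s->mean;
    s->n++;
    s->mean += delta / (double)s->n;
    s->m2 += delta * (x - s->mean); /* uses the updated mean */
}

/* Population variance; use m2 / (n - 1) for the sample variance. */
static double stats_variance(const run_stats *s)
{
    return s->n > 0 ? s->m2 / (double)s->n : 0.0;
}
```

In the data-collection loop you would call `stats_add(&st, (double)ts.tDur)` in place of `mean += ts.tDur`, starting from `run_stats st = {0, 0.0, 0.0};`.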
A timing overhead (effectively the resolution) of about 29 clock ticks is pretty good. Library functions that people typically use (like gettimeofday) don't have clock-level accuracy and have an overhead of roughly 200-300 ticks.
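To check that comparison on your own machine, one rough sketch (assuming Linux/glibc on x86-64 with GCC or Clang) is to time a long run of back-to-back `clock_gettime(CLOCK_MONOTONIC, ...)` calls with `__rdtsc()` and divide by the number of calls. `clock_gettime_ticks` is a hypothetical helper. The result depends on the kernel's clock source and on whether the call is served from the vDSO, so treat it as an estimate rather than a confirmation of the figures above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>
#include <x86intrin.h> /* __rdtsc (GCC/Clang, x86-64) */

/* Mean TSC ticks per clock_gettime(CLOCK_MONOTONIC) call over `reps`
 * back-to-back calls. No fences are needed: the long loop amortizes any
 * reordering at the two ends. Includes the loop's own small overhead. */
static double clock_gettime_ticks(int reps)
{
    struct timespec ts;
    uint64_t t0 = __rdtsc();
    for (int i = 0; i < reps; i++)
        clock_gettime(CLOCK_MONOTONIC, &ts);
    uint64_t t1 = __rdtsc();
    return reps > 0 ? (double)(t1 - t0) / reps : 0.0;
}
```

Calling `clock_gettime_ticks(100000)` gives a per-call figure you can put next to the ~29-tick overhead of the rdtsc macros.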
I'm not sure what you mean by "hardware overhead" vs. "software overhead", but the implementation above makes no function calls to do the timing and runs no intermediate code between the rdtsc calls, so I suppose the software overhead would be zero.
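Applied to the original question, a rough sketch is to time calls to functions taking 0 to 7 parameters and subtract the cost of an empty timed region. On the x86-64 System V ABI (Linux, macOS) the first six integer arguments travel in registers and the seventh goes on the stack, so 7 parameters is where you might expect a step. Everything here is hypothetical: `f0`..`f7`, `MIN_TICKS`, `measure_param_costs`, `sink` and `arg_src` are illustrative names, it reuses the LFENCE-based fencing rather than the CPUID macros, and it takes the minimum over many runs instead of the mean to suppress interrupts and other noise. It assumes GCC or Clang on x86-64; compile with optimization and check the assembly, since the compiler can still distort such micro-benchmarks.

```c
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc, __rdtscp, _mm_lfence (GCC/Clang, x86-64) */

#define NOINLINE __attribute__((noinline))

/* Hypothetical workloads: functions taking 0..7 int parameters. */
NOINLINE int f0(void) { return 0; }
NOINLINE int f1(int a) { return a; }
NOINLINE int f2(int a, int b) { return a + b; }
NOINLINE int f3(int a, int b, int c) { return a + b + c; }
NOINLINE int f4(int a, int b, int c, int d) { return a + b + c + d; }
NOINLINE int f5(int a, int b, int c, int d, int e) { return a + b + c + d + e; }
NOINLINE int f6(int a, int b, int c, int d, int e, int f)
{ return a + b + c + d + e + f; }
NOINLINE int f7(int a, int b, int c, int d, int e, int f, int g)
{ return a + b + c + d + e + f + g; }

static volatile int sink;        /* keeps call results observable */
static volatile int arg_src = 1; /* runtime value so arguments aren't constants */

/* Minimum TSC ticks over `reps` fenced runs of `expr`, stored in `out`. */
#define MIN_TICKS(expr, reps, out) do {                 \
    uint64_t best_ = UINT64_MAX;                        \
    unsigned int aux_;                                  \
    for (int r_ = 0; r_ < (reps); r_++) {               \
        _mm_lfence();                                   \
        uint64_t t0_ = __rdtsc();                       \
        _mm_lfence();                                   \
        sink = (expr);                                  \
        uint64_t t1_ = __rdtscp(&aux_);                 \
        _mm_lfence();                                   \
        if (t1_ - t0_ < best_) best_ = t1_ - t0_;       \
    }                                                   \
    (out) = best_;                                      \
} while (0)

/* ticks[k]: extra ticks for calling the k-parameter function, relative to
 * an empty timed region (the timer's own overhead), clamped at 0. */
static void measure_param_costs(uint64_t ticks[8], int reps)
{
    uint64_t base, t[8];
    int a = arg_src;
    MIN_TICKS(0, reps, base); /* empty region: timer overhead only */
    MIN_TICKS(f0(), reps, t[0]);
    MIN_TICKS(f1(a), reps, t[1]);
    MIN_TICKS(f2(a, a), reps, t[2]);
    MIN_TICKS(f3(a, a, a), reps, t[3]);
    MIN_TICKS(f4(a, a, a, a), reps, t[4]);
    MIN_TICKS(f5(a, a, a, a, a), reps, t[5]);
    MIN_TICKS(f6(a, a, a, a, a, a), reps, t[6]);
    MIN_TICKS(f7(a, a, a, a, a, a, a), reps, t[7]);
    for (int k = 0; k < 8; k++)
        ticks[k] = t[k] > base ? t[k] - base : 0;
}
```

A single call costs only a handful of ticks, which is close to the noise floor of the timer itself, so expect small and somewhat unstable differences between the parameter counts; comparing the minimums across several runs is more meaningful than any one number.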