Benchmarking code - am I doing it right?


Question

I want to benchmark some C/C++ code. I want to measure CPU time, wall time, and cycles/byte. I wrote some measurement functions but have a problem with cycles/byte.

To get CPU time I wrote a function that calls getrusage() with RUSAGE_SELF; for wall time I use clock_gettime with CLOCK_MONOTONIC; to get cycles/byte I use rdtsc.
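The three timers described above could look roughly like the following. This is a minimal sketch, not the asker's actual code: the names `cpu_seconds`, `wall_seconds`, and `cycle_count` are made up for illustration, and the cycle counter assumes GCC or Clang on x86 (it returns 0 on other targets).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict -std modes */
#endif
#include <stdint.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* CPU time (user + system) consumed by this process so far, in seconds.
 * Returns -1.0 if getrusage() fails. */
static double cpu_seconds(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;
    return (double)ru.ru_utime.tv_sec + (double)ru.ru_stime.tv_sec
         + ((double)ru.ru_utime.tv_usec + (double)ru.ru_stime.tv_usec) / 1e6;
}

/* Monotonic wall-clock time in seconds since an arbitrary starting point.
 * Returns -1.0 if clock_gettime() fails. */
static double wall_seconds(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Raw time-stamp counter (rdtsc) on x86; 0 on targets without it. */
static uint64_t cycle_count(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return 0;
#endif
}
```

Elapsed time is the difference between two readings, for example `wall_seconds()` before and after the code under test.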

I process an input buffer of a fixed size, for example 1024 bytes: char buffer[1024]. This is how I benchmark:


  1. Do a warm-up phase by simply calling fun2measure(args) 1000 times:

    for (int i = 0; i < 1000; i++)
        fun2measure(args);


  2. Then run the actual timed benchmark, for wall time:

    unsigned long i;
    double timeTaken;
    double timeTotal = 3.0; // process for 3 seconds

    for (timeTaken = (double)0, i = 0; timeTaken <= timeTotal; timeTaken = walltime(1), i++)
        fun2measure(args);

And for CPU time (almost the same):

    for (timeTaken = (double)0, i = 0; timeTaken <= timeTotal; timeTaken = cputime(1), i++)
        fun2measure(args);

But when I want to get a CPU cycle count for the function, I use this piece of code:

    // loop bounded by wall time
    unsigned long s = cyclecount();
    for (timeTaken = (double)0, i = 0; timeTaken <= timeTotal; timeTaken = walltime(1), i++)
    {
        fun2measure(args);
    }
    unsigned long e = cyclecount();

    // loop bounded by CPU time
    unsigned long s = cyclecount();
    for (timeTaken = (double)0, i = 0; timeTaken <= timeTotal; timeTaken = cputime(1), i++)
    {
        fun2measure(args);
    }
    unsigned long e = cyclecount();

And then I compute cycles/byte: (e - s) / (i * inputsSize). Here inputsSize is 1024 because that is the length of the buffer. But when I raise timeTotal to 10 s I get strange results:

10 seconds:

Did fun2measure 1148531 times in 10.00 seconds for 1024 bytes, 0 cycles/byte [CPU]
Did fun2measure 1000221 times in 10.00 seconds for 1024 bytes, 3.000000 cycles/byte [WALL]

5 seconds:

Did fun2measure 578476 times in 5.00 seconds for 1024 bytes, 0 cycles/byte [CPU]
Did fun2measure 499542 times in 5.00 seconds for 1024 bytes, 7.000000 cycles/byte [WALL]

4 seconds:

Did fun2measure 456828 times in 4.00 seconds for 1024 bytes, 4 cycles/byte [CPU]
Did fun2measure 396612 times in 4.00 seconds for 1024 bytes, 3.000000 cycles/byte [WALL]

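My questions:


  1. Are those results OK?

  2. Why, when I increase the time, do I always get 0 cycles/byte for CPU?

  3. How can I measure average time, mean, standard deviation, and similar statistics for such a benchmark?

  4. Is my benchmarking method 100% OK?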

Cheers!

1st EDIT:

After changing i to double:

Did fun2measure 1138164.00 times in 10.00 seconds for 1024 bytes, 0.410739 cycles/byte [CPU]
Did fun2measure 999849.00 times in 10.00 seconds for 1024 bytes, 3.382036 cycles/byte [WALL]

My results seem to be OK. So question #2 isn't a question anymore :)
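The zeros in the earlier output are consistent with integer division: when every operand is an integer, a ratio below 1 truncates to 0. The sketch below shows one way to do the division in floating point while keeping the counters as integers. The function name `cycles_per_byte` and its parameters are illustrative, not taken from the original code.

```c
#include <stdint.h>

/* Cycles per processed byte, computed in floating point so that
 * ratios below 1.0 are not truncated to 0. Returns 0.0 when no
 * bytes were processed. */
static double cycles_per_byte(uint64_t start, uint64_t end,
                              uint64_t iterations, uint64_t bytes_per_call)
{
    uint64_t total_bytes = iterations * bytes_per_call;
    if (total_bytes == 0)
        return 0.0;
    return (double)(end - start) / (double)total_bytes;
}
```

For example, 512 cycles spent on one 1024-byte call gives 0.5 cycles/byte here, whereas the all-integer expression `512 / 1024` evaluates to 0.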

Recommended answer

Your cyclecount benchmark is flawed because it includes the cost of the walltime/cputime function calls. In general, though, I strongly urge you to use a proper profiler instead of trying to reinvent the wheel. Performance counters in particular will give you numbers you can rely on. Also note that cycle counts are very unreliable, because the CPU usually does not run at a fixed frequency, and the kernel may switch tasks and halt your app for some time.
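One way to keep the timer calls out of the measured region is to run a fixed number of iterations and read the counter only before and after the loop. The following is a sketch under assumptions: the `bench_fn` signature and the `sum_bytes` stand-in workload are hypothetical, and `read_cycles` uses rdtsc via GCC/Clang on x86 and returns 0 elsewhere. It does not address the frequency-scaling and task-switch caveats mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Signature assumed for the function under test (hypothetical). */
typedef void (*bench_fn)(const unsigned char *buf, size_t len);

/* Time-stamp counter on x86; returns 0 elsewhere, so results are
 * only meaningful on x86 targets. */
static uint64_t read_cycles(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return 0;
#endif
}

/* Runs fn a fixed number of times and reads the counter only before
 * and after the loop, so no timer call is charged to the result. */
static double fixed_n_cycles_per_byte(bench_fn fn, const unsigned char *buf,
                                      size_t len, uint64_t iterations)
{
    if (iterations == 0 || len == 0)
        return 0.0;
    uint64_t start = read_cycles();
    for (uint64_t n = 0; n < iterations; n++)
        fn(buf, len);
    uint64_t end = read_cycles();
    return (double)(end - start) / ((double)iterations * (double)len);
}

/* Stand-in workload: sums the bytes and stores the result to a
 * volatile so the compiler keeps the loop. */
static volatile unsigned long sink;

static void sum_bytes(const unsigned char *buf, size_t len)
{
    unsigned long acc = 0;
    for (size_t k = 0; k < len; k++)
        acc += buf[k];
    sink = acc;
}
```

A call such as `fixed_n_cycles_per_byte(sum_bytes, buffer, sizeof buffer, 1000000)` would then replace the time-bounded loop; the iteration count has to be chosen large enough by hand.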

I personally write benchmarks so that they run a given function N times, with N large enough to yield enough samples. I then apply an external profiler such as Linux perf to get some hard numbers to reason about. By repeating the benchmark a given number of times you can calculate stddev/avg values, which you can do in a script that runs the benchmark a few times and evaluates the profiler's output.
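The answer suggests doing the statistics in a script; purely as an illustration, the same arithmetic can be sketched in C. Given one measurement per repeated run, the helper below returns the mean and the sample variance (dividing by n - 1). The `run_stats` type and `summarize` name are made up for this example, and the standard deviation is the square root of the returned variance.

```c
#include <stddef.h>

/* Mean and sample variance of a set of per-run measurements. */
struct run_stats {
    double mean;
    double variance;  /* sample variance; stddev is its square root */
};

static struct run_stats summarize(const double *samples, size_t n)
{
    struct run_stats s = {0.0, 0.0};
    if (n == 0)
        return s;

    /* First pass: arithmetic mean. */
    double sum = 0.0;
    for (size_t k = 0; k < n; k++)
        sum += samples[k];
    s.mean = sum / (double)n;

    /* A single sample has no spread to estimate. */
    if (n < 2)
        return s;

    /* Second pass: sum of squared deviations from the mean. */
    double sq = 0.0;
    for (size_t k = 0; k < n; k++) {
        double d = samples[k] - s.mean;
        sq += d * d;
    }
    s.variance = sq / (double)(n - 1);
    return s;
}
```

Taking `sqrt(s.variance)` from `<math.h>` (which usually needs `-lm` when linking) gives the standard deviation of the runs.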
