How to measure CPU time and wall clock time?


Question

I have seen many topics about this, even on Stack Overflow, for example:

I want to measure both CPU time and wall time. Although the person who answered the question in the topic I linked recommends gettimeofday for measuring wall time, I have read that it is better to use clock_gettime instead. So I wrote the code below. Is it OK? Does it really measure wall time and not CPU time? I am asking because I found a web page, http://nadeausoftware.com/articles/2012/03/c_c_tip_how_measure_cpu_time_benchmarking#clockgettme, which says that clock_gettime measures CPU time. What is the truth, and which one should I use to measure wall time?

My other question is about CPU time. I found an answer saying that clock is well suited to this, so I wrote sample code for it too. But it is not what I really want: for my code it reports 0 seconds of CPU time. Is it possible to measure CPU time more precisely (in seconds)? Thanks for any help (for now I am only interested in Linux solutions).

Here is my code:

#include <time.h>
#include <stdio.h>      /* printf */
#include <math.h>       /* sqrt */
#include <stdlib.h>

int main()
{
    int i;
    double sum = 0.0;

    // measure elapsed wall time
    struct timespec now, tmstart;
    clock_gettime(CLOCK_REALTIME, &tmstart);
    for(i=0; i<1024; i++){
        sum += log((double)i);
    }
    clock_gettime(CLOCK_REALTIME, &now);
    double seconds = (double)((now.tv_sec+now.tv_nsec*1e-9) - (double)(tmstart.tv_sec+tmstart.tv_nsec*1e-9));
    printf("wall time %fs\n", seconds);

    // measure cpu time
    double start = (double)clock() /(double) CLOCKS_PER_SEC;
    for(i=0; i<1024; i++){
        sum += log((double)i);
    }
    double end = (double)clock() / (double) CLOCKS_PER_SEC;
    printf("cpu time %fs\n", end - start);

    return 0;
}

I compile it like this:


gcc test.c -o test -lrt -lm

and it prints:

wall time 0.000424s
cpu time 0.000000s

I know I can run more iterations, but that's not the point here ;)

Important:

printf("CLOCKS_PER_SEC is %ld\n", CLOCKS_PER_SEC);

prints

CLOCKS_PER_SEC is 1000000


Recommended answer

According to the manual page for clock on my system:


POSIX requires that CLOCKS_PER_SEC equals 1000000 independent of the actual resolution.

When I increase the number of iterations on my computer, the measured CPU time starts to show up at 100000 iterations. From the returned figures, the resolution appears to be 10 milliseconds.

Beware that when you optimize your code, the whole loop may disappear, because sum is a dead value. There is also nothing to stop the compiler from moving the clock calls across the loop, as there are no real dependences between them and the code in between.

Let me elaborate a bit more on micro-measurements of code performance. The naive and tempting way to measure performance is indeed to add clock calls as you have done. However, since time is not a concept or side effect in C, compilers can often move these clock calls at will. To remedy this, it is tempting to give such clock calls side effects, for example by having them access volatile variables. However, this still does not prohibit the compiler from moving side-effect-free code across the calls. Think, for example, of code that accesses only regular local variables. Worse, by making the clock calls look very scary to the compiler, you will actually hamper its optimizations. As a result, merely measuring the performance changes that performance in a negative and undesirable way.

If you use profiling, as someone has already mentioned, you can get a pretty good assessment of the performance of even optimized code, although the overall run time will of course increase.

Another good way to measure performance is simply to ask the compiler to report the number of cycles some code will take. For a lot of architectures the compiler has a very accurate estimate of this. Most notably, however, it does not for a Pentium architecture, because the hardware does a lot of scheduling that is hard to predict.

Although it is not standard practice, I think compilers should support a pragma that marks a function to be measured. The compiler could then insert high-precision, non-intrusive measuring points in the prologue and epilogue of the function and prohibit any inlining of it. Depending on the architecture, it could choose a high-precision clock to measure time, preferably with support from the OS so that only the time of the current process is measured.
