Best way to test code speed in C++ without a profiler, or does it not make sense to try?

Problem description

On SO, there are quite a few questions about performance profiling, but I can't seem to find the whole picture. Quite a few issues are involved, and most Q&As address only a few of them at a time or don't justify their proposals.

What I'm wondering about: if I have two functions that do the same thing and I'm curious about the difference in speed, does it make sense to test this without external tools, using timers, or will the timing code compiled into the test affect the results too much?

I ask this because, if it is sensible, I as a C++ programmer want to know how it should best be done, since timers are much simpler than external tools. If it makes sense, let's go through all the possible pitfalls:

Consider this example. The following code shows 2 ways of doing the same thing:

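#include <algorithm>
#include <cstddef>
#include <ctime>
#include <iostream>

typedef unsigned char byte;

// Reverses the first n bytes of `in` in place using the XOR-swap trick.
inline
void
swapBytes( byte* in, size_t n )
{
   if( n < 2 ) return;   // n-1 would wrap around for n == 0

   for( size_t lo=0, hi=n-1; hi>lo; ++lo, --hi )
   {
      in[lo] ^= in[hi];
      in[hi] ^= in[lo];
      in[lo] ^= in[hi];
   }
}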

int
main()
{
         byte    arr[9]     = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h' };
   const int     iterations = 100000000;
         clock_t begin      = clock();

   for( int i=iterations; i!=0; --i ) 

      swapBytes( arr, 8 );

   clock_t middle = clock();

   for( int i=iterations; i!=0; --i ) 

      std::reverse( arr, arr+8 );

   clock_t end = clock();

   double secSwap = (double) ( middle-begin ) / CLOCKS_PER_SEC;
   double secReve = (double) ( end-middle   ) / CLOCKS_PER_SEC;


   std::cout << "swapBytes,    for:    "   << iterations << " times takes: " << middle-begin
             << " clock ticks, which is: " << secSwap    << "sec."           << std::endl;

   std::cout << "std::reverse, for:    "   << iterations << " times takes: " << end-middle
             << " clock ticks, which is: " << secReve    << "sec."           << std::endl;

   std::cin.get();
   return 0;
}

// Output:

// Release:
//  swapBytes,    for: 100000000 times takes: 3000 clock ticks, which is: 3sec.
//  std::reverse, for: 100000000 times takes: 1437 clock ticks, which is: 1.437sec.

// Debug:
//  swapBytes,    for: 10000000 times takes: 1781  clock ticks, which is: 1.781sec.
//  std::reverse, for: 10000000 times takes: 12781 clock ticks, which is: 12.781sec.

Questions:

  1. Which timers should be used, and how do you get the CPU time actually consumed by the code in question?
  2. What are the effects of compiler optimization (since these functions just swap bytes back and forth, the most efficient thing is obviously to do nothing at all)?
  3. Considering the results presented here, do you think they are accurate (I can assure you that multiple runs give very similar results)? If yes, can you explain how std::reverse gets to be so fast, considering the simplicity of the custom function? I don't have the source code of the VC++ version I used for this test, but here is the implementation from GNU. It boils down to the function iter_swap, which is completely incomprehensible to me. Would this also be expected to run twice as fast as the custom function, and if so, why?
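On question 2, one common precaution, sketched here under stated assumptions, is to make every iteration's result feed into a value that is later printed or checked, so the optimizer has a harder time treating the benchmark loop as dead code. `timeReverse` and `ReverseResult` are hypothetical names, and `std::chrono::steady_clock` (C++11) stands in for `clock()`. This does not guarantee the compiler keeps every iteration; it only removes the easiest excuse for deleting them.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical result type: elapsed time plus a value derived from the work.
struct ReverseResult {
    double seconds;
    std::uint64_t checksum;
};

// Times `iterations` calls of std::reverse on an 8-byte buffer. Each
// iteration adds the current first byte to `checksum`, so the loop's work
// flows into a value the caller is expected to print or check.
inline ReverseResult timeReverse(int iterations)
{
    assert(iterations >= 0);
    unsigned char arr[8] = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h' };
    std::uint64_t checksum = 0;

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::reverse(arr, arr + 8);
        checksum += arr[0];   // consume the result of every iteration
    }
    const auto stop = std::chrono::steady_clock::now();

    return { std::chrono::duration<double>(stop - start).count(), checksum };
}
```

In `main`, printing `checksum` alongside the elapsed time is enough to make the result observable.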

Musings:

  1. It seems two high-precision timers are being proposed: clock() and QueryPerformanceCounter (on Windows). Obviously we would like to measure the CPU time of our code and not the real time, but as far as I understand, these functions don't provide that, so other processes on the system would interfere with measurements. This page on the GNU C library seems to contradict that, but when I put a breakpoint in VC++, the debugged process gets a lot of clock ticks even though it was suspended (I have not tested under GNU). Am I missing alternative counters for this, or do we need at least special libraries or classes? If not, is clock good enough in this example, or would there be a reason to use QueryPerformanceCounter?

  2. What can we know for certain without debugging, disassembling and profiling tools? Is anything actually happening? Is the function call being inlined or not? When checking in the debugger, the bytes do actually get swapped, but I'd rather know why from theory than from testing.
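For the first musing, here is a hedged sketch of measuring the same piece of work with both a wall clock and `std::clock()`. On POSIX systems with glibc, `std::clock()` reports processor time used by the process, consistent with the GNU documentation mentioned above; Microsoft's CRT documents its `clock()` as returning elapsed wall-clock time instead, which would be consistent with the ticks accumulated while the debugger had the process suspended. `measure` and `Timings` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

struct Timings {
    double wallSeconds;  // elapsed real time (std::chrono::steady_clock)
    double cpuSeconds;   // std::clock(): process CPU time with glibc,
                         // elapsed wall time with Microsoft's CRT
};

// Runs `work` once and reports both clocks for the same interval.
template <typename F>
Timings measure(F&& work)
{
    const std::clock_t cpuStart = std::clock();
    const auto wallStart = std::chrono::steady_clock::now();

    work();

    const auto wallStop = std::chrono::steady_clock::now();
    const std::clock_t cpuStop = std::clock();
    // std::clock() returns (clock_t)-1 if processor time is unavailable.
    assert(cpuStop != static_cast<std::clock_t>(-1));

    Timings t;
    t.wallSeconds = std::chrono::duration<double>(wallStop - wallStart).count();
    t.cpuSeconds  = static_cast<double>(cpuStop - cpuStart) / CLOCKS_PER_SEC;
    return t;
}
```

Timing a sleep is a quick way to see which behaviour a given platform has: with glibc, `cpuSeconds` stays near zero while `wallSeconds` is about the sleep length.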

Thanks for any pointers.

Update

Thanks to a hint from tojas, the swapBytes function now runs as fast as std::reverse. I had failed to realize that the temporary copy, in the case of a byte, can be kept in a register and is thus very fast. Elegance can blind you.

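inline
void
swapBytes( byte* in, size_t n )
{
   if( n < 2 ) return;   // n-1 would wrap around for n == 0

   // Swap through a temporary; for a single byte it can live in a register.
   for( size_t lo=0, hi=n-1; lo<hi; ++lo, --hi )
   {
      byte t = in[lo];
      in[lo] = in[hi];
      in[hi] = t;
   }
}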
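Before comparing speeds, it is worth confirming that the variants compute the same thing; this also partly addresses the wish to know that the bytes really get swapped without stepping through a debugger. The sketch assumes the `byte` typedef from the question; `swapBytesXor`, `swapBytesTemp` and `variantsAgree` are hypothetical names for the two versions and for a checker that compares both with `std::reverse` for every length up to `maxLen`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned char byte;

// XOR-swap variant, as in the original question.
inline void swapBytesXor(byte* in, std::size_t n)
{
    if (n < 2) return;
    assert(in != nullptr);
    for (std::size_t lo = 0, hi = n - 1; lo < hi; ++lo, --hi) {
        in[lo] ^= in[hi];
        in[hi] ^= in[lo];
        in[lo] ^= in[hi];
    }
}

// Temporary-based variant, as in the update.
inline void swapBytesTemp(byte* in, std::size_t n)
{
    if (n < 2) return;
    assert(in != nullptr);
    for (std::size_t lo = 0, hi = n - 1; lo < hi; ++lo, --hi) {
        byte t = in[lo];
        in[lo] = in[hi];
        in[hi] = t;
    }
}

// True if both variants match std::reverse for every length 0..maxLen.
inline bool variantsAgree(std::size_t maxLen)
{
    for (std::size_t n = 0; n <= maxLen; ++n) {
        std::vector<byte> expected(n), a(n), b(n);
        for (std::size_t i = 0; i < n; ++i)
            expected[i] = a[i] = b[i] = static_cast<byte>('a' + i);
        std::reverse(expected.begin(), expected.end());
        swapBytesXor(a.data(), n);
        swapBytesTemp(b.data(), n);
        if (a != expected || b != expected) return false;
    }
    return true;
}
```

A check like this only establishes correctness; it says nothing about which variant is faster.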

Thanks to a tip from ChrisW, I have found that on Windows you can get the actual CPU time consumed by a (read: your) process through Windows Management Instrumentation. This definitely looks more interesting than the high-precision counter.

Recommended answer

Obviously we would like to measure the cpu time of our code and not the real time, but as far as I understand, these functions don't give that functionality, so other processes on the system would interfere with measurements.

I do two things to ensure that wall-clock time and CPU time are approximately the same:


  • Test for a significant length of time, i.e. several seconds (e.g. by testing a loop of however many thousands of iterations)

  • Test when the machine is otherwise more or less idle, apart from whatever I'm testing
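A small harness along these lines, assuming C++11 `<chrono>`, makes each trial long enough and repeats it a few times, keeping the fastest run. Taking the minimum is a common convention rather than something the answer prescribes; the reasoning is that interference from other processes tends to add time rather than remove it. `bestOfTrials` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Runs `body` `iterations` times per trial, repeats for `trials` trials,
// and returns the fastest trial's duration in seconds.
template <typename F>
double bestOfTrials(F&& body, long iterations, int trials)
{
    assert(iterations > 0 && trials > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int t = 0; t < trials; ++t) {
        const auto start = std::chrono::steady_clock::now();
        for (long i = 0; i < iterations; ++i)
            body();
        const auto stop = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(stop - start).count());
    }
    return best;
}
```

With the question's 100000000 iterations, each trial already runs for seconds, so a handful of trials is usually enough.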

Alternatively if you want to measure only/more exactly the CPU time per thread, that's available as a performance counter (see e.g. perfmon.exe).
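Outside perfmon, POSIX systems expose per-thread CPU time directly through `clock_gettime` with `CLOCK_THREAD_CPUTIME_ID`. The minimal sketch below assumes a POSIX platform and will not compile with MSVC; Windows offers comparable per-thread figures through the Win32 `GetThreadTimes` function, which is not shown. `threadCpuSeconds` is a hypothetical name.

```cpp
#include <cassert>
#include <time.h>   // POSIX clock_gettime, CLOCK_THREAD_CPUTIME_ID

// CPU time consumed so far by the calling thread, in seconds.
inline double threadCpuSeconds()
{
    timespec ts{};
    const int rc = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    assert(rc == 0);
    (void)rc;   // silence unused-variable warnings when NDEBUG is set
    return static_cast<double>(ts.tv_sec) + static_cast<double>(ts.tv_nsec) * 1e-9;
}
```

Subtracting two readings taken around the code under test gives CPU time that excludes periods when the thread was descheduled or asleep.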

What can we know for certain without debugging, disassembling and profiling tools?

Nearly nothing (except that I/O tends to be relatively slow).
