使用事件定时CUDA应用程序 [英] Timing a CUDA application using events

查看:214
本文介绍了使用事件定时CUDA应用程序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用以下两个函数来计算我的代码的不同部分(cudaMemcpyHtoD,内核执行,cudaMemcpyDtoH)(包括多gpus,同一GPU上的并发内核,内核的顺序执行等等)。根据我的理解,这些函数会记录事件之间的时间间隔,但我想在代码的生命周期中插入事件可能会导致开销和不准确。

I am using the following two functions to time different parts (cudaMemcpyHtoD, kernel execution, cudaMemcpyDtoH) of my code (which includes multi-gpus, concurrent kernels on same GPU, sequential execution of kernels, et al). As I understand, these functions would record the time elapsed between the events, but I guess inserting events along the lifetime of the code may result in overheads and inaccuracies. I would like to hear criticisms, general advice to improve these functions and caveat emptors regarding them.

//Create event and start recording
cudaEvent_t *start_event(int device, cudaEvent_t *events, cudaStream_t streamid=0)
{
        cutilSafeCall( cudaSetDevice(device) );
        cutilSafeCall( cudaEventCreate(&events[0]) );
        cutilSafeCall( cudaEventCreate(&events[1]) );
        cudaEventRecord(events[0], streamid);

    return events;
 }

 //Return elapsed time and destroy events
 float end_event(int device, cudaEvent_t *events, cudaStream_t streamid=0)
 {

        float elapsed = 0.0;
        cutilSafeCall( cudaSetDevice(device) );
        cutilSafeCall( cudaEventRecord(events[1], streamid) );
        cutilSafeCall( cudaEventSynchronize(events[1]) );
        cutilSafeCall( cudaEventElapsedTime(&elapsed, events[0], events[1]) );

        cutilSafeCall( cudaEventDestroy( events[0] ) );
        cutilSafeCall( cudaEventDestroy( events[1] ) );

        return elapsed;
 }

用法:

cudaEvent_t *events;
cudaEvent_t event[2]; //0 for start and 1 for end
...
events = start_event( cuda_device, event, 0 );
<Code to time>
printf("Time taken for the above code... - %f secs\n\n", (end_event(cuda_device, events, 0) / 1000) );


推荐答案

首先,如果这是生产代码,想要能够在第二个cudaEventRecord和cudaEventSynchronize()之间做一些事情。否则,这可能会降低您的应用程序重叠GPU和CPU工作的能力。

First, if this is for production code, you may want to be able to do something between the second cudaEventRecord and cudaEventSynchronize(). Otherwise, this could reduce the ability of your app to overlap GPU and CPU work.

接下来,我将事件创建和销毁与事件记录分离。我不知道成本,但一般来说你可能不想经常调用cudaEventCreate和cudaEventDestroy。

Next, I would separate event creation and destruction from event recording. I'm not sure of the cost, but in general you might not want to call cudaEventCreate and cudaEventDestroy often.

我会做的是创建一个类这样



What I would do is create a class like this

class EventTimer {
public:
  EventTimer() : mStarted(false), mStopped(false) {
    cudaEventCreate(&mStart);
    cudaEventCreate(&mStop);
  }
  ~EventTimer() {
    cudaEventDestroy(mStart);
    cudaEventDestroy(mStop);
  }
  void start(cudaStream_t s = 0) { cudaEventRecord(mStart, s); 
                                   mStarted = true; mStopped = false; }
  void stop(cudaStream_t s = 0)  { assert(mStarted);
                                   cudaEventRecord(mStop, s); 
                                   mStarted = false; mStopped = true; }
  float elapsed() {
    assert(mStopped);
    if (!mStopped) return 0; 
    cudaEventSynchronize(mStop);
    float elapsed = 0;
    cudaEventElapsedTime(&elapsed, mStart, mStop);
    return elapsed;
  }

private:
  bool mStarted, mStopped;
  cudaEvent_t mStart, mStop;
};

注意我没有包括cudaSetDevice() - 似乎应该留给代码使用这个类,使它更灵活。用户必须确保在调用启动和停止时同一设备处于活动状态。

Note I didn't include cudaSetDevice() -- seems to me that should be left to the code that uses this class, to make it more flexible. The user would have to ensure the same device is active when start and stop are called.

PS:NVIDIA不打算依靠CUTIL来生产代码 - - 它只是为了方便在我们的示例中使用,并没有像CUDA库和编译器本身一样严格测试或优化。我建议你提取像cutilSafeCall()到你自己的库和头文件。

PS: It is not NVIDIA's intent for CUTIL to be relied upon for production code -- it is used simply for convenience in our examples and is not as rigorously tested or optimized as the CUDA libraries and compilers themselves. I recommend you extract things like cutilSafeCall() into your own libraries and headers.

这篇关于使用事件定时CUDA应用程序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆