Measuring execution time of OpenCL kernels


Question

I have the following loop that measures the time of my kernels:

double elapsed = 0;
cl_ulong time_start, time_end;
for (unsigned i = 0; i < NUMBER_OF_ITERATIONS; ++i)
{
    err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, &event); checkErr(err, "Kernel run");
    err = clWaitForEvents(1, &event); checkErr(err, "Kernel run wait for event");
    err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(time_start), &time_start, NULL); checkErr(err, "Kernel run get time start");
    err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(time_end), &time_end, NULL); checkErr(err, "Kernel run get time end");
    elapsed += (time_end - time_start);
}

I then divide elapsed by NUMBER_OF_ITERATIONS to get the final estimate. However, I am afraid the execution time of an individual kernel is too short, which could introduce uncertainty into my measurement. How can I measure the time taken by all NUMBER_OF_ITERATIONS kernels combined?

Can you also suggest a profiling tool that could help with this? I do not need to access this data programmatically. I use NVIDIA's OpenCL.

Recommended answer

To measure the execution time of an OpenCL kernel, follow these steps:

  1. Create a command queue. Profiling must be enabled when the queue is created:

cl_command_queue command_queue;
command_queue = clCreateCommandQueue(context, devices[deviceUsed], CL_QUEUE_PROFILING_ENABLE, &err);

  • Attach an event when launching the kernel:

    cl_event event;
    err = clEnqueueNDRangeKernel(queue, kernel, workdim, NULL, workgroupsize, NULL, 0, NULL, &event);
    

  • Wait for the kernel to finish:

    clWaitForEvents(1, &event);
    

  • Wait for all enqueued tasks to finish:

    clFinish(queue);
    

  • Get the profiling data and calculate the kernel execution time (the OpenCL API returns the timestamps in nanoseconds):

    cl_ulong time_start;
    cl_ulong time_end;
    
    clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(time_start), &time_start, NULL);
    clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(time_end), &time_end, NULL);
    
    double nanoSeconds = (double)(time_end - time_start);
    printf("OpenCL execution time: %0.3f milliseconds\n", nanoSeconds / 1000000.0);
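
To time all NUMBER_OF_ITERATIONS kernels as one batch, one possible approach is to give each launch its own event, call clFinish once after the loop, and then combine the profiling timestamps. This assumes an in-order queue created with CL_QUEUE_PROFILING_ENABLE. The sketch below covers only the arithmetic on the collected timestamps. The helper names (ns_to_ms, busy_time_ns, batch_span_ns) are hypothetical and not part of OpenCL, and uint64_t stands in for cl_ulong so that the code compiles without the OpenCL headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: cl_ulong is a 64-bit unsigned integer, so uint64_t is used
   in its place. start[i] and end[i] would be filled from
   CL_PROFILING_COMMAND_START / CL_PROFILING_COMMAND_END of event i. */

/* Convert a nanosecond interval to milliseconds. */
double ns_to_ms(uint64_t ns)
{
    return (double)ns / 1.0e6;
}

/* Sum of the individual (end - start) intervals: the time the device
   spent executing kernels, excluding any gaps between launches. */
uint64_t busy_time_ns(const uint64_t *start, const uint64_t *end, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i)
        total += end[i] - start[i];
    return total;
}

/* Earliest start to latest end: the elapsed time of the whole batch,
   including idle gaps between kernels. Returns 0 when n is 0. */
uint64_t batch_span_ns(const uint64_t *start, const uint64_t *end, size_t n)
{
    if (n == 0)
        return 0;
    uint64_t first = start[0];
    uint64_t last = end[0];
    for (size_t i = 1; i < n; ++i) {
        if (start[i] < first) first = start[i];
        if (end[i] > last) last = end[i];
    }
    return last - first;
}
```

Dividing busy_time_ns by the number of kernels gives the same per-kernel average as the loop in the question. batch_span_ns answers the "all kernels combined" question, and the difference between the two indicates how much of the batch time was spent in gaps between launches.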
    
