Measuring execution time of OpenCL kernels

Views: 111 | Published: 2020/5/20 18:50:44 | Tags: profiling, opencl

Question

I have the following loop that measures the execution time of my kernels:

double elapsed = 0;  /* accumulated kernel time in nanoseconds */
cl_ulong time_start, time_end;
for (unsigned i = 0; i < NUMBER_OF_ITERATIONS; ++i)
{
    err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, &event);
    checkErr(err, "Kernel run");
    err = clWaitForEvents(1, &event);
    checkErr(err, "Kernel run wait for event");
    err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(time_start), &time_start, NULL);
    checkErr(err, "Kernel run get time start");
    err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(time_end), &time_end, NULL);
    checkErr(err, "Kernel run get time end");
    elapsed += (double)(time_end - time_start);
    clReleaseEvent(event);  /* each enqueue returns a new event; release it to avoid a leak */
}

Then I divide elapsed by NUMBER_OF_ITERATIONS to get the final estimate. However, I am afraid the execution time of an individual kernel is too short, which could introduce uncertainty into my measurement. How can I measure the time taken by all NUMBER_OF_ITERATIONS kernels combined?

Can you also suggest a profiling tool that could help with this? I do not need to access this data programmatically. I use NVIDIA's OpenCL.

Recommended answer

To measure the execution time of an OpenCL kernel, follow these steps:

Create a command queue. Profiling must be enabled when the queue is created:

/* Named "queue" to match the calls in the following steps. */
cl_command_queue queue;
queue = clCreateCommandQueue(context, devices[deviceUsed], CL_QUEUE_PROFILING_ENABLE, &err);

Attach an event when launching the kernel:

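cl_event event;
/* work_dim is the number of dimensions; global_work_size points to the global size per dimension. */
err = clEnqueueNDRangeKernel(queue, kernel, work_dim, NULL, global_work_size, NULL, 0, NULL, &event);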

Wait for the kernel to finish:

clWaitForEvents(1, &event);

Wait for all enqueued tasks to finish:

clFinish(queue);

Get the profiling data and calculate the kernel execution time (the OpenCL API returns timestamps in nanoseconds):

cl_ulong time_start;
cl_ulong time_end;

clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(time_start), &time_start, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(time_end), &time_end, NULL);

double nanoSeconds = (double)(time_end - time_start);
printf("OpenCL execution time: %0.3f milliseconds\n", nanoSeconds / 1000000.0);
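
The steps above time a single kernel, while the question asks about all iterations combined. One possible way to get there, offered as a sketch and not as the answerer's method, is to record the START and END timestamp of every event and then compute two figures: the sum of the per-kernel durations, and the span from the earliest start to the latest end. The helper names below (summed_kernel_ns, batch_span_ns, ns_to_ms) are hypothetical, and uint64_t stands in for cl_ulong so that the block compiles without the OpenCL headers. It also assumes all events come from the same device, so that their timestamps are comparable.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: uint64_t stands in for cl_ulong so this sketch builds
   without OpenCL headers. All helper names are hypothetical. */

/* Sum of the individual kernel durations, in nanoseconds. */
uint64_t summed_kernel_ns(const uint64_t *start, const uint64_t *end, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i)
        total += end[i] - start[i];
    return total;
}

/* Span from the earliest start to the latest end, in nanoseconds.
   This includes any idle gaps between kernels. Returns 0 when n == 0. */
uint64_t batch_span_ns(const uint64_t *start, const uint64_t *end, size_t n)
{
    if (n == 0)
        return 0;
    uint64_t first = start[0];
    uint64_t last = end[0];
    for (size_t i = 1; i < n; ++i) {
        if (start[i] < first)
            first = start[i];
        if (end[i] > last)
            last = end[i];
    }
    return last - first;
}

/* Convert nanoseconds to milliseconds for printing. */
double ns_to_ms(uint64_t ns)
{
    return (double)ns / 1e6;
}
```

In the question's loop, this would mean storing time_start and time_end for each iteration in two arrays and calling these helpers once after the loop. The summed figure counts only time spent inside kernels, whereas the span also counts launch gaps between them, so the two can differ noticeably for very short kernels.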
