Measuring execution time of OpenCL kernels
Question
I have the following loop that measures the time of my kernels:
double elapsed = 0;
cl_ulong time_start, time_end;
for (unsigned i = 0; i < NUMBER_OF_ITERATIONS; ++i)
{
    err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, &event); checkErr(err, "Kernel run");
    err = clWaitForEvents(1, &event); checkErr(err, "Kernel run wait for event");
    err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(time_start), &time_start, NULL); checkErr(err, "Kernel run get time start");
    err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(time_end), &time_end, NULL); checkErr(err, "Kernel run get time end");
    elapsed += (time_end - time_start);
}
I then divide elapsed by NUMBER_OF_ITERATIONS to get the final estimate. However, I am afraid the execution time of an individual kernel is too short, which could introduce uncertainty into my measurement. How can I measure the time taken by all NUMBER_OF_ITERATIONS kernels combined?
Can you suggest a profiling tool that could help with this? I do not need to access this data programmatically. I use NVIDIA's OpenCL.
Recommended answer
Follow these steps to measure the execution time of an OpenCL kernel:
Create a queue. Profiling must be enabled when the queue is created:
cl_command_queue command_queue;
command_queue = clCreateCommandQueue(context, devices[deviceUsed], CL_QUEUE_PROFILING_ENABLE, &err);
Attach an event when launching the kernel:
cl_event event;
err = clEnqueueNDRangeKernel(queue, kernel, workdim, NULL, workgroupsize, NULL, 0, NULL, &event);
Wait for the kernel to finish:
clWaitForEvents(1, &event);
Wait for all enqueued tasks to finish:
clFinish(queue);
Get the profiling data and calculate the kernel execution time (the OpenCL API returns the timestamps in nanoseconds):
cl_ulong time_start;
cl_ulong time_end;
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(time_start), &time_start, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(time_end), &time_end, NULL);
double nanoSeconds = time_end-time_start;
printf("OpenCL execution time is: %0.3f milliseconds\n", nanoSeconds / 1000000.0);
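The steps above time a single launch. For the combined time of many launches, one possible approach is to keep the start and end timestamp of every launch and summarize them afterwards. The sketch below shows only that arithmetic and makes no OpenCL calls. The names batch_timing, summarize_batch and ns_to_ms are hypothetical helpers, uint64_t stands in for cl_ulong, and the timestamps are assumed to come from an in-order queue in enqueue order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary of a batch of profiled kernel launches, in nanoseconds.
 * Hypothetical helper type, not part of the OpenCL API. */
typedef struct {
    uint64_t busy_ns; /* sum of (end - start) over all launches */
    uint64_t span_ns; /* last end minus first start, gaps included */
} batch_timing;

/* starts[i] and ends[i] are assumed to hold the values read with
 * CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END for the
 * i-th launch, in enqueue order on an in-order queue.
 * uint64_t stands in for cl_ulong here. */
batch_timing summarize_batch(const uint64_t *starts, const uint64_t *ends, size_t n)
{
    batch_timing t = {0, 0};
    for (size_t i = 0; i < n; ++i) {
        assert(ends[i] >= starts[i]); /* a launch cannot end before it starts */
        t.busy_ns += ends[i] - starts[i];
    }
    if (n > 0) {
        assert(ends[n - 1] >= starts[0]);
        t.span_ns = ends[n - 1] - starts[0];
    }
    return t;
}

/* Convert nanoseconds to milliseconds for printing. */
double ns_to_ms(uint64_t ns)
{
    return (double)ns / 1.0e6;
}
```

Dividing busy_ns by the number of launches gives the same per-kernel average as the loop in the question. span_ns also counts any idle time between launches, so comparing the two values indicates how much of the total is launch overhead rather than kernel execution.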