Effect of local_work_size on performance, and why


Question

Hello everyone,
I am new to OpenCL and trying to learn more about it.

What does local_work_size do in an OpenCL program, and how does it affect performance?

I am working on an image processing algorithm, and I launched my OpenCL kernel like this:

size_t local_item_size = 1; 
size_t global_item_size = (int) (ceil((float)(D_can_width*D_can_height)/local_item_size))*local_item_size; // Process the entire lists
ret = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL,&global_item_size, &local_item_size, 0, NULL, NULL);

For the same kernel, when I changed the local size to

 size_t local_item_size = 16;

with everything else kept the same, I got around 4 to 5 times faster performance.

Recommended answer

The local-work-size, also known as the work-group size, is the number of work-items in each work-group.
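To make the definition concrete, here is a minimal sketch in plain C (no OpenCL calls) of how the global size, the local size, and the number of work-groups relate. The helper names `round_up_global` and `num_work_groups` are made up for illustration. It assumes the OpenCL 1.x rule that the global size must be a multiple of the local size, which is why the question's code rounds up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round n_items up to the next multiple of local_size.
   Integer arithmetic avoids the float rounding of ceil((float)n / local). */
static size_t round_up_global(size_t n_items, size_t local_size)
{
    assert(local_size > 0);
    return ((n_items + local_size - 1) / local_size) * local_size;
}

/* Hypothetical helper: number of work-groups created for a 1D launch.
   Assumes global_size is already a multiple of local_size. */
static size_t num_work_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}
```

For an assumed 640x480 image (307200 pixels), a local size of 1 gives 307200 work-groups of one work-item each, while a local size of 16 gives 19200 work-groups of 16. When rounding up adds padding work-items, the kernel should skip them, for example by returning early when `get_global_id(0)` is past the last pixel.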

Each work-group is executed on a compute unit, which can handle many work-items at once, not just one.

So when your groups are too small, you waste some computing power and only get coarse parallelization at the compute-unit level.

But if you have too many work-items in a group, you can also lose some opportunity for parallelization, since some compute units may go unused while others are overloaded.
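Both failure modes can be pictured with a toy model. This is only a sketch under strong assumptions, not how any particular device schedules work: it assumes a compute unit runs work-items in fixed-width batches (`simd_width`, a hardware-dependent value) and that work-groups are handed out in rounds, one per compute unit (`num_cus`). Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not a device query: fraction of hardware lanes doing useful
   work when a group of local_size work-items runs in batches of simd_width. */
static double lane_utilization(size_t local_size, size_t simd_width)
{
    assert(local_size > 0 && simd_width > 0);
    size_t batches = (local_size + simd_width - 1) / simd_width;
    return (double)local_size / (double)(batches * simd_width);
}

/* Toy model: fraction of compute units kept busy when num_groups work-groups
   are handed out in rounds of num_cus groups, one per compute unit. */
static double cu_utilization(size_t num_groups, size_t num_cus)
{
    assert(num_groups > 0 && num_cus > 0);
    size_t rounds = (num_groups + num_cus - 1) / num_cus;
    return (double)num_groups / (double)(rounds * num_cus);
}
```

With an assumed `simd_width` of 32, `lane_utilization(1, 32)` is 1/32 and `lane_utilization(16, 32)` is 1/2, which is the "groups too small" case. With 8 assumed compute units, splitting the work into only 4 very large groups gives `cu_utilization(4, 8)` of 1/2, which is the "groups too large" case. Real hardware can keep several groups resident per compute unit, so treat these numbers as intuition only.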

So you could test many values to find the best one, or just let OpenCL pick a good one for you by passing NULL as the local-work-size.
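If you go the "test many values" route, one possible way to organize it is sketched below in plain C. The helper names are hypothetical. `max_wg` stands in for the limit you would query with `clGetKernelWorkGroupInfo` and `CL_KERNEL_WORK_GROUP_SIZE`, and the timings would come from your own measurements (for example via event profiling); neither is computed here. The sketch only keeps power-of-two sizes that evenly divide the global size, which is an assumption that suits OpenCL 1.x.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill out[] with power-of-two local sizes that are
   <= max_wg and evenly divide global_size. Returns how many were written. */
static size_t candidate_local_sizes(size_t global_size, size_t max_wg,
                                    size_t *out, size_t out_cap)
{
    size_t count = 0;
    if (max_wg == 0)
        return 0;
    for (size_t ls = 1; count < out_cap; ls *= 2) {
        if (global_size % ls == 0)
            out[count++] = ls;
        if (ls > max_wg / 2)   /* next doubling would exceed max_wg */
            break;
    }
    return count;
}

/* Hypothetical helper: index of the smallest measured time. */
static size_t fastest_index(const double *times_ms, size_t n)
{
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; ++i)
        if (times_ms[i] < times_ms[best])
            best = i;
    return best;
}
```

You would launch the kernel once per candidate, record the time for each, and keep the size at `fastest_index`. It is also worth timing one extra run with NULL as the local size, to compare against whatever the runtime chooses on its own.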

PS: I'd be interested to know how OpenCL's own choice performs compared to your previous values, so could you please run a test and post the results? Thanks :)
