CL_INVALID_WORK_GROUP_SIZE error


Problem description

I have this code, which I already posted a question about some time ago.

Today I got my kernel running with a typedef struct in a small test program, but clEnqueueNDRangeKernel returns an invalid work group size error (CL_INVALID_WORK_GROUP_SIZE). According to the Khronos website, this can have three causes:

  1. The global work size is not divisible by the local work size. In my code, it is divisible.
  2. The local work size is bigger than the GPU can handle. My local work size is 128, well under the reported maximum of 1024.
  3. Something to do with a local work size that is NULL. My local work size isn't NULL, it's 128.

I've searched the internet for hours, and most of the solutions I found involve querying clGetKernelWorkGroupInfo for the maximum local work size. When I do that, it also reports 1024. I'm really out of options now. Can somebody help? :)

Main: http://pastebin.com/S6R6t3iF Kernel:

Recommended answer

From your pastebin link, I see:

#define MAX_OP_X 4
#define MAX_OP_Y 4
#define MAX_OP MAX_OP_X * MAX_OP_Y      //number of observer points
#define MAX_SEGMENTEN 128 //number of segments
...
size_t globalSize = MAX_OP;
size_t localSize = MAX_SEGMENTEN;
...
errMsg = clEnqueueNDRangeKernel (commandQueue, kernel, 1, NULL, &globalSize, &localSize, 0, NULL, NULL);

This means you are trying to enqueue your kernel with a global size of 16 and a local size of 128. That's almost certainly not what you want. Because 16 is not a multiple of 128, the global size is not evenly divisible by the local size, which is the first cause on your list. Remember, the global size is the total number of work items you want to run, and the local size is the size of each workgroup. For example, if you have a global size of 1024x1024 and a local size of 16x16, you would have 4096 workgroups of 256 work items each. This may or may not be valid, depending on your compute device.
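The relationship between the two sizes can be checked on the host before enqueueing. The following is a minimal sketch in plain C, not code from the pastebin: the helper names work_sizes_valid, num_work_groups and round_up_global are hypothetical, and max_local stands in for whatever maximum clGetKernelWorkGroupInfo reports on your device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if local_size fits under max_local
 * (assumed to be the value reported by clGetKernelWorkGroupInfo) and
 * evenly divides global_size, 0 otherwise. One dimension only. */
static int work_sizes_valid(size_t global_size, size_t local_size,
                            size_t max_local)
{
    assert(local_size > 0);
    return local_size <= max_local && global_size % local_size == 0;
}

/* Hypothetical helper: number of workgroups along one dimension. */
static size_t num_work_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Hypothetical helper: round the desired number of work items up to the
 * next multiple of local_size, so the result is a usable global size. */
static size_t round_up_global(size_t work_items, size_t local_size)
{
    assert(local_size > 0);
    return ((work_items + local_size - 1) / local_size) * local_size;
}
```

With the values from the question, work_sizes_valid(16, 128, 1024) returns 0, while round_up_global(16, 128) gives 128. If you round the global size up like this, the kernel has to ignore work items whose global id is past the real problem size. Whether 128 is the global size you actually want depends on what the kernel is meant to compute.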

With regard to passing a NULL local size: the OpenCL spec says that if you do, the implementation is free to choose whatever local workgroup size it wants. Ideally, it will try to do something clever on your behalf, but you have no guarantees.
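One way to use that behaviour is to fall back to NULL only when your preferred local size cannot work. This is a sketch under assumptions rather than a recommendation from the spec: pick_local_size is a hypothetical helper, and max_local is assumed to be the limit you queried for your kernel and device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decide what to pass as the local_work_size argument.
 * Returns `preferred` if it fits under max_local and evenly divides
 * global_size; otherwise returns NULL so that the OpenCL implementation
 * picks a local size itself. One dimension only. */
static const size_t *pick_local_size(size_t global_size,
                                     const size_t *preferred,
                                     size_t max_local)
{
    assert(preferred != NULL && *preferred > 0);
    if (*preferred <= max_local && global_size % *preferred == 0)
        return preferred;
    return NULL;
}
```

The returned pointer would go in the sixth argument of clEnqueueNDRangeKernel, in place of &localSize. Note that a kernel which relies on a specific workgroup size, for example to size local memory, should not use the NULL fallback.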
