OpenCL: reading a variable-size result buffer from the GPU


Question

I have an OpenCL 1.1 search algorithm that works well with small amounts of data:

1.) Build the inputData array and pass it to the GPU.

2.) Create a very large resultData container (e.g. 200000 * sizeof(cl_uint)) and pass it to the GPU as well.

3.) Create the resultSize container (initialized to zero), which can be accessed via atomic operations (at least I assume so).

When one of my work-items has a result, it copies it into the resultData buffer and increments resultSize with an atomic operation (until the buffer is full).

Let me write a code example (OpenCL code):

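// Reserve 5 slots; atomic_add returns the counter value before the addition.
lastPosition = atomic_add(resultBufferSize, 5);
// Wait while the reserved range [lastPosition, lastPosition + 5) does not fit.
while (lastPosition + 5 > RESULT_BUFFER_SIZE)
{
    lastPosition = atomic_add(resultBufferSize, 5);
}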

On the host side I read the buffer and reset resultBufferSize to zero:

resultBufferSize = 0;
oclErr |= clEnqueueWriteBuffer(gpuAcces.getCqCommandQueue(), cm_resultBufferSize,  CL_TRUE, 0,  sizeof(cl_uint), (void*)&resultBufferSize, 0, NULL, NULL);

Now my problem is:

I have many more results than resultData can store, and in any case I cannot know the size of the result in advance (e.g. how many paths will be found).

My idea:

From time to time I would empty (or process) the container on the host side and reset resultSize. When the buffer is full, the work-items would wait in a while loop.

I like this idea because I could process the data in parallel on the host as well.

But I have not been able to implement a working solution for it yet:

1.) NVIDIA cannot work with an endless while loop, or at least I cannot get it to work. When I tried to use an endless loop, the card crashed.

2.) barrier() and mem_fence() can handle synchronization issues, but not this one.

Do you have a robust idea for handling result sizes that are not fixed (e.g. in search problems)? I am almost sure there must be a good pattern for this, but I cannot find it.

Is there any kind of sleep in NVIDIA's OpenCL? I would put it into the endless loop; maybe that would help a bit.

I guess variable-size results are an old issue and there must be good patterns for them. I had a similar issue in my earlier post (but the context was different).

Recommended answer

You have not clearly stated that you are using Windows as your OS, but I assume so, since your question carries the VS2013 tag.

The NVIDIA card does not crash. On Windows, the WDDM driver has Timeout Detection & Recovery (TDR), which restarts the GPU driver if it becomes unresponsive (see http://http.developer.nvidia.com/NsightVisualStudio/2.2/Documentation/UserGuide/HTML/Content/Timeout_Detection_Recovery.htm). You can easily disable this "feature" with Nsight. However, be aware that this may cause problems with your desktop environment, so make sure to write a kernel that finishes in a tolerable amount of time. Then you can run very long kernels even on Windows with NVIDIA's OpenCL implementation.
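
One way to keep each kernel short without making work-items wait is to let them skip the write when their reservation does not fit, and have the host retry a smaller range of work. The following is only a sketch of that idea, simulated in plain C11 so it can run without a GPU: atomic_fetch_add stands in for OpenCL's atomic_add (both return the value before the addition), a for loop stands in for the kernel launch, and the names RESULT_CAPACITY, results_for, work_item, launch and run_search are hypothetical. It assumes that the results of any single item always fit into the buffer and that rerunning a range produces the same results.

```c
#include <assert.h>
#include <stdatomic.h>

#define RESULT_CAPACITY 8u  /* tiny on purpose; stands in for RESULT_BUFFER_SIZE */

static unsigned result_data[RESULT_CAPACITY];
static atomic_uint result_count;  /* stands in for resultBufferSize */

/* Hypothetical search outcome: item i "finds" i % 3 results. */
static unsigned results_for(unsigned i) { return i % 3u; }

/* Body of one simulated work-item. It never waits: it reserves slots and
   skips the write if the reserved range does not fit into the buffer. */
static void work_item(unsigned i)
{
    unsigned n = results_for(i);
    if (n == 0) return;
    /* Returns the counter value before the addition, like atomic_add. */
    unsigned first = atomic_fetch_add(&result_count, n);
    if (first + n > RESULT_CAPACITY) return;  /* overflow: host will retry */
    for (unsigned k = 0; k < n; ++k) result_data[first + k] = i;
}

/* One simulated kernel launch over items [begin, end).
   Returns the number of slots requested, which may exceed the capacity. */
static unsigned launch(unsigned begin, unsigned end)
{
    atomic_store(&result_count, 0);  /* host resets the counter first */
    for (unsigned i = begin; i < end; ++i) work_item(i);
    return atomic_load(&result_count);
}

/* Host loop: run bounded launches, halve the range whenever the results
   did not fit, and consume the buffer after every successful launch.
   Returns the total number of results consumed. */
static unsigned run_search(unsigned total_items)
{
    unsigned consumed = 0, begin = 0, chunk = total_items;
    while (begin < total_items) {
        unsigned end = begin + chunk;
        if (end > total_items) end = total_items;
        unsigned requested = launch(begin, end);
        if (requested > RESULT_CAPACITY && chunk > 1) {
            chunk /= 2;  /* discard this launch and retry a smaller range */
            continue;
        }
        /* Assumption of this sketch: a single item always fits. */
        assert(requested <= RESULT_CAPACITY);
        consumed += requested;  /* result_data[0..requested) is valid here */
        begin = end;
    }
    return consumed;
}
```

Calling run_search(20) walks through all 20 simulated items in several short launches. In a real OpenCL program the counter would be read back and reset with clEnqueueReadBuffer and clEnqueueWriteBuffer between launches, as in the question's host code, and the host could process one batch while the next launch runs.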
