What is the fastest way to memset() a GPU buffer with OpenCL?
Question
I'm using OpenCL, and I need to memset() some array in global device memory. CUDA has a memset()-like API function, but OpenCL does not. I read this, where I found two possible alternatives:
- Using memset() on the host with some scratch buffer, then clEnqueueWriteBuffer() to copy that to the buffer on the device.
- Enqueueing the following kernel:
__kernel void memset_uint4(
__global uint4* mem,
__private uint4 val)
{
mem[get_global_id(0)] = val;
}
Which is better? Or rather, under which circumstances/for which platforms is one better than the other?
Note: If the special case of zero'ing memory merits special treatment, that would be nice to know too.
Recommended answer
You can use clEnqueueFillBuffer(), available since OpenCL 1.2. That is exactly what you need. It is also flexible about the fill pattern: the pattern can be any scalar or vector value up to 128 bytes, not just a single byte as with memset().
Here is the documentation page:
http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clEnqueueFillBuffer.html
If you are on OpenCL 1.1 or below, then you should resort to other approaches.