How to create variable-sized __local memory in pyopencl?
Question
In my C OpenCL code I use clSetKernelArg to create variable-sized __local memory for my kernels, since OpenCL C itself cannot declare a __local array whose size is only known at run time. See my example:
clSetKernelArg(clKernel, ArgCounter++, sizeof(cl_mem), (void *)&d_B);
...
clSetKernelArg(clKernel, ArgCounter++, sizeof(float)*block_size*block_size, NULL);
...
kernel="
matrixMul(__global float* C,
...
__local float* A_temp,
...
)"
{...
My question now is: how do I do the same in pyopencl?
I looked through the examples that ship with pyopencl, but the only approach I could find uses templates to substitute the size into the kernel source, which, as far as I understand it, seems like overkill. See the example:
kernel = """
__kernel void matrixMul(__global float* C,...){
...
__local float A_temp[ %(mem_size)d ];
...
}
"""
What do you recommend?
Recommended answer
It is similar to C. You pass a local-memory argument whose size is fixed when the kernel is launched, using cl.LocalMemory with the size in bytes. Here is an example from Enja's radix sort. Notice that the last argument is a local memory array.
# np is numpy, cl is pyopencl
def naive_scan(self, num):
    # Floor division keeps these sizes integers under Python 3 as well.
    nhist = num // 2 // self.cta_size * 16
    global_size = (nhist,)
    local_size = (nhist,)
    extra_space = nhist // 16  # NUM_BANKS defined as 16 in RadixSort.cpp
    shared_mem_size = self.uintsz * (nhist + extra_space)
    scan_args = (
        self.mCountersSum,
        self.mCounters,
        np.uint32(nhist),
        cl.LocalMemory(2 * shared_mem_size),  # size in bytes of the __local buffer
    )
    self.radix_prg.scanNaive(self.queue, global_size, local_size, *scan_args).wait()
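The excerpt above depends on the rest of the radix sort class, so here is a smaller standalone sketch of the same idea. It is an illustration only, not part of Enja's code: the kernel reverse_in_group and the helpers local_mem_bytes and reverse_groups are hypothetical names made up for this example. It assumes float data (4 bytes per element) and a device that allows work-groups of at least group_size items.

```python
KERNEL_SRC = """
__kernel void reverse_in_group(__global const float *src,
                               __global float *dst,
                               __local float *scratch)
{
    int lid = get_local_id(0);
    int n = get_local_size(0);
    int base = get_group_id(0) * n;
    scratch[lid] = src[base + lid];      /* stage the group's data in __local */
    barrier(CLK_LOCAL_MEM_FENCE);
    dst[base + lid] = scratch[n - 1 - lid];
}
"""


def local_mem_bytes(n_items, itemsize=4):
    """Bytes needed for n_items elements of __local memory (4 = sizeof(float))."""
    return n_items * itemsize


def reverse_groups(group_size=16, n_groups=4):
    """Reverse the elements inside each work-group (hypothetical demo)."""
    # Imported lazily so the size helper works without an OpenCL runtime.
    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    prg = cl.Program(ctx, KERNEL_SRC).build()

    src = np.arange(group_size * n_groups, dtype=np.float32)
    dst = np.empty_like(src)
    mf = cl.mem_flags
    src_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=src)
    dst_buf = cl.Buffer(ctx, mf.WRITE_ONLY, src.nbytes)

    # The __local argument: only a byte count is passed, chosen at launch time.
    scratch = cl.LocalMemory(local_mem_bytes(group_size))
    prg.reverse_in_group(queue, src.shape, (group_size,),
                         src_buf, dst_buf, scratch)

    cl.enqueue_copy(queue, dst, dst_buf)
    return src, dst
```

Calling reverse_groups() needs numpy, pyopencl and a working OpenCL device. Because the size is passed as an argument rather than baked into the kernel source, the same compiled program can be launched with a different group_size without rebuilding it, which is the main difference from the template approach in the question.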