Create arrays in shared memory w/o templates like in PyOpenCL

Views: 321 Published: 2017/3/5 19:33:36 Tags: cuda pycuda

Question

How can I create an array in shared memory without modifying the kernel with templates, as seen in the official examples? Or is using templates the official way?

In PyOpenCL I can create an array in local memory by setting a kernel argument:

kernel.set_arg(1,numpy.uint32(a_width))

... 
KERNEL_CODE = """
__kernel void matrixMul(__local float* A_temp,...)
    { ...} """

Recommended answer

CUDA supports dynamic shared memory allocation at kernel launch time, but the mechanism is a bit different from the one in OpenCL. In the CUDA runtime API, a kernel that uses dynamically sized shared memory, and the launch that sets that size, use the following syntax:

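__global__ void kernel(...)
{
    // typename stands for the element type, e.g. float; the array is left
    // unsized because its size is supplied at launch time
    extern __shared__ typename buffer[];

    ....
}
....
// third launch parameter: dynamic shared memory size in bytes per block
kernel<<<griddim, blockdim, sharedmem, streamID>>>(...);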

where sharedmem is the total number of bytes per block that will be allocated to the buffer.
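Because the third launch parameter is a size in bytes rather than an element count, it is usually derived from the element type and the number of elements each block needs. A minimal Python sketch of that arithmetic, assuming float elements and a 256-thread block; the helper name shared_bytes is hypothetical and nothing here touches the GPU:

```python
import ctypes


def shared_bytes(n_elements, ctype=ctypes.c_float):
    """Return the number of bytes needed to hold n_elements of ctype."""
    if n_elements < 0:
        raise ValueError("n_elements must be non-negative")
    return n_elements * ctypes.sizeof(ctype)


# One float per thread for an assumed block of 256 threads.
sharedmem = shared_bytes(256)
print(sharedmem)
```

The resulting value is what would be passed as sharedmem in the launch configuration.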

In PyCUDA, the same mechanism works something like this:

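from pycuda.compiler import SourceModule

mod = SourceModule("""
    __global__ void kernel(...)
    {
        // typename stands for the element type, e.g. float
        extern __shared__ typename buffer[];

        ....
    }
  """)

func = mod.get_function("kernel")
# sharedmem: dynamic shared memory size in bytes per block
# (newer PyCUDA releases may expect shared_size= on prepared_call instead;
# check the documentation for your version)
func.prepare(..., shared=sharedmem)
func.prepared_call(griddim, blockdim, ...)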

with the shared memory allocation size passed to the prepare method.
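Since PyCUDA compiles the kernel source at run time, the typename placeholder can be filled in by plain string substitution rather than C++ templates. The sketch below is one way to do that, not part of the original answer: the kernel name copy_through_shared and the helper make_kernel_source are hypothetical, and the code only builds the source string, which would then be handed to SourceModule on a machine with a CUDA device.

```python
from string import Template

# Kernel text with a ${ctype} placeholder for the element type.
KERNEL_TEMPLATE = Template("""
__global__ void copy_through_shared(${ctype} *data, int n)
{
    // Unsized: the byte count is supplied when the kernel is launched.
    extern __shared__ ${ctype} buffer[];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buffer[threadIdx.x] = data[i];
    __syncthreads();
    if (i < n) data[i] = buffer[threadIdx.x];
}
""")


def make_kernel_source(ctype="float"):
    """Return CUDA source with the element type substituted in."""
    return KERNEL_TEMPLATE.substitute(ctype=ctype)


source = make_kernel_source("float")
print(source)
```

The element type is chosen in Python, so the same kernel text can serve float, double, or int buffers without touching the CUDA code.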
