Getting started with shared memory on PyCUDA


Problem Description

I'm trying to understand shared memory by playing with the following code:

import pycuda.driver as drv
import pycuda.tools
import pycuda.autoinit
import numpy
from pycuda.compiler import SourceModule

src='''
__global__ void reduce0(float *g_idata, float *g_odata) {
extern __shared__ float sdata[];
// each thread loads one element from global to shared mem
unsigned int tid = threadIdx.x;
unsigned int i = blockIdx.x*blockDim.x + threadIdx.x;

sdata[tid] = g_idata[i];
__syncthreads();
// do reduction in shared mem
for(unsigned int s=1; s < blockDim.x; s *= 2) {
   if (tid % (2*s) == 0) {
      sdata[tid] += sdata[tid + s];
   }
__syncthreads();
}
// write result for this block to global mem
if (tid == 0) g_odata[blockIdx.x] = sdata[0];
}
'''

mod = SourceModule(src)
reduce0=mod.get_function('reduce0')

a = numpy.random.randn(400).astype(numpy.float32)

dest = numpy.zeros_like(a)
reduce0(drv.In(a),drv.Out(dest),block=(400,1,1))

I can't see anything obviously wrong with this, but I keep getting synchronization errors and it doesn't run.

Any help greatly appreciated.

Answer

When you specify

extern __shared__ float sdata[];

you are telling the compiler that the caller will provide the shared memory. In PyCUDA, that is done by specifying shared=nnnn on the line that calls the CUDA function. In your case, something like:

reduce0(drv.In(a),drv.Out(dest),block=(400,1,1),shared=4*400)
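To see what the kernel is supposed to compute, here is a small CPU sketch in NumPy of the same tree reduction (my own illustration, not part of the original answer). One caveat worth hedging: the sketch adds a guard `tid + s < n` that the original kernel does not have, because with a non-power-of-two block size such as 400, some iterations of the kernel's loop would index past the end of `sdata`; the classic version of this reduction assumes a power-of-two block size.

```python
import numpy as np

def block_reduce(data):
    """CPU emulation of reduce0's in-block tree reduction.

    At each step s, "threads" whose index is a multiple of 2*s add in
    the element s positions away, halving the number of live partial
    sums each round. With a power-of-two length this sums the whole
    array into element 0, mirroring sdata[0] in the kernel.
    """
    sdata = np.asarray(data, dtype=np.float32).copy()
    n = sdata.size
    s = 1
    while s < n:
        for tid in range(0, n, 2 * s):   # tid % (2*s) == 0 in the kernel
            if tid + s < n:              # guard absent from the original kernel
                sdata[tid] += sdata[tid + s]
        s *= 2
    return sdata[0]

# For a power-of-two input length the result equals the plain sum.
a = np.arange(256, dtype=np.float32)
print(block_reduce(a), a.sum())
```

This also makes the `shared=4*400` figure concrete: the kernel needs one `float` (4 bytes) of shared memory per thread in the block, so the byte count is `4 * blockDim.x`.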

Alternately, you can drop the extern keyword, and specify the shared memory directly:

__shared__ float sdata[400];
