CUDA shared memory issue in outputs depending on extern declaration and size of array
Question
I am experimenting with shared memory in CUDA and I do not understand its behaviour in this bit of code. I have a pretty basic kernel:
__global__ void sum(int* input, int* output, int size) {
    int tid = threadIdx.x + blockDim.x * blockIdx.x +
              blockDim.x * gridDim.x * blockIdx.y;
    extern __shared__ int sdata[];
    sdata[tid] = input[tid];
    __syncthreads();
    output[tid] = input[tid];
}
The output is 0 for every element of output[]. However, if I comment out the line sdata[tid] = input[tid]; then the output is fine and equal to input[].
What am I doing wrong here? Am I missing something?
[UPDATE]
Well, if I remove the extern qualifier and give the shared array a fixed size, it seems to work fine. Any ideas why?
[UPDATE]
I am invoking the kernel from C++ code, so I needed to wrap it in a function that can be called from the main code.
kernel.cu contains the kernel itself plus the wrapper function:
void wrapper(int dBlock, int dThread, int* input, int* output, int size) {
    sum<<<dBlock, dThread>>>(input, output, size);
}
callerfunction.cpp contains the C++ code and the function that invokes the wrapper.
Recommended answer
If you use the extern qualifier, you need to pass the size of the shared memory when launching the kernel:
kernel<<<blocks, threads, size>>>(...)
The size parameter is the amount of dynamically allocated shared memory per block, in bytes. It defaults to 0 when omitted, so the launch in the question's wrapper gives sdata no storage at all.
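As a minimal sketch of how this applies to the question's wrapper, the version below passes a byte count as the third launch argument. It makes a few assumptions that are not in the original post: sdata is indexed with threadIdx.x rather than the global tid (shared memory is per block, so one int per thread in the block should be enough), a bounds check against size is added, the output is read back from sdata so the shared array is actually exercised, and run_demo is a hypothetical host helper that exists only to check the result.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void sum(const int* input, int* output, int size) {
    int tid = threadIdx.x + blockDim.x * blockIdx.x +
              blockDim.x * gridDim.x * blockIdx.y;
    // Unsized extern array: its storage comes from the third <<<...>>> argument.
    extern __shared__ int sdata[];
    // Assumption: index shared memory with the block-local thread id,
    // since each block gets its own copy of sdata.
    if (tid < size) sdata[threadIdx.x] = input[tid];
    __syncthreads();
    if (tid < size) output[tid] = sdata[threadIdx.x];
}

void wrapper(int dBlock, int dThread, int* input, int* output, int size) {
    // One int per thread in the block, expressed in bytes.
    size_t sharedBytes = static_cast<size_t>(dThread) * sizeof(int);
    sum<<<dBlock, dThread, sharedBytes>>>(input, output, size);
}

// Hypothetical helper (not from the original post): copies data through the
// kernel and reports whether output matches input.
bool run_demo() {
    const int dBlock = 4, dThread = 64, size = dBlock * dThread;
    const size_t bytes = size * sizeof(int);
    std::vector<int> h_in(size), h_out(size, 0);
    for (int i = 0; i < size; ++i) h_in[i] = i + 1;

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    cudaError_t err = cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        wrapper(dBlock, dThread, d_in, d_out, size);
        err = cudaGetLastError();            // catches launch configuration errors
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // catches runtime faults
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }
    return h_out == h_in;
}
```

Calling run_demo() from main should return true on a machine with a working CUDA device. Checking cudaGetLastError() and cudaDeviceSynchronize() after the launch is also a quick way to see why the original version left output[] untouched, since a failed kernel would report its error there.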