CUDA shared memory issue in outputs depending on extern declaration and size of array

Views: 152 Published: 2017/3/5 19:14:05 Tag: cuda


Problem description

I am experimenting with shared memory in CUDA and I do not understand its behaviour in this bit of code. I have a pretty basic kernel:

__global__ void sum( int* input, int* output, int size){


  int tid = threadIdx.x+blockDim.x*blockIdx.x +
    blockDim.x*gridDim.x*blockIdx.y;

  extern  __shared__ int sdata[];

  sdata[tid] = input[tid];
  __syncthreads();

  output[tid] = input[tid];

}

The output is 0 for every element of output[]. However, if I comment out sdata[tid] = input[tid];, the output is fine and equal to input[].

What am I doing wrong here? Am I missing something?

[UPDATE]

Well, if I remove the extern qualifier and give the shared array a fixed size, it seems to work fine. Any ideas why?

[UPDATE]

I invoke the kernel from C++ code, so I needed to wrap the launch in a function that can be called from the main code.

kernel.cu contains the kernel itself plus the wrapper function:

void wrapper(int dBlock, int dThread, int* input, int* output, int size){

    sum<<<dBlock,dThread>>>(input, output, size);

}

callerfunction.cpp contains the C++ code and the function that invokes the wrapper.
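
One likely explanation, offered as a sketch rather than a confirmed diagnosis: an array declared with extern __shared__ takes its size from the third launch configuration parameter (in bytes), and the wrapper above omits that parameter, so it defaults to 0. Writing sdata[tid] then goes past the end of the shared allocation, which is undefined behaviour and may abort the kernel before output is written. A fixed-size array does not depend on the launch parameter, which would fit the observation in the update. The version below passes the size at launch, indexes shared memory with the block-local threadIdx.x (each block has its own sdata), and returns the CUDA error status. The bounds check, the cudaError_t return type, and the count_mismatches helper are assumptions added for illustration and are not part of the original code.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void sum(const int* input, int* output, int size) {
    // Size of sdata is set by the third <<<...>>> parameter, not here.
    extern __shared__ int sdata[];

    // Global index, same formula as in the question.
    int tid = threadIdx.x + blockDim.x * blockIdx.x +
              blockDim.x * gridDim.x * blockIdx.y;
    // Shared memory is per block, so index it with the local thread index.
    int local = threadIdx.x;

    if (tid < size) sdata[local] = input[tid];
    __syncthreads();  // outside the branch so every thread reaches it
    if (tid < size) output[tid] = sdata[local];
}

// Wrapper callable from C++ code. Returns the first CUDA error seen.
cudaError_t wrapper(int dBlock, int dThread,
                    const int* input, int* output, int size) {
    size_t sharedBytes = static_cast<size_t>(dThread) * sizeof(int);
    sum<<<dBlock, dThread, sharedBytes>>>(input, output, size);
    cudaError_t err = cudaGetLastError();  // errors from the launch itself
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();        // errors raised during execution
}

// Hypothetical helper: runs the kernel and counts elements where
// output differs from input. Returns -1 on a CUDA error.
int count_mismatches(int dBlock, int dThread) {
    int size = dBlock * dThread;
    size_t bytes = static_cast<size_t>(size) * sizeof(int);
    std::vector<int> h_in(size), h_out(size, 0);
    for (int i = 0; i < size; ++i) h_in[i] = i + 1;

    int* d_in = nullptr;
    int* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return -1; }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    cudaError_t err = wrapper(dBlock, dThread, d_in, d_out, size);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }

    int bad = 0;
    for (int i = 0; i < size; ++i)
        if (h_out[i] != h_in[i]) ++bad;
    return bad;
}

int main() {
    int bad = count_mismatches(4, 256);
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Checking the value returned by the wrapper is also a quick way to tell whether the original launch fails outright or completes and writes zeros.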
