Cuda reuse of shared memory variable name


Problem description

I have two cuda kernels that are run, one after the other:

__global__
void calculate_histo(const float* const d_logLuminance,
        unsigned int* d_histogram,
        float min_logLum,
        float lumRange,
        int numBins,
        int num_elements){
    extern __shared__ float sdata[];
    int tid = threadIdx.x;
    int bid = blockIdx.x;
    int gid = tid * blockDim.x + bid;

    // load input into __shared__ memory
    if(gid < num_elements)
    {
        sdata[tid] = d_logLuminance[gid];
        __syncthreads();

        //compute bin value of input
        int bin = static_cast <int> (floor((d_logLuminance[gid]-min_logLum)/ lumRange * numBins));
        //increment histogram at bin value
        atomicAdd(&(d_histogram[bin]), 1);
    }
}

__global__
void blelloch_scan(unsigned int* const d_cdf, unsigned int* d_histogram, int numBins) {
    extern __shared__ unsigned int sdata[];// allocated on invocation
    int thid = threadIdx.x;
    //printf("%i \n", thid);
    //printf("%i \n", d_histogram[thid]);

    int offset = 1;


    sdata[2*thid] = d_histogram[2*thid]; // load input into shared memory
    sdata[2*thid+1] = d_histogram[2*thid+1];

    // build sum in place up the tree
    for (int d = numBins>>1; d > 0; d >>= 1) {
        __syncthreads();
        if (thid < d) {
            int ai = offset*(2*thid+1)-1;
            int bi = offset*(2*thid+2)-1;
            sdata[bi] += sdata[ai];
        }
        offset *= 2;
    }
    if (thid == 0) { sdata[numBins - 1] = 0; } // clear the last element
    // traverse down tree & build scan
    for (int d = 1; d < numBins; d *= 2) {
        offset >>= 1;
        __syncthreads();
        if (thid < d) {
            int ai = offset*(2*thid+1)-1;
            int bi = offset*(2*thid+2)-1;
            float t = sdata[ai];
            sdata[ai] = sdata[bi];
            sdata[bi] += t;
        }
        __syncthreads();
        d_cdf[2*thid] = sdata[2*thid]; // write results to device memory
        d_cdf[2*thid+1] = sdata[2*thid+1];
    }

}

They both use shared memory. The second has an unsigned int array as the shared memory. The first has a float array. I thought I should be able to reuse the same variable name, sdata, for both arrays, since shared memory is cleared after each kernel launch, but I'm getting the error:

declaration is incompatible with previous 'sdata'

If I use different variable names for each kernel, that seems to solve the problem. Anyone know why I can't reuse the same variable name?

Answer

CUDA is just following the rule of the standard C language. Quoting the Kernighan and Ritchie "The C Programming Language" book:



An external variable must be defined, exactly once, outside of any function; this sets aside storage for it. The variable must also be declared in each function that wants to access it; this states the type of the variable. [...] Definition refers to the place where the variable is created or assigned storage; declaration refers to places where the nature of the variable is stated but no storage is allocated.

Somewhere in your program you should have something like

extern __shared__ unsigned int sdata[];

At that location, you are creating a pointer, named sdata, to an unsigned int. Inside the __global__ functions you are declaring the type of sdata, so that the __global__ function can be aware of it. In the

kernel<<<blocks,threads,numbytes_for_shared>>>(...);

launch, you are allocating space for the array pointed to by sdata.
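Because every `extern __shared__` declaration in a translation unit refers to the same underlying allocation, a common workaround besides renaming (this follows the idiom described in NVIDIA's CUDA C++ Programming Guide, adapted here as a sketch) is to declare the array with one raw-bytes type in every kernel and cast it to the element type each kernel needs:

```cuda
// Same name and same type in every kernel, so the declarations are compatible.
// Each kernel then views the bytes as the element type it actually works with.

__global__ void calculate_histo(/* ... same parameters as above ... */) {
    extern __shared__ unsigned char smem[];
    float* sdata = reinterpret_cast<float*>(smem);   // view as float[]
    // ... use sdata as in the original kernel ...
}

__global__ void blelloch_scan(/* ... same parameters as above ... */) {
    extern __shared__ unsigned char smem[];
    unsigned int* sdata = reinterpret_cast<unsigned int*>(smem);  // view as unsigned int[]
    // ... use sdata as in the original kernel ...
}
```

The per-launch shared-memory byte count (`numbytes_for_shared`) must still be sized for the element type each kernel casts to, e.g. `numBins * sizeof(unsigned int)` for the scan kernel.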
