CUDA: allocation of an array of structs inside a struct

Problem description

I have these structs:

typedef struct neuron
{
float*  weights;
int n_weights;
}Neuron;


typedef struct neurallayer
{
Neuron *neurons;
int    n_neurons;
int    act_function;
}NLayer;

An "NLayer" struct can contain an arbitrary number of "Neuron" structs.

I tried to allocate an "NLayer" struct with 5 "Neuron" elements from the host like this:

NLayer* nL;
int i;
int tmp=9;
cudaMalloc((void**)&nL,sizeof(NLayer));
cudaMalloc((void**)&nL->neurons,6*sizeof(Neuron));
for(i=0;i<5;i++)
    cudaMemcpy(&nL->neurons[i].n_weights,&tmp,sizeof(int),cudaMemcpyHostToDevice);

...then I tried to modify the "nL->neurons[0].n_weights" variable with this kernel:

__global__ void test(NLayer* n)
           {
              n->neurons[0].n_weights=121;
           }

But at compile time nvcc emits this warning for the only line of the kernel:

Warning: Cannot tell what pointer points to, assuming global memory space

and when the kernel finishes its work, the struct becomes unreachable.

Most likely I'm doing something wrong during the allocation. Can someone help me? Thanks very much, and sorry for my English! :)

Update:

Thanks to aland, I've modified my code and written this function, which should allocate an instance of the "NLayer" struct:

NLayer* setNLayer(int numNeurons,int weightsPerNeuron,int act_fun)
{
    int i;
    NLayer  h_layer;
    NLayer* d_layer;
    float*  d_weights;

    //SET THE LAYER VARIABLE OF THE HOST NLAYER
    h_layer.act_function=act_fun;
    h_layer.n_neurons=numNeurons;
    //ALLOCATING THE DEVICE NLAYER
    if(cudaMalloc((void**)&d_layer,sizeof(NLayer))!=cudaSuccess)
        puts("ERROR: Unable to allocate the Layer");
    //ALLOCATING THE NEURONS ON THE DEVICE
    if(cudaMalloc((void**)&h_layer.neurons,numNeurons*sizeof(Neuron))!=cudaSuccess)
        puts("ERROR: Unable to allocate the Neurons of the Layer");
    //COPING THE HOST NLAYER ON THE DEVICE
    if(cudaMemcpy(d_layer,&h_layer,sizeof(NLayer),cudaMemcpyHostToDevice)!=cudaSuccess)
                puts("ERROR: Unable to copy the data layer onto the device");

    for(i=0;i<numNeurons;i++)
    {
        //ALLOCATING THE WEIGHTS' ARRAY ON THE DEVICE
        cudaMalloc((void**)&d_weights,weightsPerNeuron*sizeof(float));
        //COPING ITS POINTER AS PART OF THE i-TH NEURONS STRUCT
        if(cudaMemcpy(&d_layer->neurons[i].weights,&d_weights,sizeof(float*),cudaMemcpyHostToDevice)!=cudaSuccess)
                puts("Error: unable to copy weights' pointer to the device");
    }


    //RETURN THE DEVICE POINTER
    return d_layer;
}

I call that function from main like this (the kernel "test" is declared earlier):

int main()
{
    NLayer* nL;
    int h_tmp1;
    float h_tmp2;

    nL=setNLayer(10,12,13);
    test<<<1,1>>>(nL);
    if(cudaMemcpy(&h_tmp1,&nL->neurons[0].n_weights,sizeof(float),cudaMemcpyDeviceToHost)!=cudaSuccess);
        puts("ERROR!!");
    printf("RESULT:%d",h_tmp1);

}

When I compile this code the compiler still shows the warning, and when I run the program it prints:

Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
ERROR!!
RESULT:1

The last error does not appear if I comment out the kernel call.

Where am I going wrong? I don't know what to do. Thanks for your help!

Recommended answer

The problem is here:

cudaMalloc((void**)&nL,sizeof(NLayer));
cudaMalloc((void**)&nL->neurons,6*sizeof(Neuron));

After the first line, nL points to a structure in global memory on the device. Therefore, in the second line, the first argument to cudaMalloc (&nL->neurons) is an address residing on the GPU. cudaMalloc stores the newly allocated pointer through that argument from host code, so the host ends up writing to a device address, which is undefined behaviour (on my test system it causes a segfault; in your case, though, something more subtle is happening).

The correct way to do what you want is to first create the structure in host memory, fill it with data, and then copy it to the device, like this:

NLayer* nL;
NLayer h_nL;
int i;
int tmp=9;
// Allocate data on device
cudaMalloc((void**)&nL, sizeof(NLayer));
cudaMalloc((void**)&h_nL.neurons, 6*sizeof(Neuron));
// Copy nlayer with pointers to device
cudaMemcpy(nL, &h_nL, sizeof(NLayer), cudaMemcpyHostToDevice);

Also, don't forget to always check for errors from CUDA routines.
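
One common way to follow this advice is a small checking macro wrapped around every runtime call. The following is only a minimal sketch: the names CUDA_CHECK, noop and checkedLaunch are hypothetical and chosen for illustration, while cudaGetErrorString, cudaGetLastError and cudaDeviceSynchronize are standard CUDA runtime calls.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* CUDA_CHECK is a hypothetical helper name, not part of the CUDA runtime.
   It prints the failing file/line and aborts when a runtime call fails. */
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

/* Trivial kernel used only to have something to launch. */
__global__ void noop(void) {}

/* Launches the kernel and checks both the launch and its execution. */
void checkedLaunch(void)
{
    noop<<<1, 1>>>();
    CUDA_CHECK(cudaGetLastError());       /* launch / configuration errors */
    CUDA_CHECK(cudaDeviceSynchronize());  /* errors raised while the kernel ran */
}
```

Kernel launches return no status themselves, which is why the sketch queries cudaGetLastError right after the launch and then synchronizes. Aborting on the first failure is a simplification; real code may prefer to propagate the error to the caller.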

Update

In the second version of your code:

cudaMemcpy(&d_layer->neurons[i].weights,&d_weights,...) --- again, you are dereferencing a device pointer (d_layer) on the host. Instead, you should use

cudaMemcpy(&h_layer.neurons[i].weights,&d_weights,sizeof(float*),cudaMemcpyHostToDevice);

Here you take h_layer (a host structure) and read its member h_layer.neurons, which is a pointer to device memory. Then you do some pointer arithmetic on it (&h_layer.neurons[i].weights). No access to device memory is needed to compute this address.
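
Putting both corrections together, the whole flow might look like the sketch below. It reuses the question's structs, kernel and setNLayer name. The helpers readFirstNWeights and runLayerDemo are hypothetical names added for illustration, and initialising n_weights inside the loop is an assumption about what the asker intended. The rule it follows is that only host-resident pointers are dereferenced on the host; device addresses are computed on the host but only ever handed to cudaMemcpy.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

typedef struct neuron { float *weights; int n_weights; } Neuron;
typedef struct neurallayer { Neuron *neurons; int n_neurons; int act_function; } NLayer;

__global__ void test(NLayer *n) { n->neurons[0].n_weights = 121; }

/* Builds the layer on the device. Returns NULL on any CUDA error
   (cleanup of partial allocations is omitted to keep the sketch short). */
NLayer *setNLayer(int numNeurons, int weightsPerNeuron, int act_fun)
{
    NLayer h_layer;        /* host staging copy; h_layer.neurons holds a device address */
    NLayer *d_layer = NULL;
    int i;

    h_layer.n_neurons = numNeurons;
    h_layer.act_function = act_fun;

    if (cudaMalloc((void **)&d_layer, sizeof(NLayer)) != cudaSuccess) return NULL;
    /* &h_layer.neurons is a host address, so cudaMalloc can safely write to it */
    if (cudaMalloc((void **)&h_layer.neurons, numNeurons * sizeof(Neuron)) != cudaSuccess) return NULL;

    for (i = 0; i < numNeurons; i++) {
        float *d_weights = NULL;
        if (cudaMalloc((void **)&d_weights, weightsPerNeuron * sizeof(float)) != cudaSuccess) return NULL;
        /* Destination addresses are computed on the host from h_layer.neurons;
           only cudaMemcpy touches device memory. */
        if (cudaMemcpy(&h_layer.neurons[i].weights, &d_weights, sizeof(float *),
                       cudaMemcpyHostToDevice) != cudaSuccess) return NULL;
        if (cudaMemcpy(&h_layer.neurons[i].n_weights, &weightsPerNeuron, sizeof(int),
                       cudaMemcpyHostToDevice) != cudaSuccess) return NULL;
    }

    /* Copy the filled-in host struct (including the neurons pointer) to the device. */
    if (cudaMemcpy(d_layer, &h_layer, sizeof(NLayer), cudaMemcpyHostToDevice) != cudaSuccess) return NULL;
    return d_layer;
}

/* Reads neurons[0].n_weights back to the host; returns -1 on error. */
int readFirstNWeights(const NLayer *d_layer)
{
    NLayer h_layer;
    int value = -1;
    /* Fetch the struct first: d_layer->neurons cannot be read directly on the host. */
    if (cudaMemcpy(&h_layer, d_layer, sizeof(NLayer), cudaMemcpyDeviceToHost) != cudaSuccess) return -1;
    if (cudaMemcpy(&value, &h_layer.neurons[0].n_weights, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) return -1;
    return value;
}

/* Allocates a layer, runs the kernel, and returns the value read back (-1 on error). */
int runLayerDemo(void)
{
    NLayer *nL = setNLayer(10, 12, 13);
    if (nL == NULL) return -1;
    test<<<1, 1>>>(nL);
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;
    return readFirstNWeights(nL);
}
```

Calling runLayerDemo() from main and printing its return value takes the place of the main function in the question. Note that reading the result back needs the same care as writing it: the expression &nL->neurons[0].n_weights in the original main loads nL->neurons through a device pointer on the host, so the sketch copies the NLayer struct back first and takes the address from that host copy. The device allocations are never freed here; a complete program would release them with cudaFree.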
