CUDA: allocation of an array of structs inside a struct

Problem description

I have these structs:

typedef struct neuron
{
float*  weights;
int n_weights;
}Neuron;


typedef struct neurallayer
{
Neuron *neurons;
int    n_neurons;
int    act_function;
}NLayer;

An "NLayer" struct can contain an arbitrary number of "Neuron" structs.

I tried to allocate an 'NLayer' struct with 5 'Neurons' from the host in this way:

NLayer* nL;
int i;
int tmp=9;
cudaMalloc((void**)&nL,sizeof(NLayer));
cudaMalloc((void**)&nL->neurons,6*sizeof(Neuron));
for(i=0;i<5;i++)
    cudaMemcpy(&nL->neurons[i].n_weights,&tmp,sizeof(int),cudaMemcpyHostToDevice);

...then I tried to modify the "nL->neurons[0].n_weights" variable with this kernel:

__global__ void test(NLayer* n)
{
    n->neurons[0].n_weights=121;
}

but at compile time nvcc emits this warning for the only line of the kernel:

Warning: Cannot tell what pointer points to, assuming global memory space

and when the kernel finishes its work, the struct becomes unreachable.

I am very probably doing something wrong during the allocation. Can someone help me? Thanks very much, and sorry for my English! :)

UPDATE:

Thanks to aland, I have modified my code and written this function, which should allocate an instance of the struct "NLayer":

NLayer* setNLayer(int numNeurons,int weightsPerNeuron,int act_fun)
{
    int i;
    NLayer  h_layer;
    NLayer* d_layer;
    float*  d_weights;

    //SET THE LAYER VARIABLE OF THE HOST NLAYER
    h_layer.act_function=act_fun;
    h_layer.n_neurons=numNeurons;
    //ALLOCATING THE DEVICE NLAYER
    if(cudaMalloc((void**)&d_layer,sizeof(NLayer))!=cudaSuccess)
        puts("ERROR: Unable to allocate the Layer");
    //ALLOCATING THE NEURONS ON THE DEVICE
    if(cudaMalloc((void**)&h_layer.neurons,numNeurons*sizeof(Neuron))!=cudaSuccess)
        puts("ERROR: Unable to allocate the Neurons of the Layer");
    //COPING THE HOST NLAYER ON THE DEVICE
    if(cudaMemcpy(d_layer,&h_layer,sizeof(NLayer),cudaMemcpyHostToDevice)!=cudaSuccess)
                puts("ERROR: Unable to copy the data layer onto the device");

    for(i=0;i<numNeurons;i++)
    {
        //ALLOCATING THE WEIGHTS' ARRAY ON THE DEVICE
        cudaMalloc((void**)&d_weights,weightsPerNeuron*sizeof(float));
        //COPING ITS POINTER AS PART OF THE i-TH NEURONS STRUCT
        if(cudaMemcpy(&d_layer->neurons[i].weights,&d_weights,sizeof(float*),cudaMemcpyHostToDevice)!=cudaSuccess)
                puts("Error: unable to copy weights' pointer to the device");
    }


    //RETURN THE DEVICE POINTER
    return d_layer;
}

and I call that function from main in this way (the kernel "test" is declared earlier):

int main()
{
    NLayer* nL;
    int h_tmp1;
    float h_tmp2;

    nL=setNLayer(10,12,13);
    test<<<1,1>>>(nL);
    if(cudaMemcpy(&h_tmp1,&nL->neurons[0].n_weights,sizeof(float),cudaMemcpyDeviceToHost)!=cudaSuccess);
        puts("ERROR!!");
    printf("RESULT:%d",h_tmp1);

}

When I compile this code the compiler shows me the warning, and when I run the program it prints:

Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
Error: unable to copy weights' pointer to the device
ERROR!!
RESULT:1

The last error does not appear if I comment out the kernel call.

Where am I going wrong? I don't know what to do. Thanks for your help!

Solution

The problem is here:

cudaMalloc((void**)&nL,sizeof(NLayer));
cudaMalloc((void**)&nL->neurons,6*sizeof(Neuron));

After the first line, nL points to a structure in global memory on the device. Therefore, in the second line, the first argument to cudaMalloc (&nL->neurons) is an address residing on the GPU. cudaMalloc writes the new pointer through that argument from host code, so this is undefined behaviour (on my test system it causes a segfault; in your case, though, something more subtle happens).

The correct way to do what you want is to first create the structure in host memory, fill it with data, and then copy it to the device, like this:

NLayer* nL;
NLayer h_nL;
int i;
int tmp=9;
// Allocate data on device
cudaMalloc((void**)&nL, sizeof(NLayer));
cudaMalloc((void**)&h_nL.neurons, 6*sizeof(Neuron));
// Copy nlayer with pointers to device
cudaMemcpy(nL, &h_nL, sizeof(NLayer), cudaMemcpyHostToDevice);

Also, don't forget to always check for any errors from CUDA routines.
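
One way to follow this advice is to wrap every runtime call in a small checking macro. The sketch below is only an illustration: CUDA_CHECK is a hypothetical helper name, not part of the CUDA runtime, and the values stored in h_nL are placeholders. It applies the host-staging allocation shown above with each call checked.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): abort with a readable
// message when a runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

typedef struct neuron { float *weights; int n_weights; } Neuron;
typedef struct neurallayer { Neuron *neurons; int n_neurons; int act_function; } NLayer;

int main()
{
    NLayer *nL = NULL;        // device pointer to the layer
    NLayer  h_nL;             // host staging copy of the layer
    h_nL.n_neurons = 5;
    h_nL.act_function = 0;    // placeholder value for illustration

    CUDA_CHECK(cudaMalloc((void**)&nL, sizeof(NLayer)));
    // &h_nL.neurons is a host address, so cudaMalloc can safely write to it.
    CUDA_CHECK(cudaMalloc((void**)&h_nL.neurons, h_nL.n_neurons * sizeof(Neuron)));
    // Copy the staged struct, including its device pointer, to the device.
    CUDA_CHECK(cudaMemcpy(nL, &h_nL, sizeof(NLayer), cudaMemcpyHostToDevice));

    CUDA_CHECK(cudaFree(h_nL.neurons));
    CUDA_CHECK(cudaFree(nL));
    return 0;
}
```

Kernel launches do not return an error code directly; checking cudaGetLastError() and the result of cudaDeviceSynchronize() after a launch covers that case.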

UPDATE

In the second version of your code:

cudaMemcpy(&d_layer->neurons[i].weights,&d_weights,...): again, you are dereferencing a device pointer (d_layer) on the host. Instead, you should use

cudaMemcpy(&h_layer.neurons[i].weights,&d_weights,sizeof(float*),cudaMemcpyHostToDevice);

Here you take h_layer (the host structure) and read its element h_layer.neurons, which is a pointer to device memory. Then you do some pointer arithmetic on it (&h_layer.neurons[i].weights). No access to device memory is needed to compute this address.
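
Putting both fixes together, a complete program might look like the sketch below. It is one possible arrangement, not the only one, and it departs from the question's code in a few labelled ways: setNLayer takes an extra, hypothetical output parameter (d_neurons_out) so that main can address the neuron array without dereferencing the device pointer nL on the host (the question's main has that same problem in &nL->neurons[0].n_weights); each Neuron is staged on the host and copied whole rather than copying only its weights pointer; and cleanup with cudaFree is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef struct neuron { float *weights; int n_weights; } Neuron;
typedef struct neurallayer { Neuron *neurons; int n_neurons; int act_function; } NLayer;

__global__ void test(NLayer *n)
{
    n->neurons[0].n_weights = 121;
}

// Returns the device pointer to the layer, or NULL on failure.
// *d_neurons_out (hypothetical extra parameter) receives the device address
// of the neuron array so the host can compute element addresses later.
NLayer *setNLayer(int numNeurons, int weightsPerNeuron, int act_fun,
                  Neuron **d_neurons_out)
{
    NLayer  h_layer;          // host staging copy of the layer
    NLayer *d_layer = NULL;   // device copy of the layer
    h_layer.act_function = act_fun;
    h_layer.n_neurons = numNeurons;
    h_layer.neurons = NULL;

    if (cudaMalloc((void**)&d_layer, sizeof(NLayer)) != cudaSuccess)
        return NULL;
    if (cudaMalloc((void**)&h_layer.neurons, numNeurons * sizeof(Neuron)) != cudaSuccess)
        return NULL;
    if (cudaMemcpy(d_layer, &h_layer, sizeof(NLayer), cudaMemcpyHostToDevice) != cudaSuccess)
        return NULL;

    for (int i = 0; i < numNeurons; i++) {
        Neuron h_neuron;      // host staging copy of one neuron
        h_neuron.n_weights = weightsPerNeuron;
        if (cudaMalloc((void**)&h_neuron.weights, weightsPerNeuron * sizeof(float)) != cudaSuccess)
            return NULL;
        // h_layer.neurons + i is a device address computed on the host;
        // no device memory is read to obtain it.
        if (cudaMemcpy(h_layer.neurons + i, &h_neuron, sizeof(Neuron),
                       cudaMemcpyHostToDevice) != cudaSuccess)
            return NULL;
    }

    *d_neurons_out = h_layer.neurons;
    return d_layer;
}

int main()
{
    Neuron *d_neurons = NULL;
    NLayer *nL = setNLayer(10, 12, 13, &d_neurons);
    if (nL == NULL) { puts("ERROR: setup failed"); return 1; }

    test<<<1, 1>>>(nL);
    if (cudaDeviceSynchronize() != cudaSuccess) { puts("ERROR: kernel failed"); return 1; }

    int h_tmp1 = 0;
    // The address is computed from the pointer value held on the host.
    if (cudaMemcpy(&h_tmp1, &d_neurons[0].n_weights, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        puts("ERROR!!");
        return 1;
    }
    printf("RESULT:%d\n", h_tmp1);
    return 0;
}
```

An alternative with fewer transfers is to fill a host array of Neuron structs (each holding its device weights pointer) and copy the whole array to h_layer.neurons with a single cudaMemcpy.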
