Copying host memory to a CUDA __device__ variable


Problem description

I've tried to find a solution to my problem using Google, but failed. There were a lot of snippets that didn't fit my case exactly, although I would think it's a pretty standard situation.

I have to transfer several different data arrays to CUDA, all of them simple struct arrays of dynamic size. Since I don't want to pass everything through the CUDA kernel call, I thought that __device__ variables should be exactly what I need.

This is how I tried to copy my host data to the __device__ variable:

// MaterialDescription.h
struct MaterialDescription {
    unsigned char type;
    unsigned char diffuseR, diffuseG, diffuseB;
    __device__ __forceinline__ float4 diffuseColour() const {  return make_float4((float) diffuseR / 255.f, (float) diffuseG / 255.f, (float) diffuseB / 255.f, 0); }
};

// kernel.h
__device__ MaterialDescription* g_materials;
__global__ void deferredRenderKernel() {
     something = g_materials[indexDependingOnData].diffuseColour();
}

//Cuda.cu
const std::vector<MaterialDescription>& materials = getData();

// version 1
cudaMemcpyToSymbol(g_materials, &materials.front(), sizeof(MaterialDescription) * materialCount);

// version 2
MaterialDescription* ptr;
cudaMalloc((void**)&ptr, sizeof(MaterialDescription) * materialCount);
cudaMemcpy(ptr, &materials.front(), sizeof(MaterialDescription) * materialCount, cudaMemcpyHostToDevice);
cudaMemcpyToSymbol(g_materials, ptr, sizeof(MaterialDescription) * materialCount);

// version 3
cudaMalloc((void**)&g_materials, sizeof(MaterialDescription) * materialCount);
cudaMemcpyToSymbol(g_materials, &materials.front(), sizeof(MaterialDescription) * materialCount);

deferredRenderKernel<<<numBlocks, threadsPerBlock>>>();

However, the only version that worked included a kernel parameter:

// kernel.h
__device__ MaterialDescription* g_materials;
__global__
void deferredRenderKernel(MaterialDescription* ptr) {
    g_materials = ptr;
    something = g_materials[indexDependingOnData].diffuseColour();
}

//Cuda.cu
// version 4, the only one working. But I pass the pointer again via kernel
// param; in the worst case I'll stick to this, at least I wouldn't have to
// pass the parameters into device functions.
MaterialDescription* ptr;
cudaMalloc((void**)&ptr, sizeof(MaterialDescription) * materialCount);
cudaMemcpy(ptr, &materials.front(), sizeof(MaterialDescription) * materialCount, cudaMemcpyHostToDevice);

deferredRenderKernel<<<numBlocks, threadsPerBlock>>>(ptr);
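(One side note on version 4: every thread that runs the kernel stores the same pointer value into g_materials. That is redundant work on each launch, and it only behaves because all threads write identical bytes; copying the pointer into the symbol once from the host, as in the final version below, avoids it.)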

Edit: this version (as proposed by Robert Crovella) also works, but the memory is not allocated dynamically:

// kernel.h
__device__ MaterialDescription g_materials[VIENNA_MAX_MATERIAL_COUNT];
__global__
void deferredRenderKernel() {
    something = g_materials[indexDependingOnData].diffuseColour();
}

// cuda.h
// version 1
cudaMemcpyToSymbol(g_materials, &materials.front(), sizeof(MaterialDescription) * materialCount);

Other variables and structures are the same as above.

It finally works just the way I want:

MaterialDescription.h

struct MaterialDescription {
    unsigned char type;
    unsigned char diffuseR, diffuseG, diffuseB;
    __device__ __forceinline__ float4 diffuseColour() const {  return make_float4((float) diffuseR / 255.f, (float) diffuseG / 255.f, (float) diffuseB / 255.f, 0); }
};

kernel.h

__device__ MaterialDescription* g_materials;
__global__ void deferredRenderKernel() {
    something = g_materials[indexDependingOnData].diffuseColour();
}

Cuda.cu

const std::vector<MaterialDescription>& materials = getData();
MaterialDescription* dynamicArea;

// allocate memory on the device for our data
cudaMalloc((void**)&dynamicArea, sizeof(MaterialDescription) * materialCount); 

// copy our data into the allocated memory
cudaMemcpy(dynamicArea, &materials.front(), sizeof(MaterialDescription) * materialCount, cudaMemcpyHostToDevice);

// copy the pointer to our data into the global __device__ variable.
cudaMemcpyToSymbol(g_materials, &dynamicArea, sizeof(MaterialDescription*));
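One follow-up worth noting: the host must keep a copy of the pointer in order to release the allocation later. In this example dynamicArea is still in scope, so cudaFree(dynamicArea) is enough; if only the __device__ symbol were kept around, the pointer could be read back first. A minimal sketch using the standard CUDA calls (the name toFree is illustrative):

// Read the pointer value back out of the __device__ symbol...
MaterialDescription* toFree = NULL;
cudaMemcpyFromSymbol(&toFree, g_materials, sizeof(MaterialDescription*));
// ...and free the device allocation it points to.
cudaFree(toFree);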

Answer

It would be nice if you gave a complete example when asking questions like this. It would be useful to see your definitions of MaterialDescription and materials. Take a look at what SO expects for questions of the type "why isn't my code working?".

This only creates storage for a pointer:

__device__ MaterialDescription* g_materials;

You can't copy a whole structure/object onto a pointer.
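To make the failure mode concrete, here is a minimal sketch (reusing g_materials, materials, and materialCount from the question; the name d_ptr is illustrative) contrasting the broken copy with a pointer-sized one:

// g_materials is just a pointer-sized symbol. Version 1 tries to write the
// whole array into it, overflowing the symbol's storage:
cudaMemcpyToSymbol(g_materials, &materials.front(),
                   sizeof(MaterialDescription) * materialCount); // WRONG

// The symbol can only hold a pointer value: allocate device memory,
// fill it, then copy just the pointer into the symbol.
MaterialDescription* d_ptr;
cudaMalloc((void**)&d_ptr, sizeof(MaterialDescription) * materialCount);
cudaMemcpy(d_ptr, &materials.front(),
           sizeof(MaterialDescription) * materialCount, cudaMemcpyHostToDevice);
cudaMemcpyToSymbol(g_materials, &d_ptr, sizeof(MaterialDescription*));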

When you allocate a device variable like this, it is a static allocation, which means the size needs to be known at compile time. So if you know the size (or a maximum size) at compile time, you could do something like this:

__device__ MaterialDescription g_materials[MAX_SIZE];

// this assumes materialCount <= MAX_SIZE
cudaMemcpyToSymbol(g_materials, &(materials.front()), sizeof(MaterialDescription) * materialCount);
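As a usage note, the static array holds at most MAX_SIZE elements, so it is worth guarding the copy; a minimal sketch (the bounds check and error reporting are illustrative additions, assuming <cassert> and <cstdio> are included):

// Guard against overflowing the statically sized symbol, and surface
// any error reported by the copy itself.
assert(materialCount <= MAX_SIZE);
cudaError_t err = cudaMemcpyToSymbol(g_materials, &(materials.front()),
                                     sizeof(MaterialDescription) * materialCount);
if (err != cudaSuccess)
    fprintf(stderr, "cudaMemcpyToSymbol failed: %s\n", cudaGetErrorString(err));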
