Passing struct to GPU with OpenCL that contains an array of floats


Question

I currently have some data that I would like to pass to my GPU and then multiply by 2.

I have created a struct, which can be seen here:

struct GPUPatternData
{
    cl_int nInput,nOutput,patternCount, offest;
    cl_float* patterns;
};

This struct should contain an array of floats. I will not know the size of the array until run time, as it is specified by the user.

Kernel code:

typedef struct GPUPatternDataContatiner
{

    int nodeInput,nodeOutput,patternCount, offest;
    float* patterns;
} GPUPatternData; 
__kernel void patternDataAddition(__global GPUPatternData* gpd,__global GPUPatternData* output)
{
    int index = get_global_id(0);
    if(index < gpd->patternCount)
    {
        output.patterns[index] = gpd.patterns[index]*2;
    }
}

Here is the host code:

GPUPattern::GPUPatternData gpd;    
gpd.nodeInput = ptSet->getInputCount();
gpd.nodeOutput = ptSet->getOutputCount();
gpd.offest = gpd.nodeInput+gpd.nodeOutput;
gpd.patternCount = ptSet->getCount();
gpd.patterns = new cl_float [gpd.patternCount*gpd.offest];

GPUPattern::GPUPatternData gridC;
gridC.nodeInput = ptSet->getInputCount();
gridC.nodeOutput = ptSet->getOutputCount();
gridC.offest = gpd.nodeInput+gpd.nodeOutput;
gridC.patternCount = ptSet->getCount();
gridC.patterns = new cl_float [gpd.patternCount*gpd.offest];

All the data is allocated, then initialized with values, and then passed to the GPU:

int elements = gpd.patternCount;
size_t ofsdf = sizeof(gridC);
size_t dataSize = sizeof(GPUPattern::GPUPatternData)+ (sizeof(cl_float)*elements);

cl_mem bufferA = clCreateBuffer(gpu.context,CL_MEM_READ_ONLY,dataSize,NULL,&err);
openCLErrorCheck(&err);
//Copy the buffer to the device
err = clEnqueueWriteBuffer(queue,bufferA,CL_TRUE,0,dataSize,(void*)&gpd,0,NULL,NULL);

//This buffer is being written to only
cl_mem bufferC = clCreateBuffer(gpu.context,CL_MEM_WRITE_ONLY,dataSize,NULL,&err);
openCLErrorCheck(&err);
err = clEnqueueWriteBuffer(queue,bufferC,CL_TRUE,0,dataSize,(void*)&gridC,0,NULL,NULL);

Everything builds, which I check by watching the error code; it stays at 0:

cl_program program = clCreateProgramWithSource(gpu.context,1, (const char**) &kernelSource,NULL,&err);

////Build program
err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);

char build[2048];
clGetProgramBuildInfo(program, gpu.device, CL_PROGRAM_BUILD_LOG, 2048, build, NULL);

////Create kernal
cl_kernel kernal = clCreateKernel(program, "patternDataAddition",&err);

////Set kernal arguments
err  = clSetKernelArg(kernal,  0, sizeof(cl_mem), &bufferA);
err |= clSetKernelArg(kernal,  1, sizeof(cl_mem), &bufferC);

The kernel is then launched:

size_t globalWorkSize = 1024;
size_t localWorkSize = 512;

err = clEnqueueNDRangeKernel(queue, kernal, 1, NULL, &globalWorkSize, &localWorkSize, 0, NULL, NULL); 

clFinish(queue);

It is at this point that it all goes wrong:

err = clEnqueueReadBuffer(queue, bufferC, CL_TRUE, 0, dataSize, &gridC, 0, NULL, NULL);
clFinish(queue);

The error in this case is -5 (CL_OUT_OF_RESOURCES).

Also, if I change the line:

err = clEnqueueReadBuffer(queue, bufferC, CL_TRUE, 0, dataSize, &gridC, 0, NULL, NULL);

to:

err = clEnqueueReadBuffer(queue, bufferC, CL_TRUE, 0, dataSize*1000, &gridC, 0, NULL, NULL);

I get the error -30 (CL_INVALID_VALUE).

So my question is: why am I getting these errors when reading back the buffer? Also, I am not sure whether I can use a pointer to my float array at all. Could the pointer be giving me the wrong sizeof() for dataSize, and therefore the wrong buffer size?

Recommended answer

You cannot pass a struct that contains pointers into OpenCL:

http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf (Section 6.9)
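To see why the pointer is the problem, it helps to look at what a byte-wise copy of the struct actually carries. The sketch below is plain C with no OpenCL calls. PatternData, copy_struct_bytes and shares_storage are hypothetical names used only for illustration, and plain int/float stand in for cl_int/cl_float. It shows that the struct's size does not depend on the array length and that the copy holds the same host address rather than the floats.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for GPUPatternData (int/float replace cl_int/cl_float). */
typedef struct {
    int nInput, nOutput, patternCount, offest;
    float *patterns;   /* only a host address; the floats live elsewhere */
} PatternData;

/* Byte-wise copy of the struct itself, which is all that a raw buffer write
 * starting at &gpd can see of the struct. Returns the number of bytes copied. */
size_t copy_struct_bytes(PatternData *dst, const PatternData *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *src);
    return sizeof *src;
}

/* Returns 1 if both structs point at the same float storage, i.e. the copy
 * was shallow and no float data was duplicated; returns 0 otherwise. */
int shares_storage(const PatternData *a, const PatternData *b)
{
    assert(a != NULL && b != NULL);
    return a->patterns == b->patterns;
}
```

On the host the copied pointer still happens to be valid. On the device that address is not meaningful, so the kernel has nothing valid to dereference. The extra sizeof(cl_float)*elements bytes in the question's dataSize are read from whatever follows gpd in host memory, not from the pattern array.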

You can either correct it as Eric Bainville pointed out or, if you are not very restricted on memory, you can do something like

struct GPUPatternData
{
    cl_int nInput,nOutput,patternCount, offest;
    cl_float patterns[MAX_SIZE];
};
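As a rough illustration of the fixed-size variant, the sketch below stores the floats inline so that the struct is one contiguous block. It is plain C with no OpenCL calls. The value of MAX_SIZE and the names FixedPatternData and fixed_pattern_fill are assumptions made for this example, and plain int/float again stand in for cl_int/cl_float.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SIZE 1024   /* assumed upper bound; choose one that fits your data */

/* Hypothetical fixed-size variant: the floats are stored inside the struct. */
typedef struct {
    int nInput, nOutput, patternCount, offest;
    float patterns[MAX_SIZE];
} FixedPatternData;

/* Copies count floats into the inline array.
 * Returns 0 on success, or -1 if count does not fit into MAX_SIZE. */
int fixed_pattern_fill(FixedPatternData *dst, const float *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    if (count > MAX_SIZE) {
        return -1;
    }
    memcpy(dst->patterns, src, count * sizeof(float));
    return 0;
}
```

With this layout sizeof the struct already covers the array, so a single buffer of that size holds everything. The cost is that every instance occupies MAX_SIZE floats no matter how many are in use, and the host and kernel definitions of the struct have to agree on the layout.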


Edit: OK, if memory is an issue: since you only use patterns and patternCount, you can copy the patterns out of the struct and pass them to the kernel separately.

struct GPUPatternData
{
    cl_int nInput,nOutput,patternCount, offest;
    cl_float* patterns;
};

Copy patterns from gpd to the GPU and allocate space on the GPU for the patterns of gridC. Then you can pass the buffers separately:

__kernel void patternDataAddition(int gpd_patternCount,
    __global const float * gpd_Patterns,
    __global float * gridC_Patterns) {

    int index = get_global_id(0);
    if(index < gpd_patternCount)
    {
        gridC_Patterns[index] = gpd_Patterns[index]*2;
    }
}

When you come back from the kernel, copy the data back to gridC.patterns directly.
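One way to check the data that comes back is to compare it against a CPU reference of the same kernel. The sketch below is plain C and assumes nothing about the OpenCL setup. pattern_data_addition_ref and count_mismatches are hypothetical helper names, and the loop index plays the role of get_global_id(0).

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for the kernel above: one loop iteration per work-item.
 * Work-items whose index is not below patternCount do nothing, which is why
 * a global work size larger than the data (such as 1024) is harmless. */
void pattern_data_addition_ref(int patternCount, const float *in, float *out,
                               size_t globalWorkSize)
{
    assert(in != NULL && out != NULL);
    for (size_t index = 0; index < globalWorkSize; ++index) {
        if (patternCount > 0 && index < (size_t)patternCount) {
            out[index] = in[index] * 2;
        }
    }
}

/* Counts the positions at which two float arrays differ. Exact comparison is
 * acceptable here because doubling a finite float is exact unless it overflows. */
size_t count_mismatches(const float *a, const float *b, size_t n)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

After reading the device buffer back into gridC.patterns, running the reference on gpd.patterns and calling count_mismatches on the two results should give 0 if the transfer and the kernel both worked.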

One more thing:

You don't have to change your CPU struct. It stays the same. However, this part

size_t dataSize = sizeof(GPUPattern::GPUPatternData)+ (sizeof(cl_float)*elements);

cl_mem bufferA = clCreateBuffer(gpu.context,CL_MEM_READ_ONLY,dataSize,NULL,&err);
openCLErrorCheck(&err);
//Copy the buffer to the device
err = clEnqueueWriteBuffer(queue,bufferA,CL_TRUE,0,dataSize,(void*)&gpd,0,NULL,NULL);

should be changed to something like

size_t dataSize = (sizeof(cl_float)*elements);  // HERE
float* gpd_dataPointer = gpd.patterns;    // HERE

cl_mem bufferA = clCreateBuffer(gpu.context,CL_MEM_READ_ONLY,dataSize,NULL,&err);
openCLErrorCheck(&err);

// Now pass gpd_dataPointer itself (not its address) as the source of the copy
err = clEnqueueWriteBuffer(queue,bufferA,CL_TRUE,0,dataSize,(void*)gpd_dataPointer,0,NULL,NULL);

The same goes for gridC.

And when you copy back, copy it to gridC_dataPointer, i.e. gridC.patterns.

Then continue to use the struct as if nothing had happened.
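A note on the size: the host code in the question allocates patternCount*offest floats for each patterns array, while elements is set to patternCount alone. If the whole array should reach the GPU, the byte count would need to cover every float, and the bound checked in the kernel would have to match. A minimal sketch of that calculation, with patterns_byte_size as a hypothetical helper and float standing in for cl_float:

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for patternCount * offest floats.
 * Returns 0 if either value is not positive or if the result would not fit
 * into a size_t. */
size_t patterns_byte_size(int patternCount, int offest)
{
    if (patternCount <= 0 || offest <= 0) {
        return 0;
    }
    size_t count = (size_t)patternCount;
    size_t stride = (size_t)offest;
    /* Overflow guard for count * stride * sizeof(float). */
    if (count > SIZE_MAX / stride / sizeof(float)) {
        return 0;
    }
    return count * stride * sizeof(float);
}
```

The result would be used as dataSize for both clCreateBuffer and the write and read calls, so that the buffer size and the transfer size always agree.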
