Creating a copy of the buffer pointed to by the host ptr on the GPU from a GPU kernel in OpenCL

Question

I am trying to understand exactly how CL_MEM_USE_HOST_PTR and CL_MEM_COPY_HOST_PTR work. As I understand it, when CL_MEM_USE_HOST_PTR is used, say when creating a 2D image, nothing is copied to the device. Instead, the GPU refers to the mapped memory on the host (clEnqueueMapBuffer maps it), does the processing, and we can write the results to some other location.

On the other hand, if I use CL_MEM_COPY_HOST_PTR, it creates a copy on the device of the data pointed to by the host ptr (I guess it creates a separate copy rather than just caching). The processing is then done on the data that was copied to the device, and the results are copied back to the host. I hope I have understood this correctly.

So my question is this, and it is purely out of curiosity. I will use CL_MEM_USE_HOST_PTR, and even though the device can access the host memory, I want the GPU kernel to create a separate copy on the device itself (not using CL_MEM_COPY_HOST_PTR, because that copy is again made from the host side) and then do the processing on that copy. How can this be done?

Recommended answer

Create the buffer you want to copy into using CL_MEM_READ_WRITE, but don't initialize it from the host. I recently had to initialize a fresh buffer with consecutive integers:

cl_mem _offsetBuffer;
_offsetBuffer = clCreateBuffer(_context, CL_MEM_READ_WRITE, (size_t)(count * sizeof(cl_int)), NULL, &errorCode);

The clCreateBuffer call above doesn't do anything to your host's memory other than give you a handle to the memory object. I then use a kernel to assign the sequential values, because the memory speed on the graphics card proved to be much faster than assigning the values on the CPU.

__kernel void initOffsetBuffer(__global int* offsetBuffer, const int offsetBufferLength, const int startValue){
    int gid = get_global_id(0);
    int gs = get_global_size(0);
    int i;
    for(i=gid;i<offsetBufferLength;i+=gs){
        offsetBuffer[i] = i+startValue;
    }
}
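
To make the loop's stride pattern easier to follow, here is a minimal CPU-only sketch in plain C that emulates the work-items one after another. The function names and the explicit gid and gs parameters are my own stand-ins for get_global_id(0) and get_global_size(0), not part of the kernel above. It only illustrates that every index is written exactly once even when the global size is smaller than the buffer length.

```c
#include <assert.h>

/* CPU stand-in for a single work-item of initOffsetBuffer.
   gid and gs replace get_global_id(0) and get_global_size(0). */
void init_offset_work_item(int *offset_buffer, int length,
                           int start_value, int gid, int gs)
{
    /* Each work-item starts at its own id and strides by the global size. */
    for (int i = gid; i < length; i += gs) {
        offset_buffer[i] = i + start_value;
    }
}

/* Runs the emulated work-items sequentially; a real device runs them in parallel. */
void emulate_init_offset_buffer(int *offset_buffer, int length,
                                int start_value, int global_size)
{
    assert(global_size > 0);
    for (int gid = 0; gid < global_size; ++gid) {
        init_offset_work_item(offset_buffer, length, start_value,
                              gid, global_size);
    }
}
```

With a length of 10 and a global size of 4, emulated work-item 0 writes indices 0, 4 and 8, work-item 1 writes 1, 5 and 9, and so on, so the launch size does not have to match the buffer length.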

There is still no copy of the buffer in host memory at this point. I would need to use clEnqueueReadBuffer to copy it to the host.

You can easily modify this code to be a copying kernel rather than a straight assignment:

__kernel void copyBuffer(__global const int* srcBuffer, __global int* dstBuffer, const int bufferLength){
    int gid = get_global_id(0);
    int gs = get_global_size(0);
    int i;
    for(i=gid;i<bufferLength;i+=gs){
        dstBuffer[i] = srcBuffer[i];
    }
}
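
For completeness, here is a rough host-side sketch of how the copy kernel might be launched for the scenario in the question. It is not from the original answer: the helper name copy_to_device_buffer and its parameters are hypothetical, the context, queue and compiled copyBuffer kernel are assumed to exist already, and the header path is assumed to be CL/cl.h (it differs on some platforms). It needs an OpenCL SDK to build.

```c
#include <CL/cl.h>

/* Hypothetical helper: wraps host_data with CL_MEM_USE_HOST_PTR, allocates a
   separate device-side buffer, and runs the copyBuffer kernel to fill it.
   host_data must stay valid for as long as the src buffer exists. */
cl_int copy_to_device_buffer(cl_context context, cl_command_queue queue,
                             cl_kernel copy_kernel, cl_int *host_data,
                             cl_int count, size_t global_size, cl_mem *dst_out)
{
    cl_int err;
    size_t bytes = (size_t)count * sizeof(cl_int);

    /* Source buffer backed by the existing host allocation. */
    cl_mem src = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
                                bytes, host_data, &err);
    if (err != CL_SUCCESS) return err;

    /* Destination buffer with no host pointer, as in the answer above. */
    cl_mem dst = clCreateBuffer(context, CL_MEM_READ_WRITE, bytes, NULL, &err);
    if (err != CL_SUCCESS) { clReleaseMemObject(src); return err; }

    /* Arguments follow the copyBuffer signature: src, dst, length. */
    err = clSetKernelArg(copy_kernel, 0, sizeof(cl_mem), &src);
    if (err == CL_SUCCESS) err = clSetKernelArg(copy_kernel, 1, sizeof(cl_mem), &dst);
    if (err == CL_SUCCESS) err = clSetKernelArg(copy_kernel, 2, sizeof(cl_int), &count);
    if (err == CL_SUCCESS)
        err = clEnqueueNDRangeKernel(queue, copy_kernel, 1, NULL,
                                     &global_size, NULL, 0, NULL, NULL);
    if (err == CL_SUCCESS) err = clFinish(queue); /* wait for the copy to complete */

    clReleaseMemObject(src);
    if (err != CL_SUCCESS) { clReleaseMemObject(dst); return err; }

    *dst_out = dst; /* caller owns dst and releases it with clReleaseMemObject */
    return CL_SUCCESS;
}
```

If no extra per-element work is needed during the copy, clEnqueueCopyBuffer can also perform a buffer-to-buffer copy from the host API without a custom kernel.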
