Overlapping transfers and device computation in OpenCL


Question

I am a beginner with OpenCL and I am having difficulty understanding something. I want to improve the transfer of an image between host and device. I drew a diagram to explain what I mean.

Top: what I have now | Bottom: what I want. HtD (Host to Device) and DtH (Device to Host) are memory transfers. K1 and K2 are kernels.

I thought about using memory mapping, but the first transfer (host to device) is done with the clSetKernelArg() command, isn't it? Or do I have to cut my input image into sub-images and use mapping to get the output image?

Thanks.

More information

K1 processes the input image in memory. K2 processes the output image from K1.

So, I want to transfer MemInput in several pieces for K1, and I want to read and save on the host the MemOutput processed by K2.

Recommended answer

As you may have already seen, you transfer data from host to device with clEnqueueWriteBuffer and similar functions.

All commands with the keyword 'enqueue' in their name share a special property: they are not executed immediately. They are only queued, and run when you trigger them, for example with clFinish, clFlush, clEnqueueWaitForEvents, or clEnqueueWriteBuffer in blocking mode, among others.

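This means that the queued commands may all run at once, so you have to synchronise them using event objects. Note that a default (in-order) command queue still runs its commands one after another; the overlap applies to out-of-order queues or to commands spread over several queues. As everything may happen at once, you could do something like this (the operations within each step happen at the same time):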

  1. Transfer data A
  2. Process data A & transfer data B
  3. Process data B & transfer data C & retrieve data A'
  4. Process data C & retrieve data B'
  5. Retrieve data C'
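The staggered schedule above generalises to any number of chunks: at each step, the transfer stage works on one chunk, the processing stage on the previous one, and the retrieval stage on the one before that. The following is a minimal sketch of that bookkeeping in plain C, with no OpenCL calls. The names chunk_at, pipeline_steps and print_schedule are hypothetical helpers introduced only for this illustration, and chunks are labelled A, B, C, ... as in the list (so it assumes at most 26 chunks).

```c
#include <assert.h>
#include <stdio.h>

/* Each stage lags the previous one by one step. */
enum { TRANSFER = 0, PROCESS = 1, RETRIEVE = 2 };

/* Index of the chunk handled by `stage` at 0-based `step`,
 * or -1 if that stage is idle at this step. */
int chunk_at(int step, int stage, int n_chunks)
{
    assert(n_chunks > 0 && stage >= TRANSFER && stage <= RETRIEVE);
    int chunk = step - stage;
    return (chunk >= 0 && chunk < n_chunks) ? chunk : -1;
}

/* Total number of steps: the last chunk is transferred at step
 * n_chunks - 1 and retrieved two steps later. */
int pipeline_steps(int n_chunks)
{
    assert(n_chunks > 0);
    return n_chunks + 2;
}

/* Print one line per step, listing what every busy stage is doing. */
void print_schedule(int n_chunks)
{
    static const char *names[] = { "transfer", "process", "retrieve" };
    for (int step = 0; step < pipeline_steps(n_chunks); ++step) {
        printf("%d.", step + 1);
        for (int stage = TRANSFER; stage <= RETRIEVE; ++stage) {
            int c = chunk_at(step, stage, n_chunks);
            if (c >= 0)
                printf(" %s %c", names[stage], 'A' + c);
        }
        printf("\n");
    }
}
```

Calling print_schedule(3) lists the same five steps as above. Whether the stages of one step really run concurrently depends on the device and driver; the sketch only shows which operations are allowed to overlap.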

Remember: enqueueing commands without event objects may result in all enqueued commands executing simultaneously!

To make sure that "process data B" doesn't happen before "transfer data B", you have to retrieve an event object from clEnqueueWriteBuffer and supply it as an event to wait for to, for instance, clEnqueueNDRangeKernel:

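cl_event evt;
// evt is signalled when the write to bufferB completes (empty wait list: 0, NULL).
clEnqueueWriteBuffer(... , bufferB , ... , ... , ... , bufferBdata , 0 , NULL , &evt);
// The kernel starts only after evt has completed (wait list of length 1).
clEnqueueNDRangeKernel(... , kernelB , ... , ... , ... , ... , 1 , &evt, NULL);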

Instead of supplying NULL, each command can of course both wait on certain events and generate a new event object. The next-to-last parameter is an array, so you can even wait for several events!


To summarise the comments: transferring data, which command acts where?


       CPU                               GPU
                                   BufA       BufB
array[] = {...}
clCreateBuffer()         ----->  [     ]              // Create (empty) buffer in GPU memory *
clCreateBuffer()         ----->  [     ]    [     ]   // Create (empty) buffer in GPU memory *
clEnqueueWriteBuffer()   -arr->  [array]    [     ]   // Copy from CPU to GPU
clEnqueueCopyBuffer()            [array] -> [array]   // Copy from GPU to GPU
clEnqueueReadBuffer()    <-arr-  [array]    [array]   // Copy from GPU to CPU

* You may initialise the buffer directly by providing data through the host_ptr parameter of clCreateBuffer (together with a flag such as CL_MEM_COPY_HOST_PTR).
