OpenCL Buffer Creation

Problem Description

I am fairly new to OpenCL, and though I have understood everything up until now, I am having trouble understanding how buffer objects work.

I don't understand where a buffer object is stored. In this StackOverflow question, it is stated that:

If you have one device only, probably (99.99%) is going to be in the device. (In rare cases it may be in the host if the device does not have enough memory for the time being)

To me, this means that buffer objects are stored in device memory. However, as stated in this StackOverflow question, if the flag CL_MEM_ALLOC_HOST_PTR is used in clCreateBuffer, the memory used will most likely be pinned memory. My understanding is that when memory is pinned, it will not be swapped out. This means that pinned memory MUST be located in RAM, not in device memory.

So what is actually going on?

What I would like to know is what the following flags imply about where the buffer is located:

  • CL_MEM_USE_HOST_PTR
  • CL_MEM_COPY_HOST_PTR
  • CL_MEM_ALLOC_HOST_PTR

Thanks

Recommended Answer

The specification is (deliberately?) vague on this topic, leaving a lot of freedom to implementers. So unless the OpenCL implementation you are targeting makes explicit guarantees about the flags, you should treat them as advisory.

First off, CL_MEM_COPY_HOST_PTR actually has nothing to do with allocation; it just means that you would like clCreateBuffer to pre-fill the allocated memory with the contents of the memory at the host_ptr you passed to the call. This is as if you had called clCreateBuffer with host_ptr = NULL and without this flag, and then made a blocking clEnqueueWriteBuffer call to write the entire buffer.

Regarding where the memory is allocated:

  • CL_MEM_USE_HOST_PTR - This means you have already allocated some suitably aligned memory and would like to use it as the backing memory for the buffer. The implementation can still allocate device memory and copy back and forth between your memory and that allocation if the device does not support directly accessing host memory, or if the driver decides that a shadow copy in VRAM will be more efficient than directly accessing system memory. On implementations that can read directly from system memory, though, this is one option for zero-copy buffers.
  • CL_MEM_ALLOC_HOST_PTR - This is a hint telling the OpenCL implementation that you plan to access the buffer from the host side by mapping it into the host address space, but unlike with CL_MEM_USE_HOST_PTR, you leave the allocation itself to the OpenCL implementation. For implementations that support it, this is another option for zero-copy buffers: create the buffer, map it to the host, have a host algorithm or I/O write to the mapped memory, then unmap it and use it in a GPU kernel. Unlike CL_MEM_USE_HOST_PTR, this also leaves the door open to using VRAM that can be mapped directly into the CPU's address space (e.g. PCIe BARs).
  • Default (neither of the above two): Allocate wherever is most convenient for the device, typically VRAM. If the device does not support mapping that memory into the host address space, mapping the buffer typically means you end up with two copies of it, one in VRAM and one in system memory, while the OpenCL implementation internally copies back and forth between the two.

Note that the implementation may also use any access flags provided (CL_MEM_HOST_WRITE_ONLY, CL_MEM_HOST_READ_ONLY, CL_MEM_HOST_NO_ACCESS, CL_MEM_WRITE_ONLY, CL_MEM_READ_ONLY, and CL_MEM_READ_WRITE) to influence its decision about where to allocate memory.
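To make the flag combinations above a little more concrete, here is a small C sketch that checks whether a flags/host_ptr pair is one that clCreateBuffer accepts. It only mirrors the combination rules documented for clCreateBuffer (at most one device-access flag, at most one host-access flag, CL_MEM_USE_HOST_PTR excluding the other two host-pointer flags, and host_ptr being non-NULL exactly when CL_MEM_USE_HOST_PTR or CL_MEM_COPY_HOST_PTR is set); it says nothing about where a real implementation will actually place the memory. The bit values are copied from CL/cl.h so the sketch builds without an OpenCL SDK, and the function name buffer_flags_valid is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as defined in CL/cl.h, repeated here so this sketch builds
 * without an OpenCL SDK. In real code, include <CL/cl.h> instead. */
#define CL_MEM_READ_WRITE      (1 << 0)
#define CL_MEM_WRITE_ONLY      (1 << 1)
#define CL_MEM_READ_ONLY       (1 << 2)
#define CL_MEM_USE_HOST_PTR    (1 << 3)
#define CL_MEM_ALLOC_HOST_PTR  (1 << 4)
#define CL_MEM_COPY_HOST_PTR   (1 << 5)
#define CL_MEM_HOST_WRITE_ONLY (1 << 7)
#define CL_MEM_HOST_READ_ONLY  (1 << 8)
#define CL_MEM_HOST_NO_ACCESS  (1 << 9)

/* Hypothetical helper: returns true if `flags` (a cl_mem_flags value, which
 * is a 64-bit bitfield) and `host_ptr` satisfy the combination rules that
 * clCreateBuffer documents. It does not model memory placement. */
static bool buffer_flags_valid(uint64_t flags, const void *host_ptr) {
    const uint64_t device_access =
        flags & (CL_MEM_READ_WRITE | CL_MEM_WRITE_ONLY | CL_MEM_READ_ONLY);
    const uint64_t host_access =
        flags & (CL_MEM_HOST_WRITE_ONLY | CL_MEM_HOST_READ_ONLY |
                 CL_MEM_HOST_NO_ACCESS);
    const bool use_ptr = (flags & CL_MEM_USE_HOST_PTR) != 0;
    const bool alloc_ptr = (flags & CL_MEM_ALLOC_HOST_PTR) != 0;
    const bool copy_ptr = (flags & CL_MEM_COPY_HOST_PTR) != 0;

    /* At most one device-access flag and at most one host-access flag;
     * x & (x - 1) is non-zero when more than one bit is set. */
    if ((device_access & (device_access - 1)) != 0) return false;
    if ((host_access & (host_access - 1)) != 0) return false;

    /* USE_HOST_PTR is mutually exclusive with ALLOC_HOST_PTR and
     * COPY_HOST_PTR, while ALLOC_HOST_PTR | COPY_HOST_PTR is allowed.
     * (A real clCreateBuffer reports CL_INVALID_VALUE here.) */
    if (use_ptr && (alloc_ptr || copy_ptr)) return false;

    /* host_ptr must be non-NULL exactly when USE or COPY is requested
     * (otherwise clCreateBuffer reports CL_INVALID_HOST_PTR). */
    return (use_ptr || copy_ptr) == (host_ptr != NULL);
}
```

The fact that CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR passes the check fits the point made earlier: CL_MEM_COPY_HOST_PTR only controls how the buffer is initialized, not where it is allocated.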

Finally, regarding "pinned" memory: many modern systems have an IOMMU, and when it is active, device accesses to system memory can cause IOMMU page faults, so the host memory technically doesn't even need to be resident. In any case, the OpenCL implementation is usually deeply integrated with a kernel-level device driver, which can pin system memory ranges (exclude them from paging) on demand. So if you use CL_MEM_USE_HOST_PTR, you only need to make sure you provide appropriately aligned memory; the implementation will take care of pinning for you.
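What "appropriately aligned" means for CL_MEM_USE_HOST_PTR is implementation-specific. A minimal sketch, assuming a 4096-byte start alignment and a buffer size padded to a multiple of 64 bytes (values some vendors recommend for zero-copy buffers, not requirements of the OpenCL spec; check your vendor's documentation or query CL_DEVICE_MEM_BASE_ADDR_ALIGN via clGetDeviceInfo), could allocate the host memory like this. The helper names alloc_for_use_host_ptr and round_up are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* ASSUMPTION: alignment and size granularity for zero-copy host memory.
 * These are vendor recommendations, not OpenCL spec requirements. */
#define HOST_PTR_ALIGN 4096u        /* start address alignment in bytes */
#define HOST_PTR_SIZE_MULTIPLE 64u  /* buffer size granularity in bytes */

/* Round n up to the next multiple of m; m must be a power of two. */
static size_t round_up(size_t n, size_t m) {
    assert(m != 0 && (m & (m - 1)) == 0);
    return (n + m - 1) & ~(m - 1);
}

/* Hypothetical helper: allocate host memory intended to be passed as
 * host_ptr to clCreateBuffer together with CL_MEM_USE_HOST_PTR.
 * *cl_size receives the (padded) size to pass to clCreateBuffer.
 * Returns NULL on failure; release with free() after the cl_mem is released. */
static void *alloc_for_use_host_ptr(size_t bytes, size_t *cl_size) {
    *cl_size = round_up(bytes, HOST_PTR_SIZE_MULTIPLE);
    /* C11 aligned_alloc expects the size to be a multiple of the alignment,
     * so the allocation itself is padded up to the next 4096-byte boundary. */
    void *p = aligned_alloc(HOST_PTR_ALIGN, round_up(*cl_size, HOST_PTR_ALIGN));
    assert(p == NULL || (uintptr_t)p % HOST_PTR_ALIGN == 0);
    return p;
}
```

The returned pointer and *cl_size would then be passed as host_ptr and size to clCreateBuffer with CL_MEM_USE_HOST_PTR. Because the buffer uses this memory as its backing store, it has to stay valid until the buffer object is released.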
