OpenCL Host Copying Performance Warning

Problem description

I have an OpenCL program that adjusts the vertex coordinates of a VBO object in a shared context. The OpenCL device is a GPU device.

However, I get the following warning:

Buffer performance warning: Buffer object 1 (bound to GL_VERTEX_ATTRIB_ARRAY_BUFFER_BINDING_ARB (0), GL_VERTEX_ATTRIB_ARRAY_BUFFER_BINDING_ARB (4), and GL_ARRAY_BUFFER_ARB, usage hint is GL_DYNAMIC_DRAW) is being copied/moved from VIDEO memory to HOST memory.

As near as I can tell (I had to add some glFlush() calls to narrow it down), this occurs during the call to glDrawElements(...). The vertex attribute array at shader location 0 holds the vertex positions, and the one at shader location 4 holds the texture coordinates.

The question is, why does this occur?

EDIT: the loop is of the following form:

glFinish()
clEnqueueAcquireGLObjects(...)
clEnqueueNDRangeKernel(...)
clFinish(...)
clEnqueueReleaseGLObjects(...)

Solution

This happens for the same reason that, before OpenGL supported VBOs, every call to glDrawElements(...) had to pull vertex memory from the client immediately, at the time of the call, rather than when the GPU started executing the command. GL queues up commands in the background, but for some operations it has to block and/or make a copy of the data to prevent a concurrent processor (the CPU, or OpenCL in this case) from modifying the data before OpenGL actually finishes the command. It needs a copy of the data as it existed when glDrawElements(...) was actually called.

VBOs solved this fundamental problem by giving the OpenGL server explicit control over all access to vertex memory. It was no longer possible to modify vertex memory concurrently in a way that the OpenGL server did not know about. Any attempt to modify vertex memory required a command in the GL pipeline, so ensuring that queued commands have the correct copy of vertex data could be managed entirely by GL itself without resorting to unnecessary copying or blocking.

When you share buffer objects between GL and CL, you break some of the beauty of buffer objects in GL. Sharing gives a separate, concurrent pipeline access to GL-owned memory, so GL can no longer be sure that nothing has modified the data it owns without its knowledge.

clEnqueueAcquireGLObjects

Acquire OpenCL memory objects that have been created from OpenGL objects.

[...]

Notes

Prior to calling clEnqueueAcquireGLObjects, the application must ensure that any pending GL operations which access the objects specified in mem_objects have completed. This may be accomplished portably by issuing and waiting for completion of a glFinish command on all GL contexts with pending references to these objects. Implementations may offer more efficient synchronization methods; for example on some platforms calling glFlush may be sufficient, or synchronization may be implicit within a thread, or there may be vendor-specific extensions that enable placing a fence in the GL command stream and waiting for completion of that fence in the CL command queue. Note that no synchronization methods other than glFinish are portable between OpenGL implementations at this time.

Similarly, after calling clEnqueueReleaseGLObjects, the application is responsible for ensuring that any pending OpenCL operations which access the objects specified in mem_objects have completed prior to executing subsequent GL commands which reference these objects. This may be accomplished portably by calling clWaitForEvents with the event object returned by clEnqueueReleaseGLObjects, or by calling glFinish. As above, some implementations may offer more efficient methods.

You need GL/CL synchronization to deal with this properly, which explains why glFlush() helps. However, flushing the command queue is usually inadequate. A flush only tells GL to start working immediately on all of the commands it has buffered; it makes no attempt to ensure that they finish before control returns to the CPU.
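The ordering described in the quoted notes can be sketched as a small, GPU-free C model. This is only an illustration under stated assumptions: `SharedBufferOps`, `run_shared_frame`, `Trace`, `recording_ops`, and the `stub_*` functions are hypothetical names invented here, not part of OpenGL or OpenCL. Each callback is meant to wrap the real call named in its comment. Unlike the loop in the question, where clFinish(...) comes before clEnqueueReleaseGLObjects(...), the wait in this sketch is placed after the release, so that the release itself has completed before GL draws. Whether that removes the warning depends on the driver.

```c
#include <stddef.h>

/* Hypothetical callback type: returns 0 on success, non-zero on failure. */
typedef int (*step_fn)(void *ctx);

/* Hypothetical table of wrappers around the real API calls. */
typedef struct {
    step_fn gl_finish;       /* would wrap glFinish()                        */
    step_fn cl_acquire;      /* would wrap clEnqueueAcquireGLObjects(...)    */
    step_fn cl_run_kernel;   /* would wrap clEnqueueNDRangeKernel(...)       */
    step_fn cl_release;      /* would wrap clEnqueueReleaseGLObjects(...)    */
    step_fn cl_wait_release; /* would wrap clWaitForEvents on the release
                                event, or clFinish(...)                      */
    step_fn gl_draw;         /* would wrap glDrawElements(...)               */
} SharedBufferOps;

/* Runs one frame in the order suggested by the quoted notes.
   Stops at, and returns, the first non-zero status. */
int run_shared_frame(const SharedBufferOps *ops, void *ctx)
{
    const step_fn steps[] = {
        ops->gl_finish,       /* GL is done with the buffer            */
        ops->cl_acquire,      /* CL takes ownership                    */
        ops->cl_run_kernel,   /* CL modifies the vertices              */
        ops->cl_release,      /* CL hands the buffer back              */
        ops->cl_wait_release, /* wait until the release has completed  */
        ops->gl_draw          /* only now does GL touch the buffer     */
    };
    for (size_t i = 0; i < sizeof steps / sizeof steps[0]; ++i) {
        int status = steps[i](ctx);
        if (status != 0)
            return status;
    }
    return 0;
}

/* Demonstration stubs: record the call order instead of touching a GPU. */
typedef struct {
    char   log[16];
    size_t len;
} Trace;

static int record(void *ctx, char tag)
{
    Trace *t = (Trace *)ctx;
    if (t->len + 1 >= sizeof t->log)
        return -1;            /* log full */
    t->log[t->len++] = tag;
    t->log[t->len] = '\0';
    return 0;
}

static int stub_gl_finish(void *c)       { return record(c, 'F'); }
static int stub_cl_acquire(void *c)      { return record(c, 'A'); }
static int stub_cl_run_kernel(void *c)   { return record(c, 'K'); }
static int stub_cl_release(void *c)      { return record(c, 'R'); }
static int stub_cl_wait_release(void *c) { return record(c, 'W'); }
static int stub_gl_draw(void *c)         { return record(c, 'D'); }

const SharedBufferOps recording_ops = {
    stub_gl_finish, stub_cl_acquire, stub_cl_run_kernel,
    stub_cl_release, stub_cl_wait_release, stub_gl_draw
};
```

In a real program the stubs would be replaced by thin wrappers around the actual GL and CL calls, with `ctx` carrying the command queue, kernel, and shared memory object. Passing the steps as callbacks keeps the ordering logic testable without a GPU.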

To completely answer this question, more details are necessary; particularly the sequence of CL and GL commands that are using the buffer data and what synchronization you are using.
