Cuda and OpenGL Interop


Question

I've been reading through the CUDA documentation, and it seems to me that every buffer that needs to interface with OpenGL has to be created as a GL buffer object (glBuffer).

According to the NVIDIA programming guide, this has to be done like this:

GLuint positionsVBO;
struct cudaGraphicsResource* positionsVBO_CUDA;

int main() {

    // Explicitly set device
    cudaGLSetGLDevice(0);
    // Initialize OpenGL and GLUT
    ...
    glutDisplayFunc(display);
    // Create buffer object and register it with CUDA
    glGenBuffers(1, &positionsVBO);
    glBindBuffer(GL_ARRAY_BUFFER, positionsVBO);
    unsigned int size = width * height * 4 * sizeof(float);
    glBufferData(GL_ARRAY_BUFFER, size, 0, GL_DYNAMIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    cudaGraphicsGLRegisterBuffer(&positionsVBO_CUDA, positionsVBO, cudaGraphicsMapFlagsWriteDiscard);

    // Launch rendering loop
    glutMainLoop();
}
void display() {
    // Map buffer object for writing from CUDA
    float4* positions;
    cudaGraphicsMapResources(1, &positionsVBO_CUDA, 0);
    size_t num_bytes;
    cudaGraphicsResourceGetMappedPointer((void**)&positions, &num_bytes, positionsVBO_CUDA);
    // Execute kernel
    dim3 dimBlock(16, 16, 1);
    dim3 dimGrid(width / dimBlock.x, height / dimBlock.y, 1);
    createVertices<<<dimGrid, dimBlock>>>(positions, time, width, height);
    // Unmap buffer object
    cudaGraphicsUnmapResources(1, &positionsVBO_CUDA, 0);
    // Render from buffer object
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glBindBuffer(GL_ARRAY_BUFFER, positionsVBO);
    glVertexPointer(4, GL_FLOAT, 0, 0);
    glEnableClientState(GL_VERTEX_ARRAY);
    glDrawArrays(GL_POINTS, 0, width * height);
    glDisableClientState(GL_VERTEX_ARRAY);
    // Swap buffers
    glutSwapBuffers();
    glutPostRedisplay();
}
void deleteVBO() {
    cudaGraphicsUnregisterResource(positionsVBO_CUDA);
    glDeleteBuffers(1, &positionsVBO);
}

__global__ void createVertices(float4* positions, float time, unsigned int width, unsigned int height) { 
    // [....]
}

Is there a way to hand memory allocated with cudaMalloc directly to OpenGL? I already have working CUDA code, and I want to put my float4 array directly into OpenGL.

Say I already have code like this:

float4 *cd;
cudaMalloc((void**)&cd, elements * sizeof(float4));
do_something<<<16,1>>>(cd);

And I want to display the output of do_something through OpenGL.

Side note: why does the cudaGraphicsResourceGetMappedPointer function have to run on every timestep?

Answer

As of CUDA 4.0, OpenGL interop is one-way. That means that to do what you want (run a CUDA kernel that writes data into a GL buffer or texture image), you have to map the buffer to a device pointer and pass that pointer to your kernel, as shown in your example.
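
If you would rather keep do_something writing into its own cudaMalloc'd array, one possible alternative to writing from the kernel directly into the mapped pointer is a device-to-device copy into the mapped VBO. This is only a sketch, assuming positionsVBO_CUDA has been registered with cudaGraphicsGLRegisterBuffer as above, the VBO holds at least elements * sizeof(float4) bytes, and the helper name copyIntoVBO is made up for illustration:

void copyIntoVBO(struct cudaGraphicsResource* resource, const float4* cd, size_t elements) {
    float4* vboPtr = nullptr;
    size_t vboBytes = 0;
    // Map the registered GL buffer and get a device pointer CUDA can write to
    cudaGraphicsMapResources(1, &resource, 0);
    cudaGraphicsResourceGetMappedPointer((void**)&vboPtr, &vboBytes, resource);
    // Device-to-device copy from your own allocation into the GL-owned buffer
    cudaMemcpy(vboPtr, cd, elements * sizeof(float4), cudaMemcpyDeviceToDevice);
    // Unmap so OpenGL can use (and potentially relocate) the buffer again
    cudaGraphicsUnmapResources(1, &resource, 0);
}

After the copy, the VBO can be bound and drawn exactly as in display() above.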

As for your side note: cudaGraphicsResourceGetMappedPointer is called every time display() is called because cudaGraphicsMapResources is called every frame. Any time you re-map a resource, you should re-get the mapped pointer, because it may have changed. Why re-map every frame? Well, OpenGL sometimes moves buffer objects around in memory for performance reasons (especially in memory-intensive GL applications). If you leave the resource mapped all the time, it can't do this, and performance may suffer. I believe GL's ability and need to virtualize memory objects is also one of the reasons the current GL interop API is one-way (GL is not allowed to move CUDA allocations around, and therefore you can't map a CUDA-allocated device pointer into a GL buffer object).
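
To make the per-frame point concrete, here is a minimal sketch of how the CUDA part of a frame could be scoped; the helper name runCudaFrame is hypothetical, while width, height and createVertices are the ones from the code above. The mapped pointer is only valid between the map and unmap calls, so it is re-queried every frame rather than cached:

void runCudaFrame(struct cudaGraphicsResource* resource, float time,
                  unsigned int width, unsigned int height) {
    float4* positions = nullptr;
    size_t num_bytes = 0;
    // Re-map each frame: while unmapped, GL may have relocated the buffer,
    // so a pointer cached from a previous frame could be stale
    cudaGraphicsMapResources(1, &resource, 0);
    cudaGraphicsResourceGetMappedPointer((void**)&positions, &num_bytes, resource);
    dim3 dimBlock(16, 16, 1);
    dim3 dimGrid(width / dimBlock.x, height / dimBlock.y, 1);
    createVertices<<<dimGrid, dimBlock>>>(positions, time, width, height);
    // Unmap before any GL call that reads the buffer, so GL owns it again
    cudaGraphicsUnmapResources(1, &resource, 0);
}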
