CUDA and OpenGL Interop


Question


I've been reading through the CUDA documentation, and it seems to me that every buffer that needs to interface with OpenGL has to be created as a GL buffer object.

According to the nvidia programming guide, this has to be done like this:

GLuint positionsVBO;
struct cudaGraphicsResource* positionsVBO_CUDA;

int main() {

    // Explicitly set device
    cudaGLSetGLDevice(0);
    // Initialize OpenGL and GLUT
    ...
    glutDisplayFunc(display);
    // Create buffer object and register it with CUDA
    glGenBuffers(1, &positionsVBO);
    glBindBuffer(GL_ARRAY_BUFFER, positionsVBO);
    unsigned int size = width * height * 4 * sizeof(float);
    glBufferData(GL_ARRAY_BUFFER, size, 0, GL_DYNAMIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    cudaGraphicsGLRegisterBuffer(&positionsVBO_CUDA, positionsVBO, cudaGraphicsMapFlagsWriteDiscard);

    // Launch rendering loop
    glutMainLoop();
}
void display() {
    // Map buffer object for writing from CUDA
    float4* positions;
    cudaGraphicsMapResources(1, &positionsVBO_CUDA, 0);
    size_t num_bytes;
    cudaGraphicsResourceGetMappedPointer((void**)&positions, &num_bytes, positionsVBO_CUDA);
    // Execute kernel
    dim3 dimBlock(16, 16, 1);
    dim3 dimGrid(width / dimBlock.x, height / dimBlock.y, 1);
    createVertices<<<dimGrid, dimBlock>>>(positions, time, width, height);
    // Unmap buffer object
    cudaGraphicsUnmapResources(1, &positionsVBO_CUDA, 0);
    // Render from buffer object
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glBindBuffer(GL_ARRAY_BUFFER, positionsVBO);
    glVertexPointer(4, GL_FLOAT, 0, 0);
    glEnableClientState(GL_VERTEX_ARRAY);
    glDrawArrays(GL_POINTS, 0, width * height);
    glDisableClientState(GL_VERTEX_ARRAY);
    // Swap buffers
    glutSwapBuffers();
    glutPostRedisplay();
}
void deleteVBO() {
    cudaGraphicsUnregisterResource(positionsVBO_CUDA);
    glDeleteBuffers(1, &positionsVBO);
}

__global__ void createVertices(float4* positions, float time, unsigned int width, unsigned int height) { 
    // [....]
}

Is there a way to hand cudaMalloc'd memory directly to OpenGL? I already have working code written in CUDA, and I want to put my float4 array directly into OpenGL.

Say I already have code like:

float4 *cd;
cudaMalloc((void**)&cd, elements * sizeof(float4));
do_something<<<16, 1>>>(cd);

I want to display the output of do_something through OpenGL.

Side note: why is the cudaGraphicsResourceGetMappedPointer function run on every timestep?

Solution

As of CUDA 4.0, OpenGL interop is one-way. That means to do what you want (run a CUDA kernel that writes data to a GL buffer or texture image), you have to map the buffer to a device pointer, and pass that pointer to your kernel, as shown in your example.
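Concretely, for the code in the question: instead of launching do_something on a separately cudaMalloc'd array, map the registered VBO each frame and pass the resulting device pointer to the existing kernel. A minimal sketch, reusing the positionsVBO_CUDA resource from the question and omitting error checking:

```cuda
// Sketch (untested): route the output of an existing kernel into a
// GL buffer that was registered with cudaGraphicsGLRegisterBuffer.
void run_cuda_step(cudaGraphicsResource* positionsVBO_CUDA)
{
    float4* cd = nullptr;
    size_t num_bytes = 0;

    // Map the registered VBO; the device pointer is valid only until unmap.
    cudaGraphicsMapResources(1, &positionsVBO_CUDA, 0);
    cudaGraphicsResourceGetMappedPointer((void**)&cd, &num_bytes,
                                         positionsVBO_CUDA);

    // The existing kernel now writes directly into GL's buffer storage
    // instead of into a cudaMalloc'd array.
    do_something<<<16, 1>>>(cd);

    // Unmap before GL reads from the buffer (e.g. glDrawArrays).
    cudaGraphicsUnmapResources(1, &positionsVBO_CUDA, 0);
}
```

The only change to the existing CUDA code is where the pointer comes from; the kernel itself is untouched.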

As for your side note: cudaGraphicsResourceGetMappedPointer is called every time display() is called because cudaGraphicsMapResource is called every frame. Any time you re-map a resource you should re-get the mapped pointer, because it may have changed. Why re-map every frame? Well, OpenGL sometimes moves buffer objects around in memory, for performance reasons (especially in memory-intensive GL applications). If you leave the resource mapped all the time, it can't do this, and performance may suffer. I believe GL's ability and need to virtualize memory objects is also one of the reasons the current GL interop API is one-way (the GL is not allowed to move CUDA allocations around, and therefore you can't map a CUDA-allocated device pointer into a GL buffer object).
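If the per-frame map cost matters, the map flags let you give the driver a hint. The runtime API provides cudaGraphicsResourceSetMapFlags for this; a sketch, again assuming the positionsVBO_CUDA resource from the question:

```cuda
// Hint that every mapping will fully overwrite the buffer, so the driver
// never needs to preserve or copy back its previous contents.
cudaGraphicsResourceSetMapFlags(positionsVBO_CUDA,
                                cudaGraphicsMapFlagsWriteDiscard);
```

The question's code already passes this flag at registration time; setting it on the resource is equivalent and can be changed later if the access pattern changes.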
