Multithreaded video rendering in OpenGL on Mac shows severe flickering issues


Question

I have a video player application and use multiple threads to keep the user interaction smooth.

The thread that decodes the video originally just wrote the resulting frames as BGRA into a RAM buffer, which got uploaded to VRAM via glTexSubImage2D. This worked well enough for normal videos but, as expected, became slow for HD (especially 1920x1080).

To improve that, I implemented a different kind of pool class that has its own GL context (an NSOpenGLContext, as I am on Mac) which shares resources with the main context. Furthermore, I changed the code so that it uses

glTextureRangeAPPLE( GL_TEXTURE_RECTANGLE_ARB, m_mappedMemSize, m_mappedMem );

glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_STORAGE_HINT_APPLE, GL_STORAGE_SHARED_APPLE);

for the textures that I use, in order to improve the performance of uploading to VRAM. Instead of uploading BGRA textures (which weigh in at about 8 MB per frame at 1920x1080), I upload three individual textures for Y, U and V (each GL_LUMINANCE / GL_UNSIGNED_BYTE, with the Y texture at the original size and U and V at half the dimensions), thereby reducing the upload to about 3 MB per frame, which already showed some improvement.
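
For reference, the per-frame sizes mentioned above can be verified with a bit of arithmetic (a minimal sketch; the function names are illustrative, not part of the actual player code):

```c
#include <assert.h>
#include <stddef.h>

/* Bytes per frame for packed BGRA vs. planar YUV with a full-size Y
   plane and U/V planes at half the dimensions, as described above. */
static size_t bgra_frame_bytes(size_t w, size_t h) {
    return w * h * 4;                  /* 4 bytes per pixel */
}

static size_t yuv_frame_bytes(size_t w, size_t h) {
    size_t y  = w * h;                 /* GL_LUMINANCE, 1 byte per pixel */
    size_t uv = (w / 2) * (h / 2);     /* each chroma plane */
    return y + 2 * uv;                 /* Y + U + V */
}
```

For 1920x1080 this gives 8,294,400 bytes (about 8 MB) for BGRA versus 3,110,400 bytes (about 3 MB) for the three YUV planes, matching the numbers above.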

I created a pool of those YUV textures (depending on the size of the video, typically between 3 and 8 surfaces, times three for the Y, U and V components), each texture mapped into its own area of the above m_mappedMem.
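
The per-plane offsets inside m_mappedMem could be laid out like this (a sketch that assumes surfaces are packed back-to-back; the actual pool class may organize the region differently):

```c
#include <assert.h>
#include <stddef.h>

/* Offset of plane 0 (Y), 1 (U) or 2 (V) of a given surface inside one
   contiguous mapped region, assuming surfaces are packed back-to-back. */
static size_t plane_offset(size_t w, size_t h, size_t surface, int plane) {
    size_t ySize  = w * h;
    size_t uvSize = (w / 2) * (h / 2);
    size_t offset = surface * (ySize + 2 * uvSize);
    if (plane >= 1) offset += ySize;   /* skip the Y plane */
    if (plane >= 2) offset += uvSize;  /* skip the U plane */
    return offset;
}
```

Each plane's client-memory pointer is then m_mappedMem plus its offset, which is what glTexSubImage2D receives as its data argument.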

When I receive a newly decoded video frame I find a set of free YUV surfaces and update the three components each with this code:

glActiveTexture(m_textureUnits[texUnit]);
glEnable(GL_TEXTURE_RECTANGLE_ARB);

glBindTexture(GL_TEXTURE_RECTANGLE_ARB, planeInfo->m_texHandle);

glTexParameteri(GL_TEXTURE_RECTANGLE_ARB, GL_TEXTURE_STORAGE_HINT_APPLE, GL_STORAGE_SHARED_APPLE);
glPixelStorei(GL_UNPACK_CLIENT_STORAGE_APPLE, GL_TRUE);

memcpy( planeInfo->m_buffer, srcData, planeInfo->m_planeSize );

glTexSubImage2D( GL_TEXTURE_RECTANGLE_ARB, 
                0, 
                0, 
                0, 
                planeInfo->m_width, 
                planeInfo->m_height, 
                GL_LUMINANCE, 
                GL_UNSIGNED_BYTE, 
                planeInfo->m_buffer );

(As a side question: I am not sure whether I should use a different texture unit for each of the textures? [I am using unit 0 for Y, 1 for U and 2 for V, btw.])

Once this is done, the textures I used are marked as in use, and a VideoFrame object is filled with their info (i.e. the texture numbers, which area of the buffer they occupy, etc.) and put into a queue to be rendered. Once the minimum queue size is reached, the main application is notified that it can start rendering the video.

Meanwhile the main rendering thread (after ensuring correct state, etc.) accesses this queue (the queue class protects its access internally with a mutex) and renders the top frame.
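
Such a mutex-protected queue could look roughly like this (a minimal pthread sketch; FrameQueue, VideoFrame and the capacity are assumptions for illustration, not the actual classes):

```c
#include <assert.h>
#include <pthread.h>

#define QUEUE_CAP 8  /* illustrative upper bound on pooled frames */

typedef struct { int texY, texU, texV; } VideoFrame;

typedef struct {
    VideoFrame frames[QUEUE_CAP];  /* fixed-size ring buffer */
    int head, count;
    pthread_mutex_t lock;          /* guards all queue state */
} FrameQueue;

static void queue_init(FrameQueue *q) {
    q->head = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
}

/* Decoder thread: returns 0 if the queue is full. */
static int queue_push(FrameQueue *q, VideoFrame f) {
    pthread_mutex_lock(&q->lock);
    int ok = q->count < QUEUE_CAP;
    if (ok) q->frames[(q->head + q->count++) % QUEUE_CAP] = f;
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Render thread: returns 0 if no frame is ready. */
static int queue_pop(FrameQueue *q, VideoFrame *out) {
    pthread_mutex_lock(&q->lock);
    int ok = q->count > 0;
    if (ok) {
        *out = q->frames[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}
```

Note that the mutex only protects the queue bookkeeping; it says nothing about whether the GPU has finished consuming the texture data, which is the separate synchronization issue discussed below.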

That main rendering thread has two framebuffers, each with a texture attached via glFramebufferTexture2D, in order to implement a kind of double buffering. In the main rendering loop it then checks which one is the front buffer and renders that front buffer to the screen using texture unit 0:

glActiveTexture(GL_TEXTURE0);
glEnable(GL_TEXTURE_RECTANGLE_ARB);            
glBindTexture(GL_TEXTURE_RECTANGLE_ARB, frontTexHandle);            
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);

glPushClientAttrib( GL_CLIENT_VERTEX_ARRAY_BIT );
glEnableClientState( GL_VERTEX_ARRAY );
glEnableClientState( GL_TEXTURE_COORD_ARRAY );            
glBindBuffer(GL_ARRAY_BUFFER, m_vertexBuffer);
glVertexPointer(4, GL_FLOAT, 0, 0);
glBindBuffer(GL_ARRAY_BUFFER, m_texCoordBuffer);
glTexCoordPointer(2, GL_FLOAT, 0, 0);
glDrawArrays(GL_QUADS, 0, 4);
glPopClientAttrib();

Before rendering the current frame to the screen (as the usual framerate for video is about 24 fps, this frame might be drawn a few times before the next video frame gets rendered; that's why I use this approach), I call the video decoder class to check whether a new frame is available (i.e. it is responsible for syncing to the timeline and updating the backbuffer with a new frame). If a frame is available, I render to the backbuffer texture from inside the video decoder class (this happens on the same thread as the main rendering thread):

glBindFramebuffer(GL_FRAMEBUFFER, backbufferFBOHandle);

glPushAttrib(GL_VIEWPORT_BIT);    // need to set viewport all the time?
glViewport(0,0,m_surfaceWidth,m_surfaceHeight);

glMatrixMode(GL_MODELVIEW);
glPushMatrix();
glLoadIdentity();
glMatrixMode(GL_PROJECTION);
glPushMatrix();
glLoadIdentity();
glMatrixMode(GL_TEXTURE);
glPushMatrix();
glLoadIdentity();
glScalef( (GLfloat)m_surfaceWidth, (GLfloat)m_surfaceHeight, 1.0f );

glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_RECTANGLE_ARB, texID_Y);

glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_RECTANGLE_ARB, texID_U);

glActiveTexture(GL_TEXTURE2);
glBindTexture(GL_TEXTURE_RECTANGLE_ARB, texID_V);

glUseProgram(m_yuv2rgbShader->GetProgram());

glBindBuffer(GL_ARRAY_BUFFER, m_vertexBuffer);
glEnableVertexAttribArray(m_attributePos);
glVertexAttribPointer(m_attributePos, 4, GL_FLOAT, GL_FALSE, 0, 0);
glBindBuffer(GL_ARRAY_BUFFER, m_texCoordBuffer);
glEnableVertexAttribArray(m_attributeTexCoord);
glVertexAttribPointer(m_attributeTexCoord, 2, GL_FLOAT, GL_FALSE, 0, 0);
glDrawArrays(GL_QUADS, 0, 4);

glUseProgram(0);

glBindTexture(GL_TEXTURE_RECTANGLE_ARB, 0);                

glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_RECTANGLE_ARB, 0);

glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_RECTANGLE_ARB, 0);

glPopMatrix();
glMatrixMode(GL_PROJECTION);
glPopMatrix();
glMatrixMode(GL_MODELVIEW);
glPopMatrix();                            

glPopAttrib();
glBindFramebuffer(GL_FRAMEBUFFER, 0);

[Please note that I omitted certain safety checks and comments for brevity]

After the above calls, the video decoder sets a flag that the buffer can be swapped, and after the main-thread rendering loop from above, it checks for that flag and swaps the frontBuffer/backBuffer accordingly. The used surfaces are also marked as free and available again.
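
The swap itself can be as simple as exchanging the two texture handles once the decoder has raised the flag (a sketch with illustrative names; the real code also has to guard the flag across threads):

```c
#include <assert.h>

/* Sketch of the double-buffer swap: the decoder sets swapReady after
   rendering into the back buffer; the main loop swaps the FBO color
   attachment handles and clears the flag. Names are illustrative. */
typedef struct {
    unsigned frontTex, backTex;  /* textures attached to the two FBOs */
    int swapReady;               /* set by decoder, consumed by renderer */
} DoubleBuffer;

static void swap_if_ready(DoubleBuffer *db) {
    if (db->swapReady) {
        unsigned tmp = db->frontTex;
        db->frontTex = db->backTex;
        db->backTex  = tmp;
        db->swapReady = 0;
    }
}
```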

In my original code, when I used BGRA and uploads via glTexSubImage2D with glBegin and glEnd, I didn't experience any problems; but once I started improving things (using a shader to convert the YUV components to BGRA, the DMA transfers, and glDrawArrays), those issues started showing up.

Basically it looks partly like a tearing effect (btw, I set the GL swap interval to 1 to sync with refreshes) and partly like it is jumping back a few frames in between.

I expected that having a pool of surfaces that I render to, which are freed after being rendered into the target surface, plus double buffering of that target surface, would be enough; but obviously more synchronization is needed elsewhere. However, I don't really know how to solve that.

I assume that because glTexSubImage2D is now handled by DMA (and according to the documentation the function is supposed to return immediately), the upload might not be finished yet (and the next frame renders over it), or that I forgot (or don't know) about some other synchronization mechanism that I need for OpenGL on the Mac.

According to OpenGL Profiler, before I started optimizing the code:

  • almost 70% of GLTime in glTexSubImage2D (i.e. uploading 8 MB of BGRA to VRAM)
  • almost 30% in CGLFlushDrawable

and after I changed the code to the above it now says:

  • about 4% of GLTime in glTexSubImage2D (so the DMA transfers seem to work well)
  • 16% in CGLFlushDrawable
  • almost 75% in glDrawArrays (which really surprised me)

Any comments on those results?

If you need any further info about how my code is set up, please let me know. Hints on how to solve this would be much appreciated.

Here is my vertex shader for reference:

#version 110
attribute vec2 texCoord;
attribute vec4 position;

// the tex coords for the fragment shader
varying vec2 texCoordY;
varying vec2 texCoordUV;

//the shader entry point is the main method
void main()
{   
    texCoordY = texCoord ;
    texCoordUV = texCoordY * 0.5;
    gl_Position = gl_ModelViewProjectionMatrix * position;
}

And the fragment shader:

#version 110

uniform sampler2DRect texY;
uniform sampler2DRect texU;
uniform sampler2DRect texV;

// the incoming tex coord for this vertex
varying vec2 texCoordY;
varying vec2 texCoordUV;

// RGB coefficients
const vec3 R_cf = vec3(1.164383,  0.000000,  1.596027);
const vec3 G_cf = vec3(1.164383, -0.391762, -0.812968);
const vec3 B_cf = vec3(1.164383,  2.017232,  0.000000);

// YUV offset
const vec3 offset = vec3(-0.0625, -0.5, -0.5);

void main()
{
    // get the YUV values
    vec3 yuv;
    yuv.x = texture2DRect(texY, texCoordY).r;
    yuv.y = texture2DRect(texU, texCoordUV).r;
    yuv.z = texture2DRect(texV, texCoordUV).r;
    yuv += offset;

    // set up the rgb result
    vec3 rgb;

    // YUV to RGB transform
    rgb.r = dot(yuv, R_cf);
    rgb.g = dot(yuv, G_cf);
    rgb.b = dot(yuv, B_cf);

    gl_FragColor = vec4(rgb, 1.0);
}
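
The same transform can be mirrored on the CPU to sanity-check the coefficients (a standalone sketch using the shader's constants; inputs are normalized to [0,1], as sampling the GL_UNSIGNED_BYTE luminance textures produces):

```c
#include <assert.h>
#include <math.h>

typedef struct { double r, g, b; } RGB;

/* Same BT.601 video-range YUV -> RGB transform as the fragment shader. */
static RGB yuv_to_rgb(double y, double u, double v) {
    y -= 0.0625;   /* black offset (16/256) */
    u -= 0.5;
    v -= 0.5;
    RGB c;
    c.r = 1.164383 * y                  + 1.596027 * v;
    c.g = 1.164383 * y - 0.391762 * u - 0.812968 * v;
    c.b = 1.164383 * y + 2.017232 * u;
    return c;
}
```

Video-range white (Y=235, U=V=128) should land close to (1, 1, 1) and video-range black (Y=16, U=V=128) close to (0, 0, 0), confirming that the coefficients and the offset vector fit together.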

Edit 2: As a side note, I have another rendering pipeline which uses a VDADecoder object for decoding, which works super-nicely performance-wise but has the same flickering issues. So there is definitely some problem with the threading in my code; so far I just couldn't figure out what exactly. But I also need to provide a software-decoder solution for machines that don't support VDA; there the CPU load is quite high, which is why I tried to offload the YUV to RGB conversion to the GPU.

Answer

Ok, after a lot more testing and research, I finally managed to solve my problems:

What happened was that I first tried to write to the target texture using a framebuffer (bound to that texture with glFramebufferTexture2D as color attachment 0), and then, in the same frame, tried to read from it when rendering the frame to the window.

Basically I wrongly assumed that (being called in the same frame, directly in succession) the first call would finish writing to the framebuffer before the next call read from it. Therefore a call to glFlush (for the class using the VDADecoder) and glFinish (for the class using the software decoder) did the trick.

On a side note: as indicated in the comments above, I changed my whole code so it doesn't use the fixed pipeline anymore, and to make it look cleaner. Performance tests under OpenGL Profiler (on Mac OS X 10.7) have shown that the changes from my original code to the current code reduced OpenGL's share of the total application time from almost 50% down to about 15% (freeing up more resources for the actual video decoding, in the case where a VDADecoder object is not available).
