Video processing Inter-frame Prediction


Problem Description


I need to perform 'Inter-frame Prediction' and 'Motion Compensation' of a set of 30 frames for video processing in Matlab. I am working with Mother-daughter frames.


What I have done so far is to take the very first frame and divide it into:

  • 8x8 blocks
  • Performed DCT
  • Quantized it
  • De-quantized it
  • Performed inverse DCT.
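The steps above can be sketched as follows. This is a NumPy sketch rather than MATLAB (the question's target environment), and the uniform quantizer step `q_step` is an illustrative assumption, not a real MPEG quantization matrix:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (satisfies C @ C.T = I)."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def reconstruct_frame(frame, q_step=16.0):
    """Blockwise DCT -> quantize -> de-quantize -> inverse DCT.
    q_step is an illustrative uniform quantizer step, not an actual
    MPEG quantization matrix."""
    C = dct_matrix(8)
    h, w = frame.shape
    out = np.empty((h, w), dtype=np.float64)
    for r in range(0, h, 8):
        for c0 in range(0, w, 8):
            block = frame[r:r+8, c0:c0+8].astype(np.float64)
            coeffs = C @ block @ C.T             # forward 8x8 DCT
            q = np.round(coeffs / q_step)        # quantize (the lossy step)
            deq = q * q_step                     # de-quantize
            out[r:r+8, c0:c0+8] = C.T @ deq @ C  # inverse DCT
    return out

# A toy 16x16 "frame" (dimensions must be multiples of 8 in this sketch)
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(16, 16))
recon = reconstruct_frame(frame)
print(np.abs(recon - frame).max())  # typically small but nonzero: quantization is lossy
```

In MATLAB the same loop can be written with `blockproc` and `dct2`/`idct2` from the Image Processing Toolbox.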


I know that no motion estimation is required for the first frame; from the second frame onwards, the reconstructed frame one is used as the reference for frame two, and so on. For motion estimation I need to implement the 'Full-search Block Matching Algorithm'.


Question 1: What is meant by reconstruction of a frame? Is it the quantization and DCT steps I have listed above?


Question 2: What is 'Full-search Block Matching Algorithm'?

Answer


I'm going to assume that you are referring to the MPEG consortium of video compression algorithms (MPEG-1, MPEG-2, H.264, etc.). Let's answer each question one at a time:


For a single frame, the forward transformation basically consists of decomposing a frame into 8 x 8 non-overlapping blocks, doing an 8 x 8 DCT transform of each block, quantizing the blocks, and then we perform some more complicated stuff such as zig-zag ordering, run-length encoding, etc.


Basically, your frame is represented as a compressed sequence of bits. Reconstructing the frame goes in the reverse order, so you almost have it right. This consists of decoding the sequence and undoing the zig-zag ordering, then de-quantizing the blocks, then applying the IDCT. The reason they call this "reconstruction" is that you represented the frame in a different format; you are converting the frame back to what it was before it was compressed.


One thing that you may already know is that quantization of the frame is the reason why this methodology is lossy. This means that you won't be able to get the original frame back, but you can get it to be as close as possible to the original. However, the advantage is that with lossy algorithms, you get high compression ratios, which means that the size of the video will be smaller, and can easily be transmitted.


In fact, if you do a forward transformation of one frame, then a reverse transformation, and compare the frames pixel by pixel, you will see that there are some subtle differences, but nothing to write home about. The parameters and design behind how the compression works have been tuned so that the visual system of an average person won't notice much of a difference between the original and the reconstructed frame.


So why lossy, you may ask? The reason is that the MPEG consortium favoured making the video highly compressible and transmittable over preserving its exact quality. This is due to the fact that quality has always been a subjective measure, even when you have numerical measures (PSNR, for instance) that can quantify image quality.
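As an aside, PSNR between an original and a reconstructed frame is straightforward to compute. A minimal NumPy sketch (the helper name `psnr` is my own, assuming 8-bit pixel data with a peak value of 255):

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

a = np.full((8, 8), 100.0)
b = a + 5.0                  # uniform error of 5 per pixel -> MSE = 25
print(round(psnr(a, b), 2))  # → 34.15, i.e. 10*log10(255^2 / 25)
```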


So, the moral of this story is that a reconstruction undoes the forward transformation performed to compress the video frame; it will not be exactly the same as the original frame, but it is close enough that a normal human being won't complain.


The basics behind motion estimation are that we don't want to transmit every frame as full video frames in order to reduce transmission bandwidth. If you know the basics of the MPEG consortium of video compression algorithms, there are three classes of encoded frames in your video:


  • I-Frames - These are what are known as intra-coded frames. These frames have the full compression algorithm performed on them (DCT, quantization, etc.). We don't have a video that consists entirely of I-Frames, as that would make the size of the video quite large. Instead, I-Frames are used as a reference point, and difference frames are sent after this point, where for each block in an I-Frame a motion vector is transmitted. More to follow.


  • P-Frames - Instead of sending another I-Frame, we send a predicted frame, or P-Frame. For each block from a reference I-Frame, the P-Frame essentially tells us where that block best moved from one frame to the next. These are what are known as motion vectors, one for each block. The rationale behind this is that video is usually captured at such a high frame rate that successive frames exhibit very little difference, so most of the blocks should remain the same or move very little. You will get to a point where the scene changes drastically, or where there is so much motion that even with a high frame rate you can't adequately capture it all with P-Frames alone. This is commonly seen when you're watching MPEG video with a lot of high motion - you'll see a lot of "blockiness", and that blockiness is explained by this fact. At that point, you need to encode another I-Frame as a quick refresher and continue from there. As such, most video files have the frames encoded such that you have one I-Frame, then a bunch of P-Frames, then another I-Frame followed by a bunch of P-Frames, and so on.


  • B-Frames - These are what are known as bi-directionally predicted frames. These frames use information from both the frame (or frames) ahead and the frame (or frames) behind. How these work exactly is beyond the scope of this post, but I wanted to mention them briefly to be self-contained.


As such, one possible sequence of encoded frames follows this format:

IPPPBPPPIPPPBPPPI...


However, this all depends on how your encoder is set up, but we'll leave that aside.


How is all of this useful, you might ask? Because your question about the Full-search Block Matching Algorithm deals exactly with how P-Frames are constructed. For a given block in an I-Frame, where is the best location that this block has moved to in the next frame? To answer this, we look at blocks in the next frame and find the one most similar to the block in the I-Frame. You are probably asking yourself: Whoa... aren't there a lot of blocks to search? And the answer is yes. The Full-search Block Matching algorithm searches the entire frame for the best matching block. This is quite computationally intensive, so most encoders actually limit the search to a moderately sized finite window around the block's location. Full-search Block Matching would give you the best results, but it takes too long and is definitely not worth it. We can instead leverage the fact that most blocks don't really move that far, since we assume the video was captured at a high frame rate.
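Here is a minimal NumPy sketch of block matching over a bounded search window, using the sum of absolute differences (SAD) as the matching criterion. The function name and the `search` parameter are illustrative; set `search` to the frame size for a true exhaustive (full) search:

```python
import numpy as np

def full_search(ref, cur, block=8, search=8):
    """For each block of the current frame, find the displacement
    (dr, dc) into the reference frame that minimizes the SAD.
    A motion vector (dr, dc) for the block at (r, c) means its
    content was found at (r+dr, c+dc) in the reference frame."""
    h, w = cur.shape
    motion = {}
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            target = cur[r:r+block, c:c+block].astype(np.int64)
            best, best_mv = None, (0, 0)
            for dr in range(-search, search + 1):
                for dc in range(-search, search + 1):
                    rr, cc = r + dr, c + dc
                    if rr < 0 or cc < 0 or rr + block > h or cc + block > w:
                        continue  # candidate block falls outside the frame
                    cand = ref[rr:rr+block, cc:cc+block].astype(np.int64)
                    sad = np.abs(target - cand).sum()
                    if best is None or sad < best:
                        best, best_mv = sad, (dr, dc)
            motion[(r, c)] = best_mv
    return motion

# Toy example: a bright square sits at (4, 4) in the reference and
# moves up-left to (1, 1) in the current frame, so the block at (0, 0)
# should point back at the square's old position.
ref = np.zeros((16, 16)); ref[4:8, 4:8] = 200
cur = np.zeros((16, 16)); cur[1:5, 1:5] = 200
mv = full_search(ref, cur)
print(mv[(0, 0)])  # → (3, 3)
```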


I hope this has answered your questions!
