What are bitstream filters in ffmpeg?

Question

After carefully reading the FFmpeg Bitstream Filters documentation, I still do not understand what they are really for.

The documentation states that the filters:

"perform bitstream level modifications without performing decoding"

Could anyone further explain that to me? A use case would greatly clarify things. Also, there are clearly different filters. How do they differ?

Recommended answer

Let me explain by example. FFmpeg video decoders typically work by converting one video frame per call to avcodec_decode_video2. So the input is expected to be "one image" worth of bitstream data. Let's consider for a second the problem of going from a file (an array of bytes on disk) to images.

For "raw" (annexb) H264 (.h264/.bin/.264 files), the individual nal unit data (sps/pps header bitstreams or cabac-encoded frame data) is concatenated into a sequence of nal units, each preceded by a start code (00 00 01) and beginning with a header byte XX whose low 5 bits give the nal unit type. (To prevent the nal data itself from containing 00 00 01, it is RBSP escaped.) So an h264 frame parser can simply cut the file at the start code markers. It searches for successive packets that start with and include 00 00 01, and run up to but exclude the next occurrence of 00 00 01. Then it parses the nal unit type and slice header to find which frame each packet belongs to, and returns the set of nal units making up one frame as input to the h264 decoder.
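The start-code cutting described above can be sketched in a few lines of Python. This is only an illustration, not FFmpeg's parser: the name `split_annexb` is made up for this example, and it stops at splitting and reading the nal unit type, without the slice-header parsing needed to group nal units into frames.

```python
import re

# The 3-byte start code. A 4-byte start code (00 00 00 01) is the same
# pattern with one extra leading zero byte, which is trimmed below.
START_CODE = re.compile(b"\x00\x00\x01")


def split_annexb(stream: bytes) -> list[bytes]:
    """Split an Annex B byte stream into nal units (start codes removed).

    Hypothetical helper for illustration only.
    """
    # Offsets of the first byte after every start code.
    starts = [m.end() for m in START_CODE.finditer(stream)]
    nals = []
    for i, begin in enumerate(starts):
        # A nal unit runs up to (but excludes) the next start code.
        end = starts[i + 1] - 3 if i + 1 < len(starts) else len(stream)
        # Simplification: trailing zero bytes are treated as padding or as
        # the extra zero of a following 4-byte start code.
        nal = stream[begin:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
    return nals


def nal_unit_type(nal: bytes) -> int:
    """The nal unit type is stored in the low 5 bits of the header byte."""
    return nal[0] & 0x1F
```

For example, `[nal_unit_type(n) for n in split_annexb(data)]` lists the types in stream order; in H264, 7 is an SPS, 8 a PPS and 5 an IDR slice.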

H264 data in .mp4 files is different, though. You can imagine that the 00 00 01 start code is redundant if the muxing format already has length markers in it, as is the case for mp4. So, to save 3 bytes per frame, they remove the 00 00 01 prefix. They also put the PPS/SPS in the file header instead of prepending them to the first frame, and these also lack their 00 00 01 prefixes. So, if I were to feed this into the h264 decoder, which expects the prefix on every nal unit, it wouldn't work. The h264_mp4toannexb bitstream filter fixes this: it identifies the pps/sps in the extracted part of the file header (ffmpeg calls this "extradata"), prepends the start code to these and to each nal from the individual frame packets, and concatenates them back together before they are fed to the h264 decoder.
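As a rough sketch of what such a conversion involves (not the actual h264_mp4toannexb code), the Python below rewrites one length-prefixed packet into Annex B form and prepends parameter sets. The function names are invented for this example. It assumes a 4-byte big-endian length field, and it assumes the SPS/PPS have already been extracted from the extradata into plain byte strings. When exactly the real filter re-inserts the parameter sets is not modelled here.

```python
START_CODE = b"\x00\x00\x00\x01"


def length_prefixed_to_annexb(sample: bytes, length_size: int = 4) -> bytes:
    """Replace each big-endian length prefix with a start code.

    Hypothetical helper; length_size=4 is an assumption about the file.
    """
    out = bytearray()
    pos = 0
    while pos < len(sample):
        if pos + length_size > len(sample):
            raise ValueError("truncated length field")
        nal_len = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        nal = sample[pos:pos + nal_len]
        if len(nal) != nal_len:
            raise ValueError("truncated nal unit")
        out += START_CODE + nal
        pos += nal_len
    return bytes(out)


def prepend_parameter_sets(sps_list, pps_list, annexb_frame: bytes) -> bytes:
    """Put start-code-prefixed SPS and PPS in front of an Annex B frame."""
    header = b"".join(START_CODE + ps for ps in [*sps_list, *pps_list])
    return header + annexb_frame
```

The data bytes of each nal unit are copied unchanged; only the framing around them differs between the two layouts.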

You might now feel that there's a very fine line between a "parser" and a "bitstream filter". This is true. I think the official definition is that a parser takes a sequence of input data and splits it into frames without discarding or adding any data. The only thing a parser does is change packet boundaries. A bitstream filter, on the other hand, is allowed to actually modify the data. I'm not sure this definition is entirely true (see e.g. vp9 below), but it's the conceptual reason mp4toannexb is a BSF, not a parser (because it adds 00 00 01 prefixes).

Other cases where such "bitstream tweaks" help keep decoders simple and uniform, while still letting us support all the file variants that happen to exist in the wild:


  • mpeg4 (divx) B-frame unpacking (mpeg4_unpack_bframes). To store B-frame sequences such as I-B-P (coded as I-P-B) in AVI with correct timestamps, people came up with B-frame packing, where the coded sequence I-P-B is packed into packets as I-(PB)-(), i.e. the second packet holds two frames and the third is empty. This means the timestamps associated with the P and B frames at the decoding stage are correct. It also means one packet carries two frames' worth of input data, which violates ffmpeg's one-frame-in-one-frame-out concept, so we wrote a BSF that splits the packet back in two before it is fed to the decoder. Because it also deletes the marker saying that the packet contains two frames, it is a BSF and not a parser. In practice, this solves otherwise hard problems with frame multithreading. VP9 does the same thing (called superframes), but splits frames in the parser, so the parser/BSF split isn't always theoretically perfect; maybe VP9's should be called a BSF.
  • hevc mp4 to annexb conversion (hevc_mp4toannexb): the same story as for h264 above, but for hevc.
  • aac adts to asc conversion (aac_adtstoasc): basically the same as h264/hevc annexb vs. mp4, but for aac audio.
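
To make the last item a little more concrete, here is a minimal Python sketch of the idea behind an ADTS to ASC conversion. It is not the aac_adtstoasc implementation, and `adts_to_asc` is a name invented for this example. It reads the fixed ADTS header fields, builds a 2-byte AudioSpecificConfig from them, and returns the frame payload without its ADTS header. It assumes a plain AAC stream with a non-zero channel configuration and ignores the more complex cases a real filter has to handle.

```python
def adts_to_asc(frame: bytes) -> tuple[bytes, bytes]:
    """Return (AudioSpecificConfig, raw AAC payload) for one ADTS frame.

    Hypothetical helper for illustration only.
    """
    # The header starts with a 12-bit syncword of all ones.
    if len(frame) < 7 or frame[0] != 0xFF or (frame[1] & 0xF0) != 0xF0:
        raise ValueError("not an ADTS frame")
    protection_absent = frame[1] & 0x01
    profile = (frame[2] >> 6) & 0x03            # audio object type minus 1
    sf_index = (frame[2] >> 2) & 0x0F           # sampling frequency index
    channels = ((frame[2] & 0x01) << 2) | (frame[3] >> 6)
    # 13-bit frame length, header included.
    frame_length = ((frame[3] & 0x03) << 11) | (frame[4] << 3) | (frame[5] >> 5)
    header_size = 7 if protection_absent else 9  # a 2-byte CRC follows otherwise
    if frame_length < header_size or frame_length > len(frame):
        raise ValueError("bad ADTS frame length")
    # ASC layout: 5 bits object type, 4 bits frequency index,
    # 4 bits channel configuration, 3 zero bits.
    asc = ((profile + 1) << 11) | (sf_index << 7) | (channels << 3)
    return asc.to_bytes(2, "big"), frame[header_size:frame_length]
```

As with the h264 case, the per-frame header is dropped and the stream-level parameters end up in one global header (the "extradata") instead.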
