What are bitstream filters in ffmpeg?


Question

After carefully reading the FFmpeg Bitstream Filters Documentation, I still do not understand what they are really for.

The documentation states that the filters:

perform bitstream-level modifications without performing decoding

Could anyone further explain that to me? A use case would greatly clarify things. Also, there are clearly different filters. How do they differ?

Recommended answer

Let me explain by example. FFmpeg video decoders typically work by converting one video frame per call to avcodec_decode_video2. So the input is expected to be "one image" worth of bitstream data. Let's consider for a second the issue of going from a file (an array of bytes on disk) to images.

For "raw" (annexb) H264 (.h264/.bin/.264 files), the individual nal unit data (sps/pps header bitstreams or cabac-encoded frame data) is concatenated in a sequence of nal units, with a start code (00 00 01 XX) in between, where XX is the header byte that carries the nal unit type. (To prevent the nal data itself from containing 00 00 01, it is RBSP escaped.) So an h264 frame parser can simply cut the file at start code markers. It searches for successive packets that start with, and include, 00 00 01, up to but excluding the next occurrence of 00 00 01. It then parses the nal unit type and slice header to find which frame each packet belongs to, and returns the set of nal units making up one frame as input to the h264 decoder.
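
As a rough illustration of the start-code cutting described above, here is a minimal Python sketch. The function names split_annexb and nal_unit_type are made up for this example and are not FFmpeg APIs. It assumes the whole stream is already in memory, and it only separates nal units; it does not group them into frames, which the real parser does by reading slice headers.

```python
START_CODE = b"\x00\x00\x01"


def split_annexb(data):
    """Cut an Annex B byte stream at 00 00 01 markers.

    Returns the nal units as a list of bytes objects, start codes removed.
    """
    nals = []
    pos = data.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = data.find(START_CODE, start)
        end = len(data) if nxt == -1 else nxt
        # A 4-byte start code (00 00 00 01) leaves one zero byte at the end
        # of the previous unit. A nal unit never ends in 00, so trailing
        # zeros can be stripped safely.
        nal = data[start:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        pos = nxt
    return nals


def nal_unit_type(nal):
    """The H.264 nal unit type is the low 5 bits of the first byte."""
    return nal[0] & 0x1F


if __name__ == "__main__":
    stream = (b"\x00\x00\x00\x01\x67\xAA"      # SPS (type 7)
              b"\x00\x00\x01\x68\xBB"          # PPS (type 8)
              b"\x00\x00\x01\x65\xCC\xDD")     # IDR slice (type 5)
    for nal in split_annexb(stream):
        print(nal_unit_type(nal), nal.hex())
```

In H.264, types 7, 8 and 5 are SPS, PPS and IDR slice respectively, so a frame parser built on top of this would still need to look at slice headers to decide where one frame ends and the next begins.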

H264 data in .mp4 files is different, though. You can imagine that the 00 00 01 start code is redundant if the muxing format already has length markers in it, as is the case for mp4. So, to save 3 bytes per frame, they remove the 00 00 01 prefix. They also put the PPS/SPS in the file header instead of prepending it to the first frame, and these lack their 00 00 01 prefixes too. So, if I were to feed this into the h264 decoder, which expects the prefixes on all nal units, it wouldn't work. The h264_mp4toannexb bitstream filter fixes this by identifying the pps/sps in the extracted parts of the file header (ffmpeg calls this "extradata"), prepending the start code to these and to each nal from the individual frame packets, and concatenating them back together before they are input to the h264 decoder.
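
The conversion just described can be sketched roughly as follows, in Python. This is not the code of h264_mp4toannexb. It assumes 4-byte big-endian length prefixes, and that the SPS and PPS have already been pulled out of the extradata and are passed in as plain bytes. The names to_annexb and packet_to_annexb, and the choice to prepend SPS/PPS on keyframes only, are assumptions made for illustration.

```python
START_CODE = b"\x00\x00\x01"


def to_annexb(sample, length_size=4):
    """Replace each length prefix in an mp4-style sample with a start code.

    length_size=4 is an assumption; the real value is stored in the
    file header.
    """
    out = bytearray()
    pos = 0
    while pos < len(sample):
        if pos + length_size > len(sample):
            raise ValueError("truncated length prefix")
        nal_len = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        nal = sample[pos:pos + nal_len]
        if len(nal) != nal_len:
            raise ValueError("truncated nal unit")
        out += START_CODE + nal
        pos += nal_len
    return bytes(out)


def packet_to_annexb(sample, sps, pps, is_keyframe):
    """Hypothetical wrapper: put SPS/PPS in front of keyframes (assumed policy)."""
    header = b""
    if is_keyframe:
        header = START_CODE + sps + START_CODE + pps
    return header + to_annexb(sample)


if __name__ == "__main__":
    sample = (2).to_bytes(4, "big") + b"\x65\xAA" + (1).to_bytes(4, "big") + b"\x06"
    print(packet_to_annexb(sample, b"\x67\x01", b"\x68\x02", True).hex())
```

The output of packet_to_annexb is a byte string that a start-code-based decoder input path could consume, which is the whole point of the filter: the decoder does not need to know the data ever lived in an mp4 file.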

You might now feel that there's a very fine line between a "parser" and a "bitstream filter". This is true. I think the official definition is that a parser takes a sequence of input data and splits it into frames without discarding or adding any data. The only thing a parser does is change packet boundaries. A bitstream filter, on the other hand, is allowed to actually modify the data. I'm not sure this definition is entirely true (see e.g. vp9 below), but it's the conceptual reason mp4toannexb is a BSF, not a parser (because it adds 00 00 01 prefixes).
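
The distinction can be stated as a small invariant. The toy Python sketch below is purely illustrative (toy_parser and toy_bsf are invented names, and fixed-size "frames" are an unrealistic simplification): joining a parser's output gives back the input byte for byte, while a bitstream filter's output may differ from its input.

```python
def toy_parser(stream, frame_size):
    """Only moves packet boundaries: no byte is added or dropped."""
    return [stream[i:i + frame_size] for i in range(0, len(stream), frame_size)]


def toy_bsf(packet, prefix=b"\x00\x00\x01"):
    """Modifies the data: here it adds a prefix, as mp4toannexb adds start codes."""
    return prefix + packet


if __name__ == "__main__":
    stream = bytes(range(10))
    packets = toy_parser(stream, 4)
    # Parser invariant: the concatenated packets equal the input.
    print(b"".join(packets) == stream)
    # A bitstream filter breaks that invariant on purpose.
    print(b"".join(toy_bsf(p) for p in packets) == stream)
```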

There are other cases where such "bitstream tweaks" help keep decoders simple and uniform while still letting us support all the file variants that happen to exist in the wild:

  • mpeg4 (divx) b-frame unpacking (to store B-frame sequences like IBP, which are coded as IPB, in AVI with correct timestamps, people came up with the concept of B-frame packing, where I-B-P / I-P-B is packed in frames as I-(PB)-(), i.e. the second packet holds two frames and the third is empty. This means the timestamps associated with the P and B frames at the decoding phase are correct. It also means you have two frames' worth of input data in one packet, which violates ffmpeg's one-frame-in-one-frame-out concept, so we wrote a bsf that splits the packet back in two before it is input to the decoder. Because it also deletes the marker saying that the packet contains two frames, it is a BSF and not a parser. In practice, this solves otherwise hard problems with frame multithreading. VP9 does the same thing (called superframes) but splits frames in the parser, so the parser/BSF split isn't always theoretically perfect; maybe VP9's should be called a BSF)
  • hevc mp4 to annexb conversion (same story as above, but for hevc)
  • aac adts to asc conversion (this is basically the same as h264/hevc annexb vs. mp4, but for aac audio)
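
To make the last item more concrete, here is a minimal Python sketch of the idea behind an ADTS-to-ASC conversion: the per-frame ADTS header is dropped, and the stream parameters it carried are repacked into a two-byte AudioSpecificConfig that a container can store once in its header. This is an illustration only, not FFmpeg's filter. The function name adts_to_asc is invented, and the sketch assumes a single raw data block per frame, a sampling frequency index other than the escape value 15, and a non-zero channel configuration.

```python
def adts_to_asc(frame):
    """Split one ADTS frame into (AudioSpecificConfig bytes, raw AAC payload)."""
    if len(frame) < 7 or frame[0] != 0xFF or (frame[1] & 0xF0) != 0xF0:
        raise ValueError("not an ADTS frame (sync word missing)")

    protection_absent = frame[1] & 0x01
    profile = (frame[2] >> 6) & 0x03           # 2 bits
    sf_index = (frame[2] >> 2) & 0x0F          # 4 bits
    channels = ((frame[2] & 0x01) << 2) | (frame[3] >> 6)   # 3 bits
    # 13-bit frame length, header included.
    frame_length = ((frame[3] & 0x03) << 11) | (frame[4] << 3) | (frame[5] >> 5)

    # The header is 7 bytes, or 9 when a CRC follows it.
    header_size = 7 if protection_absent else 9

    # ASC layout: 5 bits object type, 4 bits sampling index, 4 bits channel
    # configuration, 3 zero bits. The object type is the ADTS profile plus one.
    audio_object_type = profile + 1
    asc = (audio_object_type << 11) | (sf_index << 7) | (channels << 3)
    return asc.to_bytes(2, "big"), frame[header_size:frame_length]


if __name__ == "__main__":
    # AAC-LC, sampling index 4 (44.1 kHz), stereo, 3 payload bytes.
    header = bytes([0xFF, 0xF1, 0x50, 0x80, 0x01, 0x5F, 0xFC])
    asc, payload = adts_to_asc(header + b"\x01\x02\x03")
    print(asc.hex(), payload.hex())
```

As with the h264 case, the payload bytes themselves are untouched; only the framing and the place where the stream parameters live change.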
