H.264 流的序列/图片参数集的可能位置 [英] Possible Locations for Sequence/Picture Parameter Set(s) for H.264 Stream

查看:36
本文介绍了H.264 流的序列/图片参数集的可能位置的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在研究 H.264 解码器,想知道在哪里可以找到 SPS 和 PPS.我的参考文献告诉我,这些是在 H.264-Stream 中编码的 NAL 单元,但是当我使用 IsoViewer 查看示例 MP4 文件时,它说 SPS 和 PPS 在 avcC 框中.

I'm working on a H.264 Decoder and I'm wondering where to find the SPS and PPS. My reference literature tells me that those are NAL Units encoded in the H.264-Stream, but when I look into an example-MP4-File with IsoViewer, it says that the SPS and PPS are in the avcC Box.

这究竟是如何工作的?它如何查找 .mkv 文件或其他 H.264 容器?

How exactly does this work? How does it look for .mkv files or other H.264 containers?

推荐答案

首先,重要的是要了解没有单一的标准 H.264 基本比特流格式.规范文件确实包含一个附件,特别是附件 B,它描述了一种可能的格式,但这不是实际要求.该标准指定了如何将视频编码为单个数据包.这些数据包的存储和传输方式由集成商决定.

First off, it's important to understand that there is no single standard H.264 elementary bitstream format. The specification document does contain an Annex, specifically Annex B, that describes one possible format, but it is not an actual requirement. The standard specifies how video is encoded into individual packets. How these packets are stored and transmitted is left open to the integrator.

这些数据包称为网络抽象层单元.通常缩写为 NALU(或有时简称为 NAL),每个数据包都可以单独解析和处理.每个 NALU 的第一个字节包含 NALU 类型,特别是第 3 位到第 7 位.(第 0 位总是关闭,第 1-2 位表示一个 NALU 是否被另一个 NALU 引用).

The packets are called Network Abstraction Layer Units. Often abbreviated NALU (or sometimes just NAL) each packet can be individually parsed and processed. The first byte of each NALU contains the NALU type, specifically bits 3 through 7. (bit 0 is always off, and bits 1-2 indicate whether a NALU is referenced by another NALU).

定义了 19 种不同的 NALU 类型,分为两类,VCL 和非 VCL:

There are 19 different NALU types defined separated into two categories, VCL and non-VCL:

  • VCL 或视频编码层数据包包含实际的视觉信息.
  • 非 VCL 包含可能需要也可能不需要解码视频的元数据.

单个 NALU,甚至是 VCL NALU 与框架不同.一个帧可以被切片"成几个 NALU.就像你可以切片披萨一样.然后将一个或多个切片虚拟分组为包含一帧的访问单元 (AU).切片的质量成本很低,因此不常使用.

A single NALU, or even a VCL NALU is NOT the same thing as a frame. A frame can be ‘sliced’ into several NALUs. Just like you can slice a pizza. One or more slices are then virtually grouped into a Access Units (AU) that contain one frame. Slicing does come at a slight quality cost, so it is not often used.

下表列出了所有已定义的 NALU.

Below is a table of all defined NALUs.

0      Unspecified                                                    non-VCL
1      Coded slice of a non-IDR picture                               VCL
2      Coded slice data partition A                                   VCL
3      Coded slice data partition B                                   VCL
4      Coded slice data partition C                                   VCL
5      Coded slice of an IDR picture                                  VCL
6      Supplemental enhancement information (SEI)                     non-VCL
7      Sequence parameter set                                         non-VCL
8      Picture parameter set                                          non-VCL
9      Access unit delimiter                                          non-VCL
10     End of sequence                                                non-VCL
11     End of stream                                                  non-VCL
12     Filler data                                                    non-VCL
13     Sequence parameter set extension                               non-VCL
14     Prefix NAL unit                                                non-VCL
15     Subset sequence parameter set                                  non-VCL
16     Depth parameter set                                            non-VCL
17..18 Reserved                                                       non-VCL
19     Coded slice of an auxiliary coded picture without partitioning non-VCL
20     Coded slice extension                                          non-VCL
21     Coded slice extension for depth view components                non-VCL
22..23 Reserved                                                       non-VCL
24..31 Unspecified                                                    non-VCL

有几种 NALU 类型,以后了解可能会有所帮助.

There are a couple of NALU types where having knowledge of may be helpful later.

  • 序列参数集 (SPS).此非 VCL NALU 包含配置解码器所需的信息,例如配置文件、级别、分辨率、帧速率.
  • 图片参数集 (PPS).与 SPS 类似,此非 VCL 包含有关熵编码模式、切片组、运动预测和去块滤波器的信息.
  • 瞬时解码器刷新 (IDR).这个 VCL NALU 是一个独立的图像切片.也就是说,可以在不参考任何其他 NALU 保存 SPS 和 PPS 的情况下解码和显示 IDR.
  • 访问单元分隔符 (AUD). AUD 是一个可选的 NALU,可用于分隔基本流中的帧.它不是必需的(除非容器/协议另有说明,如 TS),并且通常不包括在内以节省空间,但在不必完全解析每个 NALU 的情况下找到帧的开头可能很有用.
  • Sequence Parameter Set (SPS). This non-VCL NALU contains information required to configure the decoder such as profile, level, resolution, frame rate.
  • Picture Parameter Set (PPS). Similar to the SPS, this non-VCL contains information on entropy coding mode, slice groups, motion prediction and deblocking filters.
  • Instantaneous Decoder Refresh (IDR). This VCL NALU is a self contained image slice. That is, an IDR can be decoded and displayed without referencing any other NALU save SPS and PPS.
  • Access Unit Delimiter (AUD). An AUD is an optional NALU that can be use to delimit frames in an elementary stream. It is not required (unless otherwise stated by the container/protocol, like TS), and is often not included in order to save space, but it can be useful to finds the start of a frame without having to fully parse each NALU.

NALU 不包含它的大小.因此,简单地连接 NALU 以创建流是行不通的,因为您不知道一个在哪里停止,下一个从哪里开始.

A NALU does not contain is its size. Therefore simply concatenating the NALUs to create a stream will not work because you will not know where one stops and the next begins.

附件 B 规范通过要求在每个 NALU 之前有起始代码"来解决这个问题.起始码是 2 或 3 个 0x00 字节后跟一个 0x01 字节.例如0x0000010x00000001.

The Annex B specification solves this by requiring ‘Start Codes’ to precede each NALU. A start code is 2 or 3 0x00 bytes followed with a 0x01 byte. e.g. 0x000001 or 0x00000001.

4 字节变体对于通过串行连接进行传输很有用,因为通过查找 31 个零位后跟一个 1 来字节对齐流是微不足道的.如果下一位是 0(因为每个 NALU 都是以 0 位开始的),那么它就是一个 NALU 的开始.4 字节变量通常仅用于信号流中的随机接入点,例如 SPS PPS AUD 和 IDR,而 3 字节变量用于其他任何地方以节省空间.

The 4 byte variation is useful for transmission over a serial connection as it is trivial to byte align the stream by looking for 31 zero bits followed by a one. If the next bit is 0 (because every NALU starts with a 0 bit), it is the start of a NALU. The 4 byte variation is usually only used for signaling random access points in the stream such as a SPS PPS AUD and IDR Where as the 3 byte variation is used everywhere else to save space.

起始码有效是因为四个字节序列0x0000000x0000010x0000020x000003在一个非 RBSP NALU.因此,在创建 NALU 时,请注意转义这些可能与起始代码混淆的值.这是通过插入一个仿真预防"字节0x03来实现的,这样0x000001就变成了0x00000301.

Start codes work because the four byte sequences 0x000000, 0x000001, 0x000002 and 0x000003 are illegal within a non-RBSP NALU. So when creating a NALU, care is taken to escape these values that could otherwise be confused with a start code. This is accomplished by inserting an ‘Emulation Prevention’ byte 0x03, so that 0x000001 becomes 0x00000301.

解码时,重要的是寻找并忽略仿真预防字节.因为模拟预防字节几乎可以出现在 NALU 中的任何地方,所以在文档中假设它们已经被删除通常更方便.没有仿真预防字节的表示称为原始字节序列有效载荷 (RBSP).

When decoding, it is important to look for and ignore emulation prevention bytes. Because emulation prevention bytes can occur almost anywhere within a NALU, it is often more convenient in documentation to assume they have already been removed. A representation without emulation prevention bytes is called Raw Byte Sequence Payload (RBSP).

让我们看一个完整的例子.

Let's look at a complete example.

0x0000 | 00 00 00 01 67 64 00 0A AC 72 84 44 26 84 00 00
0x0010 | 03 00 04 00 00 03 00 CA 3C 48 96 11 80 00 00 00
0x0020 | 01 68 E8 43 8F 13 21 30 00 00 01 65 88 81 00 05
0x0030 | 4E 7F 87 DF 61 A5 8B 95 EE A4 E9 38 B7 6A 30 6A
0x0040 | 71 B9 55 60 0B 76 2E B5 0E E4 80 59 27 B8 67 A9
0x0050 | 63 37 5E 82 20 55 FB E4 6A E9 37 35 72 E2 22 91
0x0060 | 9E 4D FF 60 86 CE 7E 42 B7 95 CE 2A E1 26 BE 87
0x0070 | 73 84 26 BA 16 36 F4 E6 9F 17 DA D8 64 75 54 B1
0x0080 | F3 45 0C 0B 3C 74 B3 9D BC EB 53 73 87 C3 0E 62
0x0090 | 47 48 62 CA 59 EB 86 3F 3A FA 86 B5 BF A8 6D 06
0x00A0 | 16 50 82 C4 CE 62 9E 4E E6 4C C7 30 3E DE A1 0B
0x00B0 | D8 83 0B B6 B8 28 BC A9 EB 77 43 FC 7A 17 94 85
0x00C0 | 21 CA 37 6B 30 95 B5 46 77 30 60 B7 12 D6 8C C5
0x00D0 | 54 85 29 D8 69 A9 6F 12 4E 71 DF E3 E2 B1 6B 6B
0x00E0 | BF 9F FB 2E 57 30 A9 69 76 C4 46 A2 DF FA 91 D9
0x00F0 | 50 74 55 1D 49 04 5A 1C D6 86 68 7C B6 61 48 6C
0x0100 | 96 E6 12 4C 27 AD BA C7 51 99 8E D0 F0 ED 8E F6
0x0110 | 65 79 79 A6 12 A1 95 DB C8 AE E3 B6 35 E6 8D BC
0x0120 | 48 A3 7F AF 4A 28 8A 53 E2 7E 68 08 9F 67 77 98
0x0130 | 52 DB 50 84 D6 5E 25 E1 4A 99 58 34 C7 11 D6 43
0x0140 | FF C4 FD 9A 44 16 D1 B2 FB 02 DB A1 89 69 34 C2
0x0150 | 32 55 98 F9 9B B2 31 3F 49 59 0C 06 8C DB A5 B2
0x0160 | 9D 7E 12 2F D0 87 94 44 E4 0A 76 EF 99 2D 91 18
0x0170 | 39 50 3B 29 3B F5 2C 97 73 48 91 83 B0 A6 F3 4B
0x0180 | 70 2F 1C 8F 3B 78 23 C6 AA 86 46 43 1D D7 2A 23
0x0190 | 5E 2C D9 48 0A F5 F5 2C D1 FB 3F F0 4B 78 37 E9
0x01A0 | 45 DD 72 CF 80 35 C3 95 07 F3 D9 06 E5 4A 58 76
0x01B0 | 03 6C 81 20 62 45 65 44 73 BC FE C1 9F 31 E5 DB
0x01C0 | 89 5C 6B 79 D8 68 90 D7 26 A8 A1 88 86 81 DC 9A
0x01D0 | 4F 40 A5 23 C7 DE BE 6F 76 AB 79 16 51 21 67 83
0x01E0 | 2E F3 D6 27 1A 42 C2 94 D1 5D 6C DB 4A 7A E2 CB
0x01F0 | 0B B0 68 0B BE 19 59 00 50 FC C0 BD 9D F5 F5 F8
0x0200 | A8 17 19 D6 B3 E9 74 BA 50 E5 2C 45 7B F9 93 EA
0x0210 | 5A F9 A9 30 B1 6F 5B 36 24 1E 8D 55 57 F4 CC 67
0x0220 | B2 65 6A A9 36 26 D0 06 B8 E2 E3 73 8B D1 C0 1C
0x0230 | 52 15 CA B5 AC 60 3E 36 42 F1 2C BD 99 77 AB A8
0x0240 | A9 A4 8E 9C 8B 84 DE 73 F0 91 29 97 AE DB AF D6
0x0250 | F8 5E 9B 86 B3 B3 03 B3 AC 75 6F A6 11 69 2F 3D
0x0260 | 3A CE FA 53 86 60 95 6C BB C5 4E F3

这是一个包含 3 个 NALU 的完整 AU.如您所见,我们以开始代码开始,然后是 SPS(SPS 以 67 开头).在 SPS 中,您将看到两个 Emulation Prevention 字节.如果没有这些字节,非法序列 0x000000 将出现在这些位置.接下来,您将看到一个起始代码后跟一个 PPS(PPS 以 68 开头)和一个最终起始代码后跟一个 IDR 切片.这是一个完整的 H.264 流.如果您在十六进制编辑器中输入这些值并使用 .264 扩展名保存文件,您将能够将其转换为该图像:

This is a complete AU containing 3 NALUs. As you can see, we begin with a Start code followed by an SPS (SPS starts with 67). Within the SPS, you will see two Emulation Prevention bytes. Without these bytes the illegal sequence 0x000000 would occur at these positions. Next you will see a start code followed by a PPS (PPS starts with 68) and one final start code followed by an IDR slice. This is a complete H.264 stream. If you type these values into a hex editor and save the file with a .264 extension, you will be able to convert it to this image:

附件 B 常用于直播和流媒体格式,例如传输流、无线广播和 DVD.在这些格式中,通常周期性地重复 SPS 和 PPS,通常在每个 IDR 之前,从而为解码器创建一个随机访问点.这使得能够加入已经在进行中的流.

Annex B is commonly used in live and streaming formats such as transport streams, over the air broadcasts, and DVDs. In these formats it is common to repeat the SPS and PPS periodically, usually preceding every IDR thus creating a random access point for the decoder. This enables the ability to join a stream already in progress.

另一种存储 H.264 流的常用方法是 AVCC 格式.在这种格式中,每个 NALU 前面都有它的长度(大端格式).这种方法更容易解析,但是你失去了 Annex B 的字节对齐特性.为了使事情复杂化,长度可能使用 1、2 或 4 个字节进行编码.该值存储在标头对象中.此标头通常称为额外数据"或序列标头".其基本格式如下:

The other common method of storing an H.264 stream is the AVCC format. In this format, each NALU is preceded with its length (in big endian format). This method is easier to parse, but you lose the byte alignment features of Annex B. Just to complicate things, the length may be encoded using 1, 2 or 4 bytes. This value is stored in a header object. This header is often called ‘extradata’ or ‘sequence header’. Its basic format is as follows:

bits    
8   version ( always 0x01 )
8   avc profile ( sps[0][1] )
8   avc compatibility ( sps[0][2] )
8   avc level ( sps[0][3] )
6   reserved ( all bits on )
2   NALULengthSizeMinusOne
3   reserved ( all bits on )
5   number of SPS NALUs (usually 1)

repeated once per SPS:
  16         SPS size
  variable   SPS NALU data

8   number of PPS NALUs (usually 1)

repeated once per PPS:
  16       PPS size
  variable PPS NALU data

使用上面相同的示例,AVCC 额外数据将如下所示:

Using the same example above, the AVCC extradata will look like this:

0x0000 | 01 64 00 0A FF E1 00 19 67 64 00 0A AC 72 84 44
0x0010 | 26 84 00 00 03 00 04 00 00 03 00 CA 3C 48 96 11
0x0020 | 80 01 00 07 68 E8 43 8F 13 21 30

您会注意到 SPS 和 PPS 现在存储在带外.即,与基本流数据分开.这些数据的存储和传输是文件容器的工作,超出了本文档的范围.请注意,即使我们没有使用起始代码,仍会插入仿真预防字节.

You will notice SPS and PPS is now stored out of band. That is, separate from the elementary stream data. Storage and transmission of this data is the job of the file container, and beyond the scope of this document. Notice that even though we are not using start codes, emulation prevention bytes are still inserted.

此外,还有一个名为 NALULengthSizeMinusOne 的新变量.这个名称令人困惑的变量告诉我们使用多少字节来存储每个 NALU 的长度.因此,如果 NALULengthSizeMinusOne 设置为 0,则每个 NALU 前面都有一个指示其长度的字节.使用单个字节存储大小,NALU 的最大大小为 255 个字节.那显然很小.对于整个关键帧来说太小了.使用 2 个字节为每个 NALU 提供 64k.它可以在我们的示例中工作,但仍然是一个相当低的限制.3 个字节将是完美的,但由于某种原因并没有得到普遍支持.因此,到目前为止,4 个字节是最常见的,也是我们在这里使用的:

Additionally, there is a new variable called NALULengthSizeMinusOne. This confusingly named variable tells us how many bytes to use to store the length of each NALU. So, if NALULengthSizeMinusOne is set to 0, then each NALU is preceded with a single byte indicating its length. Using a single byte to store the size, the max size of a NALU is 255 bytes. That is obviously pretty small. Way too small for an entire key frame. Using 2 bytes gives us 64k per NALU. It would work in our example, but is still a pretty low limit. 3 bytes would be perfect, but for some reason is not universally supported. Therefore, 4 bytes is by far the most common, and it is what we used here:

0x0000 | 00 00 02 41 65 88 81 00 05 4E 7F 87 DF 61 A5 8B
0x0010 | 95 EE A4 E9 38 B7 6A 30 6A 71 B9 55 60 0B 76 2E
0x0020 | B5 0E E4 80 59 27 B8 67 A9 63 37 5E 82 20 55 FB
0x0030 | E4 6A E9 37 35 72 E2 22 91 9E 4D FF 60 86 CE 7E
0x0040 | 42 B7 95 CE 2A E1 26 BE 87 73 84 26 BA 16 36 F4
0x0050 | E6 9F 17 DA D8 64 75 54 B1 F3 45 0C 0B 3C 74 B3
0x0060 | 9D BC EB 53 73 87 C3 0E 62 47 48 62 CA 59 EB 86
0x0070 | 3F 3A FA 86 B5 BF A8 6D 06 16 50 82 C4 CE 62 9E
0x0080 | 4E E6 4C C7 30 3E DE A1 0B D8 83 0B B6 B8 28 BC
0x0090 | A9 EB 77 43 FC 7A 17 94 85 21 CA 37 6B 30 95 B5
0x00A0 | 46 77 30 60 B7 12 D6 8C C5 54 85 29 D8 69 A9 6F
0x00B0 | 12 4E 71 DF E3 E2 B1 6B 6B BF 9F FB 2E 57 30 A9
0x00C0 | 69 76 C4 46 A2 DF FA 91 D9 50 74 55 1D 49 04 5A
0x00D0 | 1C D6 86 68 7C B6 61 48 6C 96 E6 12 4C 27 AD BA
0x00E0 | C7 51 99 8E D0 F0 ED 8E F6 65 79 79 A6 12 A1 95
0x00F0 | DB C8 AE E3 B6 35 E6 8D BC 48 A3 7F AF 4A 28 8A
0x0100 | 53 E2 7E 68 08 9F 67 77 98 52 DB 50 84 D6 5E 25
0x0110 | E1 4A 99 58 34 C7 11 D6 43 FF C4 FD 9A 44 16 D1
0x0120 | B2 FB 02 DB A1 89 69 34 C2 32 55 98 F9 9B B2 31
0x0130 | 3F 49 59 0C 06 8C DB A5 B2 9D 7E 12 2F D0 87 94
0x0140 | 44 E4 0A 76 EF 99 2D 91 18 39 50 3B 29 3B F5 2C
0x0150 | 97 73 48 91 83 B0 A6 F3 4B 70 2F 1C 8F 3B 78 23
0x0160 | C6 AA 86 46 43 1D D7 2A 23 5E 2C D9 48 0A F5 F5
0x0170 | 2C D1 FB 3F F0 4B 78 37 E9 45 DD 72 CF 80 35 C3
0x0180 | 95 07 F3 D9 06 E5 4A 58 76 03 6C 81 20 62 45 65
0x0190 | 44 73 BC FE C1 9F 31 E5 DB 89 5C 6B 79 D8 68 90
0x01A0 | D7 26 A8 A1 88 86 81 DC 9A 4F 40 A5 23 C7 DE BE
0x01B0 | 6F 76 AB 79 16 51 21 67 83 2E F3 D6 27 1A 42 C2
0x01C0 | 94 D1 5D 6C DB 4A 7A E2 CB 0B B0 68 0B BE 19 59
0x01D0 | 00 50 FC C0 BD 9D F5 F5 F8 A8 17 19 D6 B3 E9 74
0x01E0 | BA 50 E5 2C 45 7B F9 93 EA 5A F9 A9 30 B1 6F 5B
0x01F0 | 36 24 1E 8D 55 57 F4 CC 67 B2 65 6A A9 36 26 D0
0x0200 | 06 B8 E2 E3 73 8B D1 C0 1C 52 15 CA B5 AC 60 3E
0x0210 | 36 42 F1 2C BD 99 77 AB A8 A9 A4 8E 9C 8B 84 DE
0x0220 | 73 F0 91 29 97 AE DB AF D6 F8 5E 9B 86 B3 B3 03
0x0230 | B3 AC 75 6F A6 11 69 2F 3D 3A CE FA 53 86 60 95
0x0240 | 6C BB C5 4E F3

这种格式的一个优点是能够在开始时配置解码器并跳转到流的中间.这是一个常见用例,其中媒体可在随机访问媒体(如硬盘)上使用,因此用于 MP4 和 MKV 等常见容器格式.

An advantage to this format is the ability to configure the decoder at the start and jump into the middle of a stream. This is a common use case where the media is available on a random access medium such as a hard drive, and is therefore used in common container formats such as MP4 and MKV.

这篇关于H.264 流的序列/图片参数集的可能位置的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆