Does H.264 encoded video with BT.709 matrix include any gamma adjustment?

Problem Description

I have read the BT.709 spec a number of times, and the thing that is just not clear is: should an encoded H.264 bitstream actually apply any gamma curve to the encoded data? Note the specific mention of a gamma-like formula in the BT.709 spec. Apple's example OpenGL and Metal shaders that read YUV data from CoreVideo-provided buffers do not do any sort of gamma adjustment; the YUV values are read and processed as though they were simple linear values. I also examined the source code of ffmpeg and found no gamma adjustment being applied after the BT.709 scaling step.

I then created a test video with just two linear grayscale colors, 5 and 26, corresponding to 2% and 10% levels. When converted to H.264 with both ffmpeg and iMovie, the output BT.709 values are (Y Cb Cr) (20 128 128) and (38 128 128), and these values exactly match the output of the BT.709 conversion matrix without any gamma adjustment.
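
As a sanity check on those numbers: for a neutral gray the BT.709 matrix collapses to Y' = 16 + 219 * (g / 255) with Cb = Cr = 128, so this minimal C sketch (my own illustration, not part of the original post) reproduces the observed 20 and 38 from the linear inputs 5 and 26:

#include <math.h>
#include <stdio.h>

// For a neutral gray, R = G = B, so Ey = (Kr + Kg + Kb) * n = n and the
// BT.709 matrix step reduces to video-range quantization of Y alone.
static int quantizeGrayToY(int gray) {
  float n = gray / 255.0f;                  // normalized input, no gamma applied
  return (int) roundf(16.0f + 219.0f * n);
}

int main(void) {
  printf("%d %d\n", quantizeGrayToY(5), quantizeGrayToY(26)); // prints "20 38"
  return 0;
}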

A great piece of background on this topic can be found at Quicktime Gamma Bug. It seems that some historical issues with Quicktime and Adobe encoders were caused by improperly applying different gamma adjustments, and the results made video streams look awful on different players. This is really confusing because, if you compare to sRGB, that spec clearly indicates how to apply a gamma encoding and then decode it to convert between sRGB and linear. Why does BT.709 go into so much detail about the same sort of gamma adjustment curve if no gamma adjustment is applied after the matrix step when creating an H.264 data stream? Are all the color steps in an H.264 stream meant to be coded as straight linear (gamma 1.0) values?

In case specific example input would make things more clear, I am attaching 3 color bar images, the exact values of different colors can be displayed in an image editor with these image files.

This first image is in the sRGB colorspace and is tagged as sRGB.

This second image has been converted to the linear RGB colorspace and is tagged with a linear RGB profile.

This third image has been converted to REC.709 profile levels with Rec709-elle-V4-rec709.icc from elles_icc_profiles. This seems to be what one would need to do to simulate "camera" gamma as described in BT.709.

Note how the sRGB value in the lower right corner (0x555555) becomes linear RGB (0x171717) and the BT.709 gamma encoded value becomes (0x464646). What is unclear is if I should be passing a linear RGB value into ffmpeg or if I should be passing an already BT.709 gamma encoded value which would then need to be decoded in the client before the linear conversion Matrix step to get back to RGB.
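
For reference, those corner values can be reproduced with the standard sRGB decode followed by the BT.709 OETF. The sketch below is my own illustration of that arithmetic (the function names are just for illustration), not code from the question:

#include <math.h>
#include <stdio.h>

// Standard sRGB electro-optical transfer (decode to linear).
static float srgbToLinear(float v) {
  return (v <= 0.04045f) ? (v / 12.92f) : powf((v + 0.055f) / 1.055f, 2.4f);
}

// BT.709 opto-electronic transfer ("camera" encode), per Rec. BT.709-6.
static float bt709Encode(float l) {
  return (l < 0.018f) ? (4.5f * l) : (1.099f * powf(l, 0.45f) - 0.099f);
}

int main(void) {
  float srgb   = 0x55 / 255.0f;            // 0x555555 gray, single channel
  float linear = srgbToLinear(srgb);       // ~0.0908 -> 0x17 when scaled to 255
  float bt709  = bt709Encode(linear);      // ~0.274  -> 0x46 when scaled to 255
  printf("linear 0x%02X, BT.709 0x%02X\n",
         (unsigned) roundf(linear * 255.0f), (unsigned) roundf(bt709 * 255.0f));
  return 0;
}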

Update:

Based on the feedback, I have updated my C based implementation and Metal shader and uploaded to github as an iOS example project MetalBT709Decoder.

Encoding a normalized linear RGB value is implemented like this:

static inline
int BT709_convertLinearRGBToYCbCr(
                            float Rn,
                            float Gn,
                            float Bn,
                            int *YPtr,
                            int *CbPtr,
                            int *CrPtr,
                            int applyGammaMap)
{
  // Gamma adjustment to non-linear value

  if (applyGammaMap) {
    Rn = BT709_linearNormToNonLinear(Rn);
    Gn = BT709_linearNormToNonLinear(Gn);
    Bn = BT709_linearNormToNonLinear(Bn);
  }

  // https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.709-6-201506-I!!PDF-E.pdf

  float Ey = (Kr * Rn) + (Kg * Gn) + (Kb * Bn);
  float Eb = (Bn - Ey) / Eb_minus_Ey_Range;
  float Er = (Rn - Ey) / Er_minus_Ey_Range;

  // Quant Y to range [16, 235] (inclusive 219 values)
  // Quant Eb, Er to range [16, 240] (inclusive 224 values, centered at 128)

  float AdjEy = (Ey * (YMax-YMin)) + 16;
  float AdjEb = (Eb * (UVMax-UVMin)) + 128;
  float AdjEr = (Er * (UVMax-UVMin)) + 128;

  *YPtr = (int) round(AdjEy);
  *CbPtr = (int) round(AdjEb);
  *CrPtr = (int) round(AdjEr);

  return 0;
}

Decoding from YCbCr to linear RGB is implemented like so:

static inline
int BT709_convertYCbCrToLinearRGB(
                             int Y,
                             int Cb,
                             int Cr,
                             float *RPtr,
                             float *GPtr,
                             float *BPtr,
                             int applyGammaMap)
{
  // https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.709_conversion
  // http://www.niwa.nu/2013/05/understanding-yuv-values/

  // Normalize Y to range [0 255]
  //
  // Note that the matrix multiply will adjust
  // this byte normalized range to account for
  // the limited range [16 235]

  float Yn = (Y - 16) * (1.0f / 255.0f);

  // Normalize Cb and CR with zero at 128 and range [0 255]
  // Note that matrix will adjust to limited range [16 240]

  float Cbn = (Cb - 128) * (1.0f / 255.0f);
  float Crn = (Cr - 128) * (1.0f / 255.0f);

  const float YScale = 255.0f / (YMax-YMin);
  const float UVScale = 255.0f / (UVMax-UVMin);

  const
  float BT709Mat[] = {
    YScale,   0.000f,  (UVScale * Er_minus_Ey_Range),
    YScale, (-1.0f * UVScale * Eb_minus_Ey_Range * Kb_over_Kg),  (-1.0f * UVScale * Er_minus_Ey_Range * Kr_over_Kg),
    YScale, (UVScale * Eb_minus_Ey_Range),  0.000f,
  };

  // Matrix multiply operation
  //
  // rgb = BT709Mat * YCbCr

  // Convert input Y, Cb, Cr to normalized float values

  float Rn = (Yn * BT709Mat[0]) + (Cbn * BT709Mat[1]) + (Crn * BT709Mat[2]);
  float Gn = (Yn * BT709Mat[3]) + (Cbn * BT709Mat[4]) + (Crn * BT709Mat[5]);
  float Bn = (Yn * BT709Mat[6]) + (Cbn * BT709Mat[7]) + (Crn * BT709Mat[8]);

  // Saturate normalized linear (R G B) to range [0.0, 1.0]

  Rn = saturatef(Rn);
  Gn = saturatef(Gn);
  Bn = saturatef(Bn);

  // Gamma adjustment for RGB components after matrix transform

  if (applyGammaMap) {
    Rn = BT709_nonLinearNormToLinear(Rn);
    Gn = BT709_nonLinearNormToLinear(Gn);
    Bn = BT709_nonLinearNormToLinear(Bn);
  }

  *RPtr = Rn;
  *GPtr = Gn;
  *BPtr = Bn;

  return 0;
}

I believe this logic is implemented correctly, but I am having a very difficult time validating the results. When I generate a .m4v file that contains gamma adjusted color values (osxcolor_test_image_24bit_BT709.m4v), the results come out as expected. But a test case like (bars_709_Frame01.m4v) that I found here does not seem to work, as the color bar values appear to be encoded as linear (no gamma adjustment).

For a SMPTE test pattern, the 0.75 graylevel is linear RGB (191 191 191), should this RGB be encoded with no gamma adjustment as (Y Cb Cr) (180 128 128) or should the value in the bitstream appear as the gamma adjusted (Y Cb Cr) (206 128 128)?
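
For what it is worth, the arithmetic behind both candidate answers can be written out as a minimal sketch of my own (not code from the project): with no gamma step the 0.75 gray quantizes straight to 180, while applying the BT.709 OETF first yields 206.

#include <math.h>
#include <stdio.h>

// BT.709 OETF, per Rec. BT.709-6.
static float bt709Encode(float l) {
  return (l < 0.018f) ? (4.5f * l) : (1.099f * powf(l, 0.45f) - 0.099f);
}

int main(void) {
  float n = 191.0f / 255.0f;                                    // 0.75 graylevel
  int yLinear = (int) roundf(16.0f + 219.0f * n);               // -> 180
  int yGamma  = (int) roundf(16.0f + 219.0f * bt709Encode(n));  // -> 206
  printf("no gamma: %d, BT.709 gamma: %d\n", yLinear, yGamma);
  return 0;
}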

(follow up) After doing additional research into this gamma issue, it has become clear that what Apple actually uses in AVFoundation is a 1.961 gamma function. This is the case when encoding with AVAssetWriterInputPixelBufferAdaptor, when using vImage, or with the CoreVideo APIs. This piecewise gamma function is defined as follows:

#define APPLE_GAMMA_196 (1.960938f)

static inline
float Apple196_nonLinearNormToLinear(float normV) {
  const float xIntercept = 0.05583828f;

  if (normV < xIntercept) {
    normV *= (1.0f / 16.0f);
  } else {
    const float gamma = APPLE_GAMMA_196;
    normV = pow(normV, gamma);
  }

  return normV;
}

static inline
float Apple196_linearNormToNonLinear(float normV) {
  const float yIntercept = 0.00349f;

  if (normV < yIntercept) {
    normV *= 16.0f;
  } else {
    const float gamma = 1.0f / APPLE_GAMMA_196;
    normV = pow(normV, gamma);
  }

  return normV;
}
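
A quick way to sanity-check those constants (my own sketch, not from the original post): the linear segment with slope 16 and the pow() segment meet near (0.00349, 0.0558), so the curve is continuous and the encode/decode pair round-trips to within float rounding.

#include <math.h>
#include <stdio.h>

#define APPLE_GAMMA_196 (1.960938f)

// Same piecewise curve as above, repeated here so the check is self-contained.
static float appleEncode(float v) {          // linear -> non-linear
  return (v < 0.00349f) ? (v * 16.0f) : powf(v, 1.0f / APPLE_GAMMA_196);
}

static float appleDecode(float v) {          // non-linear -> linear
  return (v < 0.05583828f) ? (v / 16.0f) : powf(v, APPLE_GAMMA_196);
}

int main(void) {
  // The two segments meet near (0.00349, 0.0558), so the curve is continuous.
  printf("encode(0.00349) = %.5f\n", appleEncode(0.00349f));        // ~0.05584
  // Round-trip a mid gray through the curve; the error should be tiny.
  float g = 0.5f;
  printf("round trip error = %g\n", fabsf(appleDecode(appleEncode(g)) - g));
  return 0;
}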

Solution

Your original question: Does H.264 encoded video with BT.709 matrix include any gamma adjustment?

The encoded video contains a gamma adjustment only if you feed the encoder gamma-adjusted values.

An H.264 encoder doesn't care about the transfer characteristics. If you compress linear and then decompress, you'll get linear. If you compress with gamma and then decompress, you'll get gamma.

Likewise, if your bits are encoded with a Rec. 709 transfer function, the encoder won't change the gamma.

But you can specify the transfer characteristics in the H.264 stream as metadata (Rec. ITU-T H.264 (04/2017), E.1.1 VUI parameters syntax). So the encoded stream carries the color space information around, but it is not used in encoding or decoding.
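
For example, here is a sketch of how that VUI tag can be set when encoding with FFmpeg's libavcodec (my own illustration; the field and enum names come from FFmpeg's public headers, not from this answer):

#include <libavcodec/avcodec.h>

// Sets the BT.709 colour description that ends up in the H.264 VUI.
// This only labels the stream; it does not convert or gamma-adjust any pixels.
static void tagBT709(AVCodecContext *ctx) {
  ctx->color_primaries = AVCOL_PRI_BT709;   // colour_primaries         = 1
  ctx->color_trc       = AVCOL_TRC_BT709;   // transfer_characteristics = 1
  ctx->colorspace      = AVCOL_SPC_BT709;   // matrix_coefficients      = 1
}

The ffmpeg command-line equivalents are the -color_primaries, -color_trc, and -colorspace output options, which likewise only label the stream.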

I would assume that 8-bit video always contains a non-linear transfer function. Otherwise you would be using your 8 bits fairly unwisely.

If you convert to linear to do effects and composition, I'd recommend increasing the bit depth or linearizing into floats.

A color space consists of primaries, a transfer function, and matrix coefficients. The gamma adjustment is encoded in the transfer function (not in the matrix).
