Is it correct to assume that floating-point samples in a WAV or AIFF file will be normalized?


Problem description


Say I have a program that reads a .WAV or .AIFF file, and the file's audio is encoded as floating-point sample-values. Is it correct for my program to assume that any well-formed (floating-point-based) .WAV or .AIFF file will contain sample values only in the range [-1.0f,+1.0f]? I couldn't find anything in the WAV or AIFF specifications that addresses this point.

And if that is not a valid assumption, how can one know what the full dynamic range of the audio in the file was intended to be? (I could read the entire file and find out what the file's actual minimum and maximum sample values are, but there are two problems with that: (1) it would be a slow/expensive operation if the file is very large, and (2) it would lose information, in that if the file's creator had intended the file to have some "headroom" so as not to play at 0 dBFS at its loudest point, my program would not be able to detect that.)

Solution

As you state, the publicly available documentation does not go into detail about the range used for floating point. However, based on industry practice over the last several years, and on actual data existing as floating-point files, I would say it is a valid assumption.

There are practical reasons for this, and normalizing to a fixed unit range is also very common for high-precision data such as color, audio, 3D etc.

The main reason for using the interval [-1, 1] is that it is fast and easy to scale/convert to the target bit range. You only need to supply the target range and multiply.

For example:

If you want to play it at 16-bit you would do (pseudocode, assuming the result is rounded to a signed integer):

sample = in < 0 ? in * 0x8000 : in * 0x7fff;

or 24-bit:

sample = in < 0 ? in * 0x800000 : in * 0x7fffff;

or 8-bit:

sample = in < 0 ? in * 0x80 : in * 0x7f;

etc., without having to adjust the original input value in any way. -1 and 1 represent the min/max values when converted to the target (1x = x).
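The scaling above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `float_to_int` and the clamping step are assumptions added here (the pseudocode above does not clamp), and rounding uses Python's built-in `round`.

```python
def float_to_int(sample: float, bits: int = 16) -> int:
    """Convert a float sample in [-1, 1] to a signed integer of `bits` width."""
    # Assumption: clamp first, since float files may carry values beyond [-1, 1].
    sample = max(-1.0, min(1.0, sample))
    neg_scale = 1 << (bits - 1)   # 0x8000 for 16-bit
    pos_scale = neg_scale - 1     # 0x7fff for 16-bit
    # Same asymmetric scaling as the pseudocode: negatives use the larger factor.
    scale = neg_scale if sample < 0 else pos_scale
    return int(round(sample * scale))


print(float_to_int(1.0), float_to_int(-1.0))   # 16-bit extremes
print(float_to_int(-1.0, bits=24))             # 24-bit minimum
```

Only the `bits` argument changes between target widths; the input sample is never rescaled beforehand.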

If you used a range of [-0.5, 0.5] you would first (or at some point) have to adjust the input value, so a conversion to, for example, 16-bit would need an extra step. This has an extra cost, not only for the step itself but also because the work happens in the floating-point domain, which is heavier to compute (the latter is perhaps a somewhat legacy reason, as floating-point processing is pretty fast nowadays, but in any case):

in = in * 2;
sample = in < 0 ? in * 0x8000 : in * 0x7fff;

Keeping it in the [-1, 1] range rather than some pre-scaled range (for example [-32768, 32767]) also allows the use of more bits for precision (using the IEEE 754 representation).

UPDATE 2017/07

Tests

Based on questions in the comments, I decided to triple-check by running a test with three files, each containing a 1-second sine wave:

A) Floating point clipped
B) Floating point max 0dB, and
C) integer clipped (converted from A)

The files were then scanned for values <= -1.0 and >= 1.0, starting after the data chunk ID and size field, so that the min/max values reflect the actual values found in the audio data.
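A scan of this kind could look roughly like the sketch below. It is not the original test script: the names `float_wav_range` and `make_float_wav` are hypothetical, and the code assumes a plain 32-bit IEEE-float WAV (format tag 3, not the extensible format) that fits in memory.

```python
import struct


def float_wav_range(blob: bytes) -> tuple[float, float]:
    """Return (min, max) of the samples in a 32-bit float WAV image."""
    if blob[0:4] != b"RIFF" or blob[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos, fmt_tag, bits = 12, None, None
    while pos + 8 <= len(blob):
        chunk_id, size = struct.unpack_from("<4sI", blob, pos)
        body = pos + 8  # sample data starts right after the ID and size field
        if chunk_id == b"fmt ":
            fmt_tag, _ch, _rate, _brate, _align, bits = struct.unpack_from(
                "<HHIIHH", blob, body)
        elif chunk_id == b"data":
            if fmt_tag != 3 or bits != 32:
                raise ValueError("only 32-bit float (format tag 3) handled here")
            samples = struct.unpack_from(f"<{size // 4}f", blob, body)
            return min(samples), max(samples)
        pos = body + size + (size & 1)  # chunks are padded to an even length
    raise ValueError("no data chunk found")


def make_float_wav(samples) -> bytes:
    """Hypothetical helper: build a tiny mono float WAV in memory for a demo."""
    data = struct.pack(f"<{len(samples)}f", *samples)
    fmt = struct.pack("<HHIIHH", 3, 1, 44100, 44100 * 4, 4, 32)
    body = (b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
            + b"data" + struct.pack("<I", len(data)) + data)
    return b"RIFF" + struct.pack("<I", len(body)) + body


print(float_wav_range(make_float_wav([0.0, 2.0, -1.5, 0.5])))
```

For very large files, the same loop could read the data chunk in blocks instead of unpacking it in one call.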

The results confirm that the range is indeed [-1, 1] inclusive when not clipping (that is, at or below 0 dB).

But it also revealed another aspect -

WAV files saved as floating point do allow values exceeding the 0 dB range. This means the range is actually beyond [-1, 1] for values that normally would clip.
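To relate sample values to the dB figures used here, a peak amplitude can be converted with the standard formula 20·log10(|peak|), where 1.0 corresponds to 0 dB full scale. A small sketch (the function name `peak_dbfs` is made up for illustration):

```python
import math


def peak_dbfs(peak: float) -> float:
    """Level of a peak amplitude relative to full scale (1.0 -> 0 dB)."""
    if peak == 0:
        return float("-inf")  # silence has no finite level
    return 20.0 * math.log10(abs(peak))


for value in (0.5, 1.0, 2.0, 4.0):
    print(value, round(peak_dbfs(value), 2))
```

A sample value of about 2.0 therefore sits roughly 6 dB over full scale, and about 4.0 roughly 12 dB over.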

The explanation for this may be that floating-point formats are intended for intermediate use in production setups, because they lose very little dynamic range. Later processing (gain staging, compression, limiting etc.) can bring the values back, without loss, well within the final and normal -0.2 to 0 dB range, and therefore the values are preserved as-is.

In conclusion

WAV files using floating point will store values in the [-1, 1] range when not clipping (<= 0 dB), but do allow values that would be considered clipped.

But when converted to an integer format these values will clip to the equivalent of the [-1, 1] range, scaled by the bit range of the integer format, regardless. This is natural due to the limited range each width can hold.

It is therefore up to the player/DAW/editing software to handle clipped floating-point values, either by normalizing the data or by simply clipping back to [-1, 1].
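The two options can be sketched as below. The function names `clip` and `normalize` are illustrative assumptions, and the normalization shown is plain peak normalization, which is only one of several ways software might bring the data back into range.

```python
def clip(samples):
    """Hard-limit every sample to [-1, 1]; anything beyond is flattened."""
    return [max(-1.0, min(1.0, s)) for s in samples]


def normalize(samples):
    """Scale the whole signal down so that its peak lands exactly on 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak <= 1.0:
        return list(samples)  # already in range, leave untouched
    return [s / peak for s in samples]


over = [2.0, -1.0, 0.5]
print(clip(over))
print(normalize(over))
```

Clipping discards the shape of the signal above full scale, while normalizing keeps the shape but lowers the overall level.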


Notes: Max values for all files are measured directly from the sample data.


Notes: Produced as clipped float (+6 dB), then converted to signed 16-bit and back to float


Notes: Clipped to +6 dB


Notes: Clipped to +12 dB

Simple test script and files can be found here.
