Downsample PCM audio from 44100 to 8000

Question

I've been working on an audio-recognition demo for some time, and the API needs me to pass a .wav file with a sample rate of 8000 or 16000, so I have to downsample it. I have tried the two algorithms below. Neither solves the problem the way I want, but their results differ, and I hope that makes the problem clearer.

This is my first try. It works fine when sampleRate % outputSampleRate = 0, but when outputSampleRate = 8000 or 16000 the resulting audio file is silent (every element of the output array is 0):

function interleave(inputL){
  var compression = sampleRate / outputSampleRate;
  var length = inputL.length / compression;
  var result = new Float32Array(length);

  var index = 0,
  inputIndex = 0;

  while (index < length){
    result[index++] = inputL[inputIndex];
    inputIndex += compression;
  }
  return result;
}

So here's my second try, which comes from a giant company, and it doesn't work either. What's more, even when sampleRate % outputSampleRate = 0 it still outputs a silent file:

function interleave(e){
  var t = e.length;
  var n = new Float32Array(t),
    r = 0,
    i;
  for (i = 0; i < e.length; i++){
    n[r] = e[i];
    r += e[i].length;
  }
  sampleRate += 0.0;
  outputSampleRate += 0.0;
  var s = 0,
  o = sampleRate / outputSampleRate,
  u = Math.ceil(t * outputSampleRate / sampleRate),
  a = new Float32Array(u);
  for (i = 0; i < u; i++) {
    a[i] = n[Math.floor(s)];
    s += o;
  }

  return a
}

In case my settings were wrong, here's the encodeWAV function:

function encodeWAV(samples){
  var sampleBits = 16;
  var dataLength = samples.length*(sampleBits/8);

  var buffer = new ArrayBuffer(44 + dataLength);
  var view = new DataView(buffer);

  var offset = 0;

  /* RIFF identifier */
  writeString(view, offset, 'RIFF'); offset += 4;
  /* RIFF chunk size (file length minus the 8-byte RIFF header) */
  view.setUint32(offset, 36 + dataLength, true); offset += 4;
  /* RIFF type */
  writeString(view, offset, 'WAVE'); offset += 4;
  /* format chunk identifier */
  writeString(view, offset, 'fmt '); offset += 4;
  /* format chunk length */
  view.setUint32(offset, 16, true); offset += 4;
  /* sample format (raw) */
  view.setUint16(offset, 1, true); offset += 2;
  /* channel count */
  view.setUint16(offset, outputChannels, true); offset += 2;
  /* sample rate */
  view.setUint32(offset, outputSampleRate, true); offset += 4;
  /* byte rate (sample rate * block align) */
  view.setUint32(offset, outputSampleRate*outputChannels*(sampleBits/8), true); offset += 4;
  /* block align (channel count * bytes per sample) */
  view.setUint16(offset, outputChannels*(sampleBits/8), true); offset += 2;
  /* bits per sample */
  view.setUint16(offset, sampleBits, true); offset += 2;
  /* data chunk identifier */
  writeString(view, offset, 'data'); offset += 4;
  /* data chunk length */
  view.setUint32(offset, dataLength, true); offset += 4;

  floatTo16BitPCM(view, offset, samples);

  return view;
}

This has confused me for a very long time. Please let me know what I missed...

-----------------------------AFTER IT'S SOLVED--------------------------------

I'm glad to say it's running well now. Here's the corrected version of interleave():

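function interleave(e){
  // e is a flat Float32Array of samples, not an array of buffers
  var t = e.length;
  sampleRate += 0.0;        // no effect as long as sampleRate is already a number
  outputSampleRate += 0.0;  // likewise
  var s = 0,                                          // fractional read position in e
  o = sampleRate / outputSampleRate,                  // input samples per output sample
  u = Math.ceil(t * outputSampleRate / sampleRate),   // output length
  a = new Float32Array(u),
  i;                                                  // declared so it is not an implicit global
  for (i = 0; i < u; i++) {
    a[i] = e[Math.floor(s)];  // nearest earlier input sample, no filtering
    s += o;
  }

  return a;
}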

As you can see, the problem was that the variable I passed in was not of the proper type. Thanks again to @jaket and the other friends. Though I figured it out myself in the end, they helped me understand the underlying issues better. :)

Recommended answer

There is a lot more to sample rate conversion than just simply throwing samples away or inserting them.

Let's take a simple case of downsampling by a factor of 2 (e.g. 44100->22050). A naive approach would be to just throw away every other sample. But imagine for a second that the original 44.1 kHz file contains a single sine wave at 20 kHz. That is well within Nyquist (fs/2 = 22050) for that sample rate. After you throw every other sample away the tone is still there at 20 kHz, but now it is above Nyquist (fs/2 = 11025), so it will alias into your output signal. The final result is that you will have a big fat sine wave sitting at 2050 Hz (22050 - 20000)!
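
The aliasing described here can be checked numerically. The following is only a small sketch; the helper names dropEveryOther and sine are made up for illustration. It builds a 20 kHz tone at 44100 Hz, keeps every other sample, and compares the result with a tone at 22050 - 20000 = 2050 Hz sampled at 22050 Hz.

```javascript
// Naive 2x decimation: keep every other sample, no filtering.
function dropEveryOther(input) {
  var out = new Float64Array(Math.floor(input.length / 2));
  for (var i = 0; i < out.length; i++) {
    out[i] = input[2 * i];
  }
  return out;
}

// Generate `count` samples of a sine at `freq` Hz, sampled at `rate` Hz.
function sine(freq, rate, count) {
  var out = new Float64Array(count);
  for (var n = 0; n < count; n++) {
    out[n] = Math.sin(2 * Math.PI * freq * n / rate);
  }
  return out;
}

var tone = sine(20000, 44100, 2000);              // 20 kHz, below Nyquist at 44.1 kHz
var decimated = dropEveryOther(tone);             // now effectively sampled at 22050 Hz
var alias = sine(2050, 22050, decimated.length);  // 22050 - 20000 = 2050 Hz

// The decimated tone equals a 2050 Hz sine with inverted sign,
// so adding the two should cancel sample for sample.
var maxDiff = 0;
for (var i = 0; i < decimated.length; i++) {
  maxDiff = Math.max(maxDiff, Math.abs(decimated[i] + alias[i]));
}
console.log(maxDiff); // tiny, limited only by floating-point rounding
```

Once the samples have been dropped, the output is indistinguishable from a genuine 2050 Hz tone, which is why the filtering has to happen before decimation rather than after.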

In order to avoid this aliasing during downsampling, you need to first design a lowpass filter with a cutoff selected according to your decimation ratio. For the example above, you would cut off everything above 11025 Hz first and then decimate.
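
A filter-then-decimate step could be sketched roughly as follows. This is an illustration under assumptions, not a production resampler: designLowpass and decimate are hypothetical helpers, the filter is a plain windowed-sinc FIR with a Hamming window, and the tap count and cutoff are picked by hand. Because a finite filter has a transition band, the cutoff is placed a little below the new Nyquist frequency.

```javascript
// Windowed-sinc lowpass FIR design (Hamming window).
// `cutoff` is a fraction of the input sample rate (0 < cutoff < 0.5).
// `numTaps` is assumed to be odd so the filter has a center tap.
function designLowpass(numTaps, cutoff) {
  var taps = new Float64Array(numTaps);
  var mid = (numTaps - 1) / 2;
  var sum = 0;
  var n;
  for (n = 0; n < numTaps; n++) {
    var x = n - mid;
    var ideal = x === 0
      ? 2 * cutoff
      : Math.sin(2 * Math.PI * cutoff * x) / (Math.PI * x);
    var win = 0.54 - 0.46 * Math.cos(2 * Math.PI * n / (numTaps - 1));
    taps[n] = ideal * win;
    sum += taps[n];
  }
  for (n = 0; n < numTaps; n++) {
    taps[n] /= sum; // normalize to unity gain at DC
  }
  return taps;
}

// Lowpass-filter `input` and keep every `factor`-th output sample.
// Only the outputs that survive decimation are actually computed.
function decimate(input, factor, taps) {
  var outLen = Math.floor(input.length / factor);
  var out = new Float32Array(outLen);
  for (var i = 0; i < outLen; i++) {
    var pos = i * factor;
    var acc = 0;
    for (var k = 0; k < taps.length; k++) {
      var idx = pos - k; // causal filter: output is delayed by (numTaps - 1) / 2 input samples
      if (idx >= 0) {
        acc += taps[k] * input[idx];
      }
    }
    out[i] = acc;
  }
  return out;
}
```

For 44100->22050, a call such as decimate(samples, 2, designLowpass(101, 0.23)) puts the cutoff at roughly 10.1 kHz, so a 20 kHz tone is strongly attenuated before the samples are dropped, while content at a few kHz passes through almost unchanged.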

The flip side of the coin is called upsampling and interpolation. Say you want to increase the sample rate by a factor of 2. First you insert zeros between every input sample and then run an interpolation filter to compute values to replace the zeros using the surrounding samples.
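
As a rough illustration of those two steps, the sketch below upsamples by 2 using zero insertion followed by a very short filter. The function names are hypothetical, and the 3-tap filter [0.5, 1, 0.5] is simply linear interpolation written as a FIR filter; a real converter would use a much longer lowpass filter here.

```javascript
// Step 1: insert (factor - 1) zeros after every input sample.
function zeroStuff(input, factor) {
  var out = new Float32Array(input.length * factor); // initialized to zeros
  for (var i = 0; i < input.length; i++) {
    out[i * factor] = input[i];
  }
  return out;
}

// Step 2: run an interpolation filter over the zero-stuffed signal.
// With taps [0.5, 1, 0.5], original samples pass through unchanged and
// each inserted zero becomes the average of its two neighbours.
function interpolateBy2(input) {
  var stuffed = zeroStuff(input, 2);
  var taps = [0.5, 1, 0.5];
  var out = new Float32Array(stuffed.length);
  for (var n = 0; n < stuffed.length; n++) {
    var acc = 0;
    for (var k = 0; k < taps.length; k++) {
      var idx = n + 1 - k; // filter centered on n, so there is no delay
      if (idx >= 0 && idx < stuffed.length) {
        acc += taps[k] * stuffed[idx];
      }
    }
    out[n] = acc;
  }
  return out;
}

console.log(Array.from(interpolateBy2(new Float32Array([0, 2, 4]))));
// → [0, 1, 2, 3, 4, 2]  (the last value interpolates toward an implicit trailing zero)
```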

Rate changing usually involves some combination of decimation and interpolation, since both work by an integer number of samples. Take 48000->32000 as an example. The output/input ratio is 32000/48000, or 2/3. So you'd upsample 48000 by 2 to get 96000 and then downsample that by 3 to get 32000. Another thing is that you can chain these processes together: going up 3, down 2, down 2 is an overall ratio of 3/4, which takes 48000 to 36000. Also, 44100 is particularly difficult. For example, to move from 48000->44100 you need to go up 147 and down 160, and that ratio can't be reduced to smaller terms.
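
The up/down factors for any pair of rates come from reducing the ratio by the greatest common divisor. A minimal sketch follows; resampleRatio is a hypothetical helper name, not part of any library.

```javascript
// Greatest common divisor (Euclid's algorithm) for positive integers.
function gcd(a, b) {
  while (b !== 0) {
    var t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Smallest integer interpolation (up) and decimation (down) factors
// that convert inputRate to outputRate.
function resampleRatio(inputRate, outputRate) {
  var g = gcd(inputRate, outputRate);
  return { up: outputRate / g, down: inputRate / g };
}

console.log(resampleRatio(48000, 32000)); // → { up: 2, down: 3 }
console.log(resampleRatio(48000, 44100)); // → { up: 147, down: 160 }
console.log(resampleRatio(44100, 8000));  // → { up: 80, down: 441 }
```

The last line is the case from the question: 44100->8000 needs a ratio of 80/441, which is why simply stepping through the input by a fixed whole number of samples cannot produce it.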

I'd suggest you find some code or a library to do this for you. What you need to look for is a polyphase filter or sample rate converter.
