How to convert wav file to spectrogram for tensorflowjs with columnTruncateLength: 232 and numFramesPerSpectrogram: 43?


Problem description

I'm trying to use tensorflowjs speech recognition in offline mode. Online mode using the microphone works fine, but for offline mode I'm not able to find any reliable library for converting a wav/mp3 file to a spectrogram matching the required array specs: fftSize: 1024, columnTruncateLength: 232, numFramesPerSpectrogram: 43.

All the libraries I tried, like spectrogram.js, don't have those conversion options, while tensorflowjs speech clearly requires the following specs for the spectrogram tensor:

const mic = await tf.data.microphone({
  fftSize: 1024,
  columnTruncateLength: 232,
  numFramesPerSpectrogram: 43,
  sampleRateHz: 44100,
  includeSpectrogram: true,
  includeWaveform: true
});

I'm getting the error Error: tensor4d() requires shape to be provided when values are a flat array in the following:

await recognizer.ensureModelLoaded();
var audiocaptcha = await response.buffer();
fs.writeFile("./afterverify.mp3", audiocaptcha, function (err) {
    if (err) {}
});
var bufferNewSamples = new Float32Array(audiocaptcha);

const buffersliced = bufferNewSamples.slice(0, bufferNewSamples.length - (bufferNewSamples.length % 9976));
const xtensor = tf.tensor(buffersliced).reshape([-1,
    ...recognizer.modelInputShape().slice(1)]);
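The tensor4d() error happens because tf.tensor is handed a flat array with no shape argument. A minimal sketch of the fix, assuming the flat array really contains 43 × 232 spectrogram values per example (the helper names here are mine, not part of tfjs):

```javascript
const FRAMES = 43;   // numFramesPerSpectrogram
const BINS = 232;    // columnTruncateLength
const VALUES_PER_EXAMPLE = FRAMES * BINS; // 9976 values per spectrogram

// Length of the longest prefix that holds only whole spectrograms.
function usableLength(totalValues) {
  return totalValues - (totalValues % VALUES_PER_EXAMPLE);
}

// Trim the flat Float32Array and pass tf.tensor an explicit 4D shape
// instead of letting it guess (which raises the tensor4d() error).
function toModelInput(flat) {
  const trimmed = flat.slice(0, usableLength(flat.length));
  return tf.tensor(trimmed, [trimmed.length / VALUES_PER_EXAMPLE, FRAMES, BINS, 1]);
}
```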

I got this erroneous output after slicing and converting to a tensor:

output.scores
[ Float32Array [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 ],
  Float32Array [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 ],
  Float32Array [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 ],
  Float32Array [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 ],
  Float32Array [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 ] ]
score for word '_background_noise_' = 0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
score for word '_unknown_' = 0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
score for word 'down' = 0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
score for word 'eight' = 0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
score for word 'five' = 0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
score for word 'four' = undefined
score for word 'go' = undefined
score for word 'left' = undefined
score for word 'nine' = undefined
score for word 'no' = undefined
score for word 'one' = undefined
score for word 'right' = undefined
score for word 'seven' = undefined
score for word 'six' = undefined
score for word 'stop' = undefined
score for word 'three' = undefined
score for word 'two' = undefined
score for word 'up' = undefined
score for word 'yes' = undefined
score for word 'zero' = undefined

Answer

The only requirement when working with offline recognition is to have an input tensor of shape [null, 43, 232, 1].

1 - Read the wav file and get the array of data

var Spectrogram = require('spectrogram');

var spectro = Spectrogram(document.getElementById('canvas'), {
  audio: {
    enable: false
  }
});

var audioContext = new AudioContext();

function readWavFile() {
  return new Promise(resolve => {
    var request = new XMLHttpRequest();
    request.open('GET', 'audio.mp3', true);
    request.responseType = 'arraybuffer';

    request.onload = function() {
      audioContext.decodeAudioData(request.response, function(buffer) {
        resolve(buffer);
      });
    };
    request.send();
  });
}

const buffer = await readWavFile();

The same thing can be done without using a third-party library. Two options are possible.

  • Read the file using <input type="file">. In that case, this answer shows how to get the typed array.
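For the <input type="file"> route, a minimal browser-only sketch (the function name and element id are illustrative, not from any library):

```javascript
// Read a user-selected file into a Float32Array via FileReader.
// Browser-only: FileReader and the DOM element are assumed to exist.
function readFileAsFloat32(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(new Float32Array(reader.result));
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(file);
  });
}

// Usage, e.g. in a change handler on <input type="file" id="wav">:
// const data = await readFileAsFloat32(document.getElementById('wav').files[0]);
```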

  • Serve and read the wav file using an HTTP request:

var req = new XMLHttpRequest();
req.open("GET", "file.wav", true);
req.responseType = "arraybuffer";

req.onload = function () {
  var arrayBuffer = req.response;
  if (arrayBuffer) {
    var byteArray = new Float32Array(arrayBuffer);
  }
};

req.send(null);
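One caveat with the snippet above: a standard PCM wav stores 16-bit integer samples after a header (commonly 44 bytes), so viewing the raw bytes directly as a Float32Array yields garbage values. A hedged sketch of an explicit decode, assuming plain 16-bit mono PCM with the common 44-byte header (real files can carry extra chunks):

```javascript
// A standard PCM wav holds 16-bit signed samples after a 44-byte header.
// Decode them explicitly and scale the int16 range into [-1, 1).
const WAV_HEADER_BYTES = 44; // assumption: no extra chunks before the data

function pcm16WavToFloat32(arrayBuffer) {
  const pcm = new Int16Array(arrayBuffer, WAV_HEADER_BYTES);
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 32768; // int16 [-32768, 32767] -> float [-1, 1)
  }
  return out;
}
```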

2 - Convert the buffer to a typed array

const data = new Float32Array(buffer) // if buffer is an AudioBuffer from decodeAudioData, use buffer.getChannelData(0) instead

3 - Convert the array to a tensor using the shape of the speech recognition model

const x = tf.tensor(
   data).reshape([-1, ...recognizer.modelInputShape().slice(1)]);

If the latter command fails, it means the data does not have the shape required by the model. The tensor either needs to be sliced to the appropriate shape, or the recording should respect the fft and other parameters.
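Putting the three steps together, a sketch of the whole offline pipeline (assumes `tf` and a loaded `recognizer` from @tensorflow-models/speech-commands are in scope, and that the fetched data already contains float spectrogram values; the function name is mine):

```javascript
// End-to-end sketch: fetch a served file, trim it to whole spectrograms,
// build an explicitly-shaped 4D tensor, and run offline recognition.
async function recognizeWav(url) {
  const res = await fetch(url);
  const raw = new Float32Array(await res.arrayBuffer());   // steps 1 + 2

  const [, frames, bins, channels] = recognizer.modelInputShape(); // [null, 43, 232, 1]
  const perExample = frames * bins * channels;

  // Keep only whole spectrograms, then reshape with an explicit shape (step 3).
  const usable = raw.length - (raw.length % perExample);
  const x = tf.tensor(raw.slice(0, usable),
                      [usable / perExample, frames, bins, channels]);

  const output = await recognizer.recognize(x);
  x.dispose();
  return output.scores;
}
```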

