How to convert the Float32Array format of native HTML5 recorded audio to the proper bytes for the Google Speech-to-Text service?

Question

If you follow this tutorial: https://medium.com/ideas-at-igenius/delivering-a-smooth-cross-browser-speech-to-text-experience-b1e1f1f194a2 you will be able to create a script processor and add a listener to it:

scriptProcessor = inputPoint.context.createScriptProcessor(bufferSize, in_channels, out_channels)
//...
scriptProcessor.addEventListener('audioprocess', streamAudioData)

Inside the callback, calling callback_param.inputBuffer.getChannelData(0) returns a JavaScript Float32Array which, judging by the data, contains floating-point numbers from -1.0 to +1.0.

Streaming this as-is to the backend, which in turn streams it to the Google Speech-to-Text service, returns nothing (as expected).

The Google Speech-to-Text service, at least in Python, expects streaming input to be a byte string in WAV format containing the audio at the sample rate that was specified (i.e. 16000 Hz). Note that streaming a file to it from the backend works fine.

This conversion has failed: Float32Array -> Int16Array -> byte string

Has anyone found the appropriate conversions for the above to work?

Alternatively, are you aware of a simpler, more robust path for: microphone in browser -> stream data via WebSocket to backend server -> stream data to Google Speech-to-Text service -> get responses as expected?

Edit: adding the Python code for the RecognitionConfig of the Google Speech API:

config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code=self.language_code)


Recommended answer

OK, I did some digging and found the actual documentation, which has the proper information:


LINEAR16 - Uncompressed 16-bit signed little-endian samples (Linear PCM).

The key parts are:

  • 16 bits per sample
  • Signed
  • Little-endian

So, what you need to do is scale your floating-point values (-1.0 ... 1.0) to integers between -32768 and 32767.
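As a rough illustration of that scaling, here is a minimal sketch. The helper name floatToInt16 is hypothetical, and scaling negative values by 32768 and positive values by 32767 is just one common convention (multiplying everything by 32767 is another); the clamping step is an added assumption to guard against samples that slightly overshoot the nominal range.

```javascript
// Hypothetical helper: map one float sample in [-1.0, 1.0] to a signed 16-bit integer.
function floatToInt16(sample) {
  // Clamp first, since a sample may fall slightly outside the nominal range.
  const clamped = Math.max(-1, Math.min(1, sample));
  // Scale negatives by 32768 and positives by 32767 so both extremes fit in int16.
  const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
  return Math.round(scaled);
}

console.log(floatToInt16(-1), floatToInt16(0), floatToInt16(1));
```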

There isn't any built-in JavaScript method to do this for you. Your conversion from Float32Array to Int16Array doesn't work because the floats are not scaled, so you just end up with the values -1, 0, and 1. The other reason you can't use Int16Array is that its endianness is platform dependent!
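Both problems can be seen in a short experiment. This is only a sketch with made-up sample values; the variable names are hypothetical.

```javascript
// Made-up samples in the range the Web Audio API delivers.
const floats = new Float32Array([-1.0, -0.5, 0.25, 0.9, 1.0]);

// Direct conversion performs no scaling: each float is truncated toward zero.
const naive = Int16Array.from(floats);
console.log(Array.from(naive)); // only -1, 0 and 1 survive

// Typed arrays use the platform's byte order, which can be probed like this.
const probe = new Uint8Array(new Uint16Array([0x0102]).buffer);
const platformIsLittleEndian = probe[0] === 0x02;
console.log("little-endian platform:", platformIsLittleEndian);
```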

What you need to do is get cozy with ArrayBuffers and manipulate them with a DataView. Take each sample, do some math, write the bytes, and move to the next sample. When you're done, both XHR and the Fetch API support sending an ArrayBuffer as the HTTP request body. Or, you can instantiate a new Blob with that ArrayBuffer and do other things with it.
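The steps described above might look roughly like the following sketch. The function name float32ToLinear16 is hypothetical, the clamping and rounding choices are assumptions rather than anything the documentation prescribes, and the code assumes the samples are already at the sample rate declared in the recognition config (browsers often record at 44100 or 48000 Hz, in which case either the audio must be resampled or the configured rate changed).

```javascript
// Hypothetical helper: convert Float32Array samples in [-1.0, 1.0] into an
// ArrayBuffer of 16-bit signed little-endian PCM, as LINEAR16 describes.
function float32ToLinear16(samples) {
  const bytesPerSample = 2;
  const buffer = new ArrayBuffer(samples.length * bytesPerSample);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the int16 range.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    // Passing `true` forces little-endian byte order on every platform.
    view.setInt16(i * bytesPerSample, Math.round(scaled), true);
  }
  return buffer;
}

// Example with made-up samples: 4 samples become 8 bytes.
const pcm = float32ToLinear16(new Float32Array([0, 1, -1, 0.5]));
console.log(new Uint8Array(pcm));
```

Inside the audioprocess callback, the result of float32ToLinear16(callback_param.inputBuffer.getChannelData(0)) could then be passed to the send method of an open WebSocket (which accepts an ArrayBuffer) and forwarded by the backend as raw bytes.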
