How to run Google Speech-to-Text on a Blob sent from the browser to a Node.js server

Question

I am trying to set up a server to receive audio from a client browser using SocketIO, then process it through Google Speech-to-Text, and finally reply back to the client with the text.

Originally, and ideally, I wanted it to work somewhat like the demo tool on this page: https://cloud.google.com/speech-to-text/

I tried using getUserMedia and streaming it through SocketIO-Stream, but I couldn't figure out how to 'pipe' a MediaStream.

Instead, I've now decided to use MediaRecorder on the client side and then send the data all at once as a single blob (as seen in this example).

On the server, I then apply toString('base64') to the blob and pass the result to client.recognize() from @google-cloud/speech.

Client side (I'm using VueJS):

        new Vue({
            el: '#app',
            data: function () {
                return ({
                    msgs: [],
                    socket: null,
                    recorder: null,
                    chunks: [] // Blob pieces collected by ondataavailable
                })
            },
            mounted: function () {
                this.socket = io.connect('localhost:3000/user');
                console.log('Connected!')
                // Arrow function so `this` is the Vue instance, not the socket
                this.socket.on('text', (text) => {
                    this.msgs.push(text)
                })
            },
            methods: {
                startRecording: function () {
                    if (this.recorder && this.recorder.state == 'recording') {
                        console.log("Stopping!")
                        this.recorder.stop()
                    } else {
                        console.log("Starting!")
                        navigator.mediaDevices.getUserMedia({ audio: true, video: false })
                            .then(this.handleSuccess);
                    }
                },
                handleSuccess: function (stream) {
                    this.recorder = new MediaRecorder(stream)
                    this.recorder.start(10000)
                    this.recorder.ondataavailable = (e) => {
                        this.chunks.push(e.data)
                        console.log(e.data)
                    }
                    this.recorder.onstop = (e) => {
                        const blob = new Blob(this.chunks, { 'type': 'audio/webm; codecs=opus' })
                        this.socket.emit('audio', blob)
                        this.chunks = [] // start fresh for the next recording
                    }
                }
            }
        })

Server side:

    const speech = require('@google-cloud/speech');
    const client = new speech.SpeechClient();

    const io = require('socket.io').listen(3000);
    const ss = require('socket.io-stream');

    // These settings describe raw 16-bit PCM sampled at 16 kHz
    const encoding = 'LINEAR16';
    const sampleRateHertz = 16000;
    const languageCode = 'en-US';

    const audio = {
        content: null // base64-encoded audio bytes, filled in per request
    };

    const config = {
        encoding: encoding,
        sampleRateHertz: sampleRateHertz,
        languageCode: languageCode,
    };

    async function main(socket) {
        const [response] = await client.recognize({
            audio: audio,
            config: config
        });
        const transcription = response.results
            .map(result => result.alternatives[0].transcript)
            .join('\n');
        console.log(`Transcription: ${transcription}`);
        socket.emit('text', transcription); // send the text back to the client
    }

    io.of('/user').on('connection', function (socket) {
        console.log('Connection made!');
        socket.on('audio', function (data) {
            // data arrives as a Buffer holding the bytes of the client's Blob
            audio.content = data.toString('base64');
            main(socket).catch(console.error);
        });
    });

The log from main() on the server side is always:

    Transcription:

It's empty!

It should contain the text of the audio I sent. Thank you in advance!

Answer

Your Node.js application asks Speech-to-Text to process raw audio data, recorded as an array of 16-bit signed integers ('LINEAR16') at a rate of 16k samples/sec (16000). This sort of audio representation is known as pulse-code modulation (PCM) for reasons lost in ancient telephony lore.
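
To make 'LINEAR16' concrete: every sample is a signed 16-bit integer stored little-endian, so one second of 16 kHz mono audio takes 16000 × 2 = 32000 bytes. Here is a minimal sketch, assuming mono input given as floating-point samples in [-1, 1] (the form the Web Audio API produces), of packing such samples into a LINEAR16 Node.js Buffer. The helper name `floatTo16BitPCM` is hypothetical.

```javascript
// Hypothetical helper: convert Float32 samples in [-1, 1] to LINEAR16,
// i.e. signed 16-bit little-endian PCM, returned as a Node.js Buffer.
function floatTo16BitPCM(samples) {
  const out = Buffer.alloc(samples.length * 2); // 2 bytes per sample
  for (let i = 0; i < samples.length; i++) {
    // Clamp to the valid range, then scale into the int16 range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    const value = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
    out.writeInt16LE(value, i * 2);
  }
  return out;
}

const pcm = floatTo16BitPCM(new Float32Array([0, 1, -1, 0.5]));
console.log(pcm.length); // 8 bytes for 4 samples
```

A Buffer like this, base64-encoded, is the kind of content the question's config actually describes.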

But the Blob you send from your client-side code is not that. It's a media object with the content type audio/webm; codecs=opus. That means the audio track is compressed using the Opus codec and boxed (multiplexed) in the WebM (Matroska, EBML) container format. The cloud Speech-to-Text code tries to interpret that as raw audio data, fails, throws up its hands and returns an empty transcription string. It's analogous to trying to view a zip file in a text editor: it's just gibberish.
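
One quick way to confirm on the server that you received a container rather than raw samples is to look at the first four bytes: WebM/Matroska files begin with the EBML header ID 0x1A 0x45 0xDF 0xA3. The check below is a minimal heuristic sketch, assuming Socket.IO delivers the blob as a Node.js Buffer (raw PCM could in principle start with the same bytes by coincidence). The names `EBML_MAGIC` and `looksLikeWebm` are hypothetical.

```javascript
// WebM is a Matroska (EBML) container; its files start with this 4-byte ID.
const EBML_MAGIC = Buffer.from([0x1a, 0x45, 0xdf, 0xa3]);

// Hypothetical helper: true if the buffer starts with the EBML header ID,
// which suggests a WebM/Matroska container rather than raw LINEAR16 samples.
function looksLikeWebm(buf) {
  return buf.length >= 4 && buf.subarray(0, 4).equals(EBML_MAGIC);
}

// Possible use in the question's 'audio' handler:
// socket.on('audio', (data) => {
//   if (looksLikeWebm(data)) {
//     console.warn('Got WebM/Opus, not LINEAR16; decode it before recognize()');
//   }
// });
```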

To get Speech-to-Text to work with a media object, you have to extract the PCM audio from it first. This is a notorious pain in the neck to set up on a server; you have to use ffmpeg. There's a tutorial on it in the Speech-to-Text documentation. The tutorial mentions scraping the audio out of video files. Your Blob is, basically, a video file with no video track in it, so the same techniques work.
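
Here is a minimal sketch of that extraction step, assuming the `ffmpeg` binary is installed and on the server's PATH. The blob is piped into ffmpeg's stdin, and headerless signed 16-bit little-endian mono PCM at 16 kHz is read back from stdout, which matches the 'LINEAR16' / 16000 config in the question. The helper names `ffmpegArgs` and `webmToLinear16` are hypothetical; the flags themselves are standard ffmpeg options.

```javascript
const { spawn } = require('child_process');

// ffmpeg arguments: read the container from stdin, write raw
// signed 16-bit little-endian mono PCM at the requested rate to stdout.
function ffmpegArgs(sampleRateHertz) {
  return [
    '-loglevel', 'error',           // only print real errors on stderr
    '-i', 'pipe:0',                 // input: WebM/Opus bytes on stdin
    '-f', 's16le',                  // output format: raw PCM, no container
    '-acodec', 'pcm_s16le',         // 16-bit signed little-endian samples
    '-ac', '1',                     // downmix to mono
    '-ar', String(sampleRateHertz), // resample to the config's rate
    'pipe:1',                       // output: stdout
  ];
}

// Hypothetical helper: decode a WebM/Opus Buffer into a LINEAR16 Buffer.
function webmToLinear16(webmBuffer, sampleRateHertz = 16000) {
  return new Promise((resolve, reject) => {
    const ff = spawn('ffmpeg', ffmpegArgs(sampleRateHertz));
    const out = [];
    const err = [];
    ff.stdout.on('data', (chunk) => out.push(chunk));
    ff.stderr.on('data', (chunk) => err.push(chunk));
    ff.on('error', reject);       // e.g. ffmpeg is not installed
    ff.stdin.on('error', reject); // e.g. ffmpeg exited before reading stdin
    ff.on('close', (code) => {
      if (code === 0) resolve(Buffer.concat(out));
      else reject(new Error(`ffmpeg exited with ${code}: ${Buffer.concat(err)}`));
    });
    ff.stdin.end(webmBuffer);
  });
}
```

With a helper like this, an `async` version of the 'audio' handler could set `audio.content = (await webmToLinear16(data)).toString('base64')` instead of base64-encoding the WebM blob directly.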

But you'll be much better off returning to your first approach, using the MediaStream browser JavaScript APIs. In particular, your browser code should use elements of the Web Audio API to intercept the raw PCM audio data and send it either to your server or directly from your browser to Speech-to-Text.
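
One detail to watch on that route: the Web Audio API hands you Float32 samples at the AudioContext's sample rate, which is usually the hardware rate (often 44100 or 48000 Hz), not 16000. Either resample before sending, or set sampleRateHertz to match the context's rate. The sketch below shows the resampling option in its crudest form, averaging each group of input samples into one output sample; it is not a high-quality resampler, and the name `downsampleTo16k` is hypothetical. In the browser, the input would typically be one channel from an AudioWorkletProcessor's `process(inputs)` callback (for example `inputs[0][0]`), which is an assumption about how you wire up the audio graph. The averaged samples still need to be scaled to 16-bit integers before they match 'LINEAR16'.

```javascript
// Hypothetical helper: downsample mono Float32 audio from inputRate to
// outputRate (16 kHz by default) by averaging each block of input samples
// that maps onto one output sample.
function downsampleTo16k(input, inputRate, outputRate = 16000) {
  if (outputRate > inputRate) {
    throw new Error('This sketch only downsamples');
  }
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(input.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) sum += input[j];
    output[i] = sum / (end - start);
  }
  return output;
}

const oneSecondAt48k = new Float32Array(48000);
console.log(downsampleTo16k(oneSecondAt48k, 48000).length); // 16000
```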

Explaining all this is way beyond the scope of a StackOverflow answer, but here are some hints: see the question "How to use web audio api to get raw pcm audio?"
