How to speed up Google Cloud Speech

Question

I record sound from a microphone in the browser, convert it into a file, and send the file to a Java server. The Java server then sends the file to the Cloud Speech API and returns the transcription to me. The problem is that transcription takes far too long (around 3.7 seconds for 2 seconds of dialog).

So I would like to speed up the transcription. The first thing to do is to stream the data, so that transcription starts at the beginning of the recording. The problem is that I don't really understand the API. For instance, if I want to transcribe my audio stream at the source (browser/microphone), I need some kind of JS API, but I can't find anything I can use in a browser (we can't use Node like this, can we?).

Otherwise I need to stream my data from my JS to my Java server (I'm not sure how to do that without breaking the data...) and then push it through streamingRecognizeFile from there: https://github.com/GoogleCloudPlatform/java-docs-samples/blob/master/speech/cloud-client/src/main/java/com/example/speech/Recognize.java
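As a rough illustration of moving audio in pieces rather than as one file: a streaming call consumes a sequence of audio messages, so the server can forward small chunks as they arrive instead of waiting for the complete recording. The sketch below uses only the Java standard library and shows just the chunking step. The class name AudioChunker, the 3200-byte chunk size, and the 16 kHz 16-bit mono PCM format are all assumptions made for the example, not values taken from the question or from the linked sample.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits raw audio bytes into fixed-size chunks so that
// each chunk can be forwarded as one streaming message.
final class AudioChunker {
    // Assumption: 16 kHz, 16-bit mono PCM, so 100 ms of audio is 16000 * 2 / 10 = 3200 bytes.
    static final int CHUNK_BYTES = 3200;

    static List<byte[]> split(byte[] audio, int chunkBytes) {
        if (chunkBytes <= 0) {
            throw new IllegalArgumentException("chunkBytes must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int start = 0; start < audio.length; start += chunkBytes) {
            // The last chunk may be shorter than chunkBytes.
            int end = Math.min(start + chunkBytes, audio.length);
            chunks.add(Arrays.copyOfRange(audio, start, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] twoSeconds = new byte[2 * 16000 * 2]; // 2 s of silence at the assumed format
        List<byte[]> chunks = split(twoSeconds, CHUNK_BYTES);
        System.out.println(chunks.size() + " chunks of up to " + CHUNK_BYTES + " bytes");
    }
}
```

Splitting on a whole number of samples (an even byte count for 16-bit audio) keeps each chunk independently valid, which is one way to avoid "breaking the data" in transit.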

But it takes a file as input, so how am I supposed to use it? I can't really tell the system whether I have finished recording or not... How will it know that the transcription has ended?

I would like to build something in my web browser just like the Google demo here: https://cloud.google.com/speech/

I think there is something fundamental I do not understand about how to use the streaming API. If someone could explain a bit how I should go about this, it would be awesome.

Thanks.

Recommended answer

According to Google, "Speech-to-Text typically processes audio faster than real-time, processing 30 seconds of audio in 15 seconds on average" [1]. You can use the Google APIs Explorer to test exactly how long each of your requests takes [2].
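To compare the numbers in the question with the quoted average, it can help to express both as a real-time factor (processing time divided by audio duration) and to time each stage of the pipeline separately. The following is a minimal sketch using only the Java standard library. LatencyProbe and its methods are hypothetical names, and the empty lambda stands in for whatever step is being measured (upload, file conversion, or the recognize call).

```java
// Hypothetical helper for measuring where the time goes in the pipeline.
final class LatencyProbe {
    // Real-time factor: below 1.0 means faster than real time.
    static double realTimeFactor(double processingSeconds, double audioSeconds) {
        if (audioSeconds <= 0) {
            throw new IllegalArgumentException("audioSeconds must be positive");
        }
        return processingSeconds / audioSeconds;
    }

    // Runs one step and returns the elapsed wall-clock time in seconds.
    static double timeSeconds(Runnable step) {
        long start = System.nanoTime();
        step.run();
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        // Figures from the question: 3.7 s of processing for 2 s of dialog.
        System.out.println("question: " + realTimeFactor(3.7, 2.0));
        // Figures from the quoted average: 30 s of audio in 15 s.
        System.out.println("quoted average: " + realTimeFactor(15, 30));
        // Placeholder step; replace the body with the stage being measured.
        double elapsed = timeSeconds(() -> { });
        System.out.println("placeholder step took " + elapsed + " s");
    }
}
```

If the recognize call alone is close to the quoted average, the remaining time is likely spent elsewhere in the round trip, such as recording, conversion, or upload.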

To speed up transcription, you can try adding recognition metadata to your request [3]. You can provide phrase hints if you know the context of the speech [4], or use enhanced models, which are a special set of machine learning models [5]. All of these suggestions improve accuracy and might also affect transcription speed.

When using streaming recognition, you can set the singleUtterance option to true in the config. The service then detects when the user pauses speaking and stops recognition. Otherwise the streaming request continues until it reaches the content limit, which is 1 minute of audio per streaming request [6].
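When singleUtterance is not set, the client has to stay within the content limit itself. One way to do that is to convert the limit into a byte budget and check it before sending each chunk. The sketch below uses only the Java standard library. StreamBudget is a hypothetical name, the 60-second limit is the figure given in this answer, and the 16 kHz 16-bit mono format is an assumption for the example.

```java
// Hypothetical helper: tracks how much audio a single streaming request may carry.
final class StreamBudget {
    // Bytes of uncompressed PCM audio that fit into limitSeconds.
    static long maxBytes(int sampleRateHertz, int bytesPerSample, int channels, int limitSeconds) {
        return (long) sampleRateHertz * bytesPerSample * channels * limitSeconds;
    }

    // True if sending the next chunk would go past the budget.
    static boolean wouldExceed(long bytesSent, int nextChunkBytes, long maxBytes) {
        return bytesSent + nextChunkBytes > maxBytes;
    }

    public static void main(String[] args) {
        // Assumptions: 16 kHz, 16-bit (2 bytes), mono, 60 s limit as stated in the answer.
        long budget = maxBytes(16000, 2, 1, 60);
        System.out.println("budget in bytes: " + budget);

        long sent = 0;
        int chunk = 3200; // 100 ms of audio at the assumed format
        while (!wouldExceed(sent, chunk, budget)) {
            sent += chunk; // a real client would send the chunk here
        }
        System.out.println("stopped after " + sent + " bytes; close this stream and open a new one");
    }
}
```

For compressed encodings the byte count does not map directly to duration, so a real client would track elapsed audio time instead.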
