How to speed up Google Cloud Speech

Question

I am recording sound from a microphone through a browser, converting it into a file, and sending the file to a Java server. My Java server then sends the file to the Cloud Speech API and returns the transcription to me. The problem is that the transcription takes too long (around 3.7 s for 2 s of dialog).

So I would like to speed up the transcription. The first thing to do is to stream the data, so that transcription starts as soon as the recording starts. The problem is that I don't really understand the API. For instance, if I want to transcribe my audio stream at the source (browser/microphone), I need some kind of JS API, but I can't find anything I can use in a browser (we can't use Node like this, can we?).

Otherwise, I need to stream my data from my JS to my Java server (I am not sure how to do that without breaking the data...) and then push it through streamingRecognizeFile from there: https://github.com/GoogleCloudPlatform/java-docs-samples/blob/master/speech/cloud-client/src/main/java/com/example/speech/Recognize.java

But it takes a file as input, so how am I supposed to use it? I cannot really tell the system whether I have finished recording or not. How will it know that the transcription should end?

I would like to create something in my web browser just like the Google demo here: https://cloud.google.com/speech/

I think there is something fundamental I do not understand about how to use the streaming API. If someone could explain how I should approach this, it would be awesome.

Thanks.

Recommended answer

According to Google, "Speech-to-Text typically processes audio faster than real-time, processing 30 seconds of audio in 15 seconds on average" [1]. You can use the Google APIs Explorer to measure exactly how long each of your requests takes [2].
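To put the two timings side by side, here is a small sketch that computes the real-time factor (processing time divided by audio duration). The class and method names are made up for illustration. Note that the 3.7 s figure from the question may also include upload and network round trips, so the comparison is only rough.

```java
import java.util.Locale;

// Hypothetical helper: compares processing time against audio duration.
final class RealTimeFactor {
    /** Returns processing time / audio duration; below 1.0 means faster than real time. */
    static double of(double processingSeconds, double audioSeconds) {
        if (audioSeconds <= 0) {
            throw new IllegalArgumentException("audioSeconds must be positive");
        }
        return processingSeconds / audioSeconds;
    }

    public static void main(String[] args) {
        // Average quoted in the documentation: 30 s of audio in 15 s.
        System.out.println(String.format(Locale.ROOT, "quoted average: %.2f", of(15, 30)));   // quoted average: 0.50
        // Measurement from the question: 2 s of dialog in about 3.7 s.
        System.out.println(String.format(Locale.ROOT, "measured:       %.2f", of(3.7, 2)));   // measured:       1.85
    }
}
```

A factor well above the quoted average suggests that much of the delay comes from outside recognition itself (recording to a file, uploading, waiting for the whole clip), which is what streaming is meant to reduce.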

To speed up transcription, you can try adding recognition metadata to your request [3]. If you know the context of the speech, you can provide phrase hints [4]. You can also use enhanced models, which rely on a special set of machine learning models [5]. All of these suggestions improve accuracy and might also affect transcription speed.
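As a rough illustration of where these three options live, the sketch below builds the `config` part of a REST recognize request as a JSON string, using only the Java standard library. The field names (`metadata`, `speechContexts`, `useEnhanced`, `model`) are written from memory of the v1 `RecognitionConfig` and should be checked against the current reference; the encoding, sample rate, metadata values, phrases, and model name are placeholder assumptions, and the `audio` part of the request is omitted.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that assembles the "config" object of a recognize request.
final class RecognitionConfigJson {
    /** Quotes a string for JSON; escapes only backslashes and double quotes. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds the JSON text; phraseHints and model are supplied by the caller. */
    static String build(List<String> phraseHints, String model) {
        String phrases = phraseHints.stream()
                .map(RecognitionConfigJson::quote)
                .collect(Collectors.joining(", "));
        return String.join("\n",
                "{",
                "  \"config\": {",
                "    \"encoding\": \"LINEAR16\",",          // assumed audio format
                "    \"sampleRateHertz\": 16000,",            // assumed sample rate
                "    \"languageCode\": \"en-US\",",
                "    \"metadata\": {",                        // recognition metadata [3]
                "      \"interactionType\": \"DICTATION\",",
                "      \"microphoneDistance\": \"NEARFIELD\"",
                "    },",
                "    \"speechContexts\": [{ \"phrases\": [" + phrases + "] }],", // phrase hints [4]
                "    \"useEnhanced\": true,",                 // enhanced model [5]
                "    \"model\": " + quote(model),
                "  }",
                "}");
    }

    public static void main(String[] args) {
        System.out.println(build(List.of("Cloud Speech", "transcription"), "phone_call"));
    }
}
```

In a real project the official client library's builders would normally replace hand-written JSON; the string form is used here only to keep the sketch dependency-free.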

When using streaming recognition, you can set the singleUtterance option to true in the config. The API then detects when the user pauses speaking and stops recognition. Otherwise, the streaming request continues until it reaches the content limit, which is 1 minute of audio for a streaming request [6].
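To make the streaming flow more concrete, here is a minimal sketch of a server-side loop that reads incoming audio and forwards it in small chunks instead of waiting for a complete file. It uses only the Java standard library: the `Consumer<byte[]>` sink stands in for whatever sends each chunk to the Speech API, and the audio format (16 kHz, 16-bit mono PCM) and the 100 ms chunk size are assumptions, not values from the answer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical helper: splits an audio stream into fixed-size chunks.
final class AudioChunker {
    // Assumed format: 16 kHz, 16-bit (2 bytes per sample), mono, 100 ms per chunk.
    static final int SAMPLE_RATE_HZ = 16_000;
    static final int BYTES_PER_SAMPLE = 2;
    static final int CHUNK_MILLIS = 100;
    static final int CHUNK_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHUNK_MILLIS / 1000;

    /**
     * Reads audio until end of stream and hands each chunk to the sink.
     * In a real server, the sink would wrap the bytes in a streaming request
     * (after a first request carrying the config), and the caller would close
     * the request stream once this method returns to signal the end of audio.
     * Returns the number of chunks forwarded.
     */
    static int forward(InputStream audio, Consumer<byte[]> sink) {
        byte[] buffer = new byte[CHUNK_BYTES];
        int chunks = 0;
        try {
            int filled;
            // readNBytes blocks until the buffer is full or the stream ends.
            while ((filled = audio.readNBytes(buffer, 0, CHUNK_BYTES)) > 0) {
                sink.accept(Arrays.copyOf(buffer, filled)); // last chunk may be shorter
                chunks++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Two seconds of silence in the assumed format: 2 * 16000 * 2 = 64000 bytes.
        byte[] twoSeconds = new byte[2 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE];
        int sent = forward(new ByteArrayInputStream(twoSeconds),
                chunk -> { /* send the chunk to the recognizer here */ });
        System.out.println("chunks sent: " + sent); // chunks sent: 20
    }
}
```

With this shape, recognition can begin while the user is still speaking, and the end of the recording is expressed by the input stream ending rather than by a file being complete.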
