Google Speech API streaming audio exceeding 1 minute

Question

I would like to extract a person's utterances from a stream of telephone audio. The phone audio is routed to my server, which then creates a streaming recognition request. How can I tell whether a word belongs to a complete utterance or to an utterance that is still being transcribed? Should I compare timestamps between words? Will the API continue to return interim results even if the streaming phone audio contains no speech for a certain amount of time? How can I exceed the 1-minute streaming audio limit?

Recommended answer

Regarding your first three questions:

You don’t need to compare timestamps between words. You can tell whether a word is part of a complete utterance (final result) by looking at the is_final flag in the Streaming Recognition Result. If the flag is set to true, the response corresponds to a completed transcription; otherwise, it is an interim result. More on this here.
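
As a rough illustration of the is_final check, the sketch below labels each result in a sequence of streaming responses as interim or final. It assumes response objects shaped like the API's streaming responses (a results list whose items carry is_final and alternatives[0].transcript). The names classify_results and fake_response are hypothetical helpers introduced here, and the fake objects only stand in for real API responses so the example runs without credentials.

```python
from types import SimpleNamespace


def classify_results(responses):
    """Yield (kind, transcript) pairs, where kind is "final" or "interim".

    `responses` is any iterable of objects exposing `.results`; each result
    must expose `.is_final` and `.alternatives[0].transcript`.
    """
    for response in responses:
        for result in response.results:
            if not result.alternatives:
                # Nothing was transcribed for this result, so skip it.
                continue
            transcript = result.alternatives[0].transcript
            kind = "final" if result.is_final else "interim"
            yield kind, transcript


def fake_response(transcript, is_final):
    """Hypothetical stand-in for one streaming response (illustration only)."""
    alternative = SimpleNamespace(transcript=transcript)
    result = SimpleNamespace(alternatives=[alternative], is_final=is_final)
    return SimpleNamespace(results=[result])


if __name__ == "__main__":
    stream = [
        fake_response("hel", False),
        fake_response("hello wor", False),
        fake_response("hello world", True),
    ]
    for kind, transcript in classify_results(stream):
        print(kind, transcript)
```

With a real client, the same function could be fed the iterator returned by the streaming call, keeping only the pairs labelled "final" as completed utterances.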

Once you get the final results, no interim results should be generated until new utterances are streamed.

Regarding your last question, you can’t exceed the 1-minute limit; you need to send multiple requests instead.
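
One possible way to arrange the "multiple requests" approach is to cut the incoming audio into consecutive sessions, each short enough to stay under the limit, and open a new streaming request for every session. The sketch below shows only the splitting step. It assumes fixed-duration audio chunks, and the name iter_sessions, the chunk_ms parameter and the 55-second default budget are illustrative choices made here, not values taken from the API.

```python
import itertools


def iter_sessions(chunks, chunk_ms, max_session_ms=55_000):
    """Lazily split an audio chunk stream into sessions under a time budget.

    Each yielded session is an iterator over at most
    `max_session_ms // chunk_ms` chunks. A session must be fully consumed
    before the next one is requested, because all sessions read from the
    same underlying iterator.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    chunks_per_session = max(1, max_session_ms // chunk_ms)
    source = iter(chunks)
    while True:
        try:
            first = next(source)
        except StopIteration:
            return  # No audio left, so no further sessions.
        rest = itertools.islice(source, chunks_per_session - 1)
        yield itertools.chain([first], rest)


if __name__ == "__main__":
    # Ten 100 ms chunks with a 300 ms budget: three chunks per session.
    audio = [f"chunk{i}".encode() for i in range(10)]
    for number, session in enumerate(iter_sessions(audio, 100, 300)):
        # In a real server, each session would feed one streaming request.
        print(number, [chunk.decode() for chunk in session])
```

Because the split happens on a fixed time budget rather than on silence, a session boundary can fall in the middle of an utterance; waiting for a final result before switching sessions is one way to reduce that, at the cost of more bookkeeping.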
