Google Speech API streaming audio exceeding 1 minute
Question
I would like to extract a person's utterances from a stream of telephone audio. The phone audio is routed to my server, which then creates a streaming recognition request. How can I tell whether a word belongs to a complete utterance or to an utterance that is still being transcribed? Should I compare timestamps between words? Will the API continue to return interim results even if there is no speech in the streaming phone audio for a certain amount of time? How can I exceed the 1-minute limit on streaming audio?
Recommended answer
Regarding your first three questions:
You don't need to compare timestamps between words. You can tell whether a word is part of a complete utterance (a final result) by looking at the is_final flag in the Streaming Recognition Result. If the flag is set to true, the response corresponds to a completed transcription; otherwise, it is an interim result. The API documentation for the Streaming Recognition Result has more on this.
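As a rough illustration of the is_final check, here is a minimal Python sketch. It assumes each streaming response exposes a `results` list whose items carry `is_final` and an `alternatives` list with a `transcript` field, which is the shape of the Streaming Recognition Result described above. The helper names `split_results`, `final_utterances`, and `fake_response` are hypothetical, and `fake_response` only builds stand-in objects so the sketch runs without the Google client library.

```python
from types import SimpleNamespace


def split_results(responses):
    """Yield (is_final, transcript) for every result in a response stream."""
    for response in responses:
        for result in response.results:
            if not result.alternatives:
                continue  # nothing transcribed in this result
            # The first alternative is taken as the most likely transcript.
            yield result.is_final, result.alternatives[0].transcript


def final_utterances(responses):
    """Keep only completed utterances, dropping interim results."""
    return [text for is_final, text in split_results(responses) if is_final]


def fake_response(transcript, is_final):
    """Hypothetical stand-in for one streaming response (not the real API)."""
    alternative = SimpleNamespace(transcript=transcript)
    result = SimpleNamespace(is_final=is_final, alternatives=[alternative])
    return SimpleNamespace(results=[result])


if __name__ == "__main__":
    stream = [
        fake_response("hello", False),       # interim result
        fake_response("hello world", True),  # completed utterance
        fake_response("how", False),         # interim result of the next one
    ]
    print(final_utterances(stream))
```

With a real client, the same two functions could be pointed at the iterator of streaming responses instead of the stand-in list.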
Once you get the final results, no interim results should be generated until new utterances are streamed.
Regarding your last question: you can't exceed the 1-minute limit. You need to send multiple requests instead, starting a new streaming request before the current one reaches the limit.
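One way to picture the "multiple requests" approach is to cut the incoming audio into segments that each stay under the limit and open a new streaming request per segment. The sketch below only does the cutting; it is not part of the Speech API. The function name `segment_audio`, the 100 ms chunk size, and the 55-second safety margin are all assumptions chosen for illustration.

```python
def segment_audio(chunks, chunk_ms, limit_ms=55_000):
    """Group audio chunks into segments that each last at most limit_ms.

    chunks   -- iterable of raw audio chunks (e.g. bytes objects)
    chunk_ms -- duration of one chunk in milliseconds (assumed constant)
    limit_ms -- maximum audio per request; 55 s is an assumed safety margin
    """
    segment, elapsed_ms = [], 0
    for chunk in chunks:
        # Close the current segment before it would exceed the limit.
        if segment and elapsed_ms + chunk_ms > limit_ms:
            yield segment
            segment, elapsed_ms = [], 0
        segment.append(chunk)
        elapsed_ms += chunk_ms
    if segment:
        yield segment  # trailing partial segment


if __name__ == "__main__":
    # Two minutes of fake audio in 100 ms chunks.
    audio = (b"\x00" * 1600 for _ in range(1200))
    for index, segment in enumerate(segment_audio(audio, chunk_ms=100)):
        # Each segment would be sent as its own streaming request.
        print(index, len(segment))
```

Cutting purely by elapsed time can split an utterance in the middle. A gentler variant would wait for a result with is_final set to true before switching to the next request.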