Google Speech Recognition API: timestamp for each word?
Question
Speech recognition can be done by sending a request to http://www.google.com/speech-api/v2/recognize?...
Example: I said "one two three four five" in a WAV file. The Google API gives me this:
{
u'alternative':
[
{u'transcript': u'12345'},
{u'transcript': u'1 2 3 4 5'},
{u'transcript': u'one two three four five'}
],
u'final': True
}
Question: is it possible to get the time (in seconds) at which each word was spoken?
With my example:
['one', 0.23, 0.80], ['two', 1.03, 1.45], ['three', 1.79, 2.35], etc.
i.e. the word "one" was said between 00:00:00.23 and 00:00:00.80,
and the word "two" was said between 00:00:01.03 and 00:00:01.45 (in seconds).
PS: I am looking for an API that supports languages other than English, especially French.
Recommended answer
I believe the other answer is now out of date. This is now possible with the Google Cloud Speech API: https://cloud.google.com/speech/docs/async-time-offsets
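To make that concrete, here is a minimal sketch of turning a time-offset response into the `[word, start, end]` list asked for in the question. It makes no network call. The field names (`enableWordTimeOffsets`, `languageCode`, `words`, `startTime`, `endTime`) are what I recall from the REST JSON format and should be checked against the linked docs. The helper names, the `gs://` URI, and the sample response values are hypothetical, made up for illustration from the numbers in the question.

```python
def build_request_body(audio_uri, language_code="fr-FR"):
    """Build a JSON-style request body that asks for per-word time offsets.

    Assumption: field names follow the REST JSON format as I remember it;
    verify them against the official documentation before relying on this.
    """
    return {
        "config": {
            "languageCode": language_code,   # e.g. "fr-FR" for French
            "enableWordTimeOffsets": True,   # request start/end time per word
        },
        "audio": {"uri": audio_uri},         # hypothetical storage URI
    }


def parse_offset(value):
    """Convert a duration string such as '1.030s' to float seconds."""
    return float(value.rstrip("s"))


def word_timings(response):
    """Return [[word, start_seconds, end_seconds], ...] from a response dict.

    Only the first (most likely) alternative of each result is used.
    """
    timings = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        for info in alternatives[0].get("words", []):
            timings.append([
                info["word"],
                parse_offset(info["startTime"]),
                parse_offset(info["endTime"]),
            ])
    return timings


# Hypothetical response, hand-written to mirror the example in the question.
sample_response = {
    "results": [{
        "alternatives": [{
            "transcript": "one two",
            "words": [
                {"word": "one", "startTime": "0.230s", "endTime": "0.800s"},
                {"word": "two", "startTime": "1.030s", "endTime": "1.450s"},
            ],
        }]
    }]
}

request_body = build_request_body("gs://my-bucket/audio.wav")
timings = word_timings(sample_response)
print(timings)
```

With a real response in place of `sample_response`, `word_timings` would give the same list shape as in the question; for French audio the request would carry a language code such as `fr-FR`.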