Google Cloud Text-to-speech word timestamps

Question

I'm generating speech through Google Cloud's text-to-speech API and I'd like to highlight words as they are spoken.

Is there a way of getting timestamps for spoken words or sentences?

Recommended answer

This question seems to have gotten quite popular, so I thought I'd share what I ended up doing. This method will probably only work with English or similar languages.

I first split the text on any punctuation that causes a break in speaking. Each "sentence" is converted to speech separately. The resulting audio files end with a seemingly random amount of silence, which needs to be removed before joining them; the FFmpeg silencedetect filter can be used to find it. You can then join the audio files with an appropriate gap between them. Approximate word timestamps can be linearly interpolated within each sentence.
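The splitting and interpolation steps can be sketched roughly as follows. This is only an illustration under assumptions, not the exact code I used: the punctuation set, the 0.3 second gap, and the function names (split_on_breaks, interpolate_word_times, timestamp_chunks) are all hypothetical, and the interpolation here is done by character position within each chunk. The durations are assumed to be measured after the trailing silence has been trimmed.

```python
import re

# Assumption: these punctuation marks cause a spoken pause.
# Adjust the set for your own text and voice.
BREAK_PATTERN = re.compile(r"(?<=[.!?,;:])\s+")


def split_on_breaks(text):
    """Split text into chunks after pause-causing punctuation."""
    return [chunk for chunk in BREAK_PATTERN.split(text.strip()) if chunk]


def interpolate_word_times(chunk, start, duration):
    """Estimate (word, start_time, end_time) tuples for one chunk.

    Times are interpolated linearly by character position, so longer
    words are assigned proportionally more of the chunk's duration.
    """
    total = len(chunk)
    timings = []
    for match in re.finditer(r"\S+", chunk):
        word_start = start + duration * match.start() / total
        word_end = start + duration * match.end() / total
        timings.append((match.group(), word_start, word_end))
    return timings


def timestamp_chunks(chunks, durations, gap=0.3):
    """Place chunks on one timeline, separated by a fixed gap in seconds."""
    timeline = []
    cursor = 0.0
    for chunk, duration in zip(chunks, durations):
        timeline.extend(interpolate_word_times(chunk, cursor, duration))
        cursor += duration + gap
    return timeline
```

For example, timestamp_chunks(split_on_breaks(text), durations) returns one flat list of word timings for the joined audio, provided durations holds the trimmed length in seconds of each chunk's audio file, in order. The estimates are only approximate, since real speech does not spend equal time on every character.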
