Is there a fast way to find (not necessarily recognize) human speech in an audio file?
Question
I want to write a program that automatically syncs unsynced subtitles. One solution I thought of is to find human speech algorithmically and adjust the subtitles to it. The APIs I found (Google Speech API, Yandex SpeechKit) run on remote servers, which is not very convenient for me, and they probably do a lot of unnecessary work determining exactly what was said, while I only need to know that something was said.
In other words, I want to give it the audio file and get something like this:
[(00:12, 00:26), (01:45, 01:49) ... , (25:21, 26:11)]
Is there a solution (preferably in Python) that only finds human speech and runs on a local machine?
Answer
What you are trying to do is called Voice Activity Detection (VAD). There is a Python library called SPEAR that does it, among other things.
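The answer doesn't show SPEAR's API, so rather than guess at it, here is a minimal standard-library-only sketch of the simplest form of VAD: a short-time energy threshold. It is not SPEAR's method. It flags any loud sound (music, explosions), not only speech; practical VADs such as the WebRTC one use spectral features instead. The function names, the 30 ms frame size, the RMS threshold of 500 and the 0.3 s gap are hypothetical choices that would need tuning, and the reader assumes 16-bit PCM WAV input.

```python
import array
import math
import sys
import wave


def read_wav_mono(path):
    """Read a 16-bit PCM WAV file; return (samples of the first channel, sample_rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = array.array("h", raw)
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()
    return samples[::channels], rate  # keep only the first channel


def find_loud_segments(samples, rate, frame_ms=30, threshold=500.0, min_gap_s=0.3):
    """Return [(start_s, end_s), ...] where short-time RMS energy exceeds `threshold`.

    Hypothetical energy-based VAD: any loud sound counts, not just speech.
    """
    frame_len = int(rate * frame_ms / 1000)
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        flags.append(rms > threshold)

    # Turn runs of consecutive loud frames into (start, end) times in seconds.
    frame_s = frame_len / rate
    segments = []
    seg_start = None
    for i, loud in enumerate(flags + [False]):  # trailing False closes an open run
        if loud and seg_start is None:
            seg_start = i
        elif not loud and seg_start is not None:
            segments.append([seg_start * frame_s, i * frame_s])
            seg_start = None

    # Join segments separated by pauses shorter than min_gap_s (e.g. between words).
    merged = []
    for seg in segments:
        if merged and seg[0] - merged[-1][1] < min_gap_s:
            merged[-1][1] = seg[1]
        else:
            merged.append(seg)
    return [tuple(seg) for seg in merged]


def to_mmss(seconds):
    """Format seconds as mm:ss, matching the output format in the question."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"
```

Usage, assuming the soundtrack was first extracted to a 16-bit WAV file (e.g. with ffmpeg) at a hypothetical path `audio.wav`: `samples, rate = read_wav_mono("audio.wav")`, then `[(to_mmss(a), to_mmss(b)) for a, b in find_loud_segments(samples, rate)]` gives a list of the form shown in the question. Merging short gaps keeps one subtitle line from being split at every pause between words.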