How do I search content within audio files/streams?


Question

I have always wondered how many different search techniques exist for searching text, images, and even videos.

However, I have never come across a solution that searched for content within audio files.

For example: let us assume that I have about 200 podcasts downloaded to my PC in the form of mp3, wav and ogg files. They are all named generically, say podcast1.mp3, podcast2.mp3, etc., so it is not possible to know what the content is without actually listening to them. Let's say that I am interested in finding out which of the podcasts talk about 'game programming'. I want the results to be shown as:


  • Podcast1.mp3 - 3 result(s) at time index(es) - 0:16:21, 0:43:45, 1:12:31

  • Podcast21.ogg - 1 result(s) at time index(es) - 0:12:01

So my questions:


  • How could one approach such a problem?

  • Are there any suitable algorithms developed to do this?

One idea that cropped up in my mind was that one could use 'speech-to-text' software to get transcripts along with time indexes for each of the audio files, then parse the transcripts to get the output.

I was considering this as one of my hobby projects. Thanks!

Recommended answer

If you want to search for text (i.e. what is being said) inside an audio stream, you would have to process it with some kind of speech recognition algorithm and store the text as metadata associated with the files. For video you could also do text recognition for text inside the video. Evernote already does this for text inside image files, but has no support for audio as far as I know.
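To make the second half of that idea concrete, here is a minimal Python sketch of the search step only. It assumes a speech recognizer has already produced, for each file, a list of text segments with start times in seconds; the `Segment` type, the `search_transcripts` function, and the sample transcripts are all hypothetical and stand in for whatever your speech-to-text tool actually outputs.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized chunk of speech (hypothetical recognizer output)."""
    start: float  # seconds from the start of the audio file
    text: str     # recognized words for this chunk


def format_time(seconds: float) -> str:
    """Format a number of seconds as H:MM:SS."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def search_transcripts(transcripts, phrase):
    """Return {filename: [time index, ...]} for segments containing phrase.

    The match is a case-insensitive substring test within each segment.
    """
    needle = phrase.casefold()
    hits = {}
    for filename, segments in transcripts.items():
        times = [format_time(s.start) for s in segments
                 if needle in s.text.casefold()]
        if times:
            hits[filename] = times
    return hits


# Made-up transcripts standing in for real recognizer output.
transcripts = {
    "podcast1.mp3": [
        Segment(981.0, "Today we talk about game programming in C++."),
        Segment(1500.0, "Now a word from our sponsor."),
        Segment(2625.0, "Back to game programming and engines."),
    ],
    "podcast2.mp3": [Segment(10.0, "This episode is about gardening.")],
}

results = search_transcripts(transcripts, "game programming")
for filename, times in results.items():
    print(f"{filename} - {len(times)} result(s) at time index(es) - "
          f"{', '.join(times)}")
```

Note that a plain substring test misses a phrase that straddles two segments, and it is only as good as the recognizer's transcript; for a larger collection you would probably store the segments in a full-text index rather than scanning them each time.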

Something similar is possible when using audio to search for audio. I don't know the details of these algorithms, but I'm guessing they involve some kind of frequency analysis. Shazam uses this kind of technology to identify songs based on audio clips.
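As a rough illustration of what "frequency analysis" means here (and not a description of how Shazam actually works), the sketch below uses a naive discrete Fourier transform to find the strongest frequency in a short run of samples. The `dominant_frequency` helper and the synthetic 100 Hz tone are assumptions made up for the example; real systems would use a fast Fourier transform over many short windows and compare the resulting patterns between recordings.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin below Nyquist.

    This is a naive O(n^2) discrete Fourier transform, fine for a short
    demo; a real implementation would use an FFT library instead.
    """
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):  # skip the DC bin, stop below Nyquist
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


# Synthetic test signal: a pure 100 Hz sine sampled at 800 Hz.
sample_rate = 800
tone = [math.sin(2 * math.pi * 100 * i / sample_rate) for i in range(200)]
print(dominant_frequency(tone, sample_rate))
```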

Here are some Wikipedia articles that may be useful:

  • Speech recognition
  • Fast Fourier transform
  • Frequency analysis (frequency spectrum)
  • Optical character recognition (OCR)
