Audio File Matching Program


Problem description


I'm trying to write a program that can take two audio files (e.g. WAV) as inputs, compare them, and spit out a number that tells you how similar the audio files are.

If someone has done something like this, knows how to go about doing it, or just has some ideas, please let me know. Anything will be greatly appreciated.

Specific questions: What language is suitable? How hard is it to do (how many hours, roughly)? Where can I find a good source of audio libraries/tools?

Thanks!

Recommended answer

Since you have listed your platform as iPhone, you have the exciting choice of C, C++ and Objective-C, which are closely related and can be mixed and matched pretty easily.

This is good, because most open source projects are written in C/C++.

The basic method would be what you'd expect:
Decode the music data (WAV is pretty much raw data enclosed in headers).
Analyse the raw data.
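
As a rough illustration of the decoding step, here is a minimal sketch in Python (chosen only for brevity; the thread itself suggests C-family languages). It assumes an uncompressed 16-bit PCM WAV and uses only the standard `wave` and `struct` modules. The function names `read_wav_samples` and `make_test_wav` are made up for this example.

```python
import io
import struct
import wave


def read_wav_samples(source):
    """Return (sample_rate, samples) for a 16-bit PCM WAV, first channel only.

    `source` may be a file name or a binary file-like object.
    Assumption: the file is uncompressed 16-bit PCM.
    """
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # PCM data in a WAV file is little-endian: one signed 16-bit value
    # per channel per frame, with the channels interleaved.
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return rate, list(values[::channels])


def make_test_wav(samples, rate=8000):
    """Build a mono 16-bit WAV in memory (hypothetical helper for the demo)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    buffer.seek(0)
    return buffer


rate, samples = read_wav_samples(make_test_wav([0, 1000, -1000, 32767]))
print(rate, samples)
```

Compressed formats such as MP3 would need a real decoder before this stage; the sketch only covers the "raw data enclosed in headers" case.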

I don't know of any decoders for the iPhone, but I read somewhere that LAME works, though it is not optimised for it.

As for the analysis, it depends on how dynamic you want it to be.
A binary compare would be a bad choice: if you compare two copies of the same song and one starts just 0.1 s later than the other, they will hardly match.
A better method would be something like beat mapping (at least as a first pass) to find interesting points in the song and try to match them together.
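
To make the offset problem concrete, here is a small sketch in Python. It is not beat mapping. It is a simpler stand-in that slides one signal against the other and keeps the lag with the highest normalized correlation. The names `normalized_correlation` and `best_alignment` are invented for this example, and the brute-force loop is only practical for short excerpts.

```python
import math


def normalized_correlation(x, y):
    """Pearson-style correlation of two equal-length sequences, in [-1, 1]."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    return num / den if den else 0.0


def best_alignment(x, y, max_lag):
    """Try lags in -max_lag..max_lag and return (lag, score) of the best match.

    A positive lag means y starts `lag` samples later than x.
    """
    best_lag, best_score = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x, y[lag:]
        else:
            a, b = x[-lag:], y
        n = min(len(a), len(b))
        if n < 2:
            continue
        score = normalized_correlation(a[:n], b[:n])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score
```

For example, if `y` is `x` with a few samples of silence prepended, comparing the two sample by sample gives a poor score, while `best_alignment(x, y, max_lag)` should recover the delay and a score close to 1.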

I only provided this answer because no one else has yet. I'm no professional in the field, and there may be better ways of doing this.


It depends on what you mean by "similar".
A binary compare (even of just the raw data) doesn't work, because the same music played in different performances is not necessarily made of identical "waves".

A method that can give you some success is to guess suitable periods (a low-pass filter can help you identify a "beat"), then, for each period, estimate the spectrum, find the correlation function between the corresponding spectra, and measure how much it differs from an "impulse" function.

That's not "easy math" (Google may help you, but a lot of theory is needed).
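
The period-guessing part can be sketched roughly as follows, in Python. This is only one possible reading of the idea: rectify the signal, smooth it with a moving average (a crude low-pass filter), and pick the lag where the autocorrelation of that envelope peaks. The names `envelope` and `estimate_period` and all parameters are assumptions made for this illustration.

```python
def envelope(samples, window):
    """Rectify the signal and smooth it with a moving average."""
    rectified = [abs(s) for s in samples]
    smoothed = []
    running = 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= window:
            running -= rectified[i - window]
        smoothed.append(running / window)
    return smoothed


def estimate_period(samples, window, min_lag, max_lag):
    """Return the lag (in samples) at which the envelope best repeats itself."""
    env = envelope(samples, window)
    mean = sum(env) / len(env)
    env = [e - mean for e in env]
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the envelope at this lag.
        value = sum(env[i] * env[i + lag] for i in range(len(env) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


# Synthetic "clicks" every 50 samples stand in for a beat.
clicks = [1.0 if i % 50 < 5 else 0.0 for i in range(1000)]
print(estimate_period(clicks, window=5, min_lag=20, max_lag=120))
```

Real music is far messier than this synthetic click track, so treat the result as a first guess at the period, not a reliable tempo estimate.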

Another empirical method (based on the fact that the "zip" algorithm exploits repeated patterns in its input) is the following:
Take "a.wav" and zip it (giving a.zip).
Take "b.wav" and zip it (giving b.zip).

Now zip a.zip and b.zip together.
If the length of the resulting zip is similar to the sum of the two included zips, nothing can be said.
If the length of the resulting zip is shorter (and similar to the longer of the two component zips), then the two WAV files are probably similar.

(The trick is that, if they are similar, they will have similar compression patterns that, when compressed together, will merge; otherwise they will remain distinct and be placed one after the other.)
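
A closely related formulation of this trick is the normalized compression distance. Note that the sketch below is a variant, not the exact recipe above: it compresses the concatenated raw data instead of zipping the two archives together, since already-compressed data rarely compresses further. It uses Python's standard `zlib` module; the names `compressed_size` and `ncd` are made up for this example.

```python
import zlib


def compressed_size(data):
    """Size in bytes of `data` after DEFLATE compression at the highest level."""
    return len(zlib.compress(data, 9))


def ncd(a, b):
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share a lot of structure;
    values near 1 suggest they have little in common.
    """
    size_a = compressed_size(a)
    size_b = compressed_size(b)
    size_ab = compressed_size(a + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)
```

One caveat: DEFLATE only looks back 32 KiB, so on full-length WAV files this would mostly miss repeats between the two inputs. A compressor with a much larger window (for example the standard `lzma` module) would be a more plausible choice there. Either way, this measures byte-level repetition and not perceived musical similarity, which matches the "empirical" spirit of the suggestion.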


Oh, I forgot to mention that the real problem is the complexity (not only in the numerical sense) of the algorithms.
The programming language, at that point, is just a detail.
(C will probably be the best "number cruncher".)


Disclaimer: this is not an answer, because it is impossible to give any definitive answer.

My answer is a (very tentative) draft for a whole research program:

1) Develop conversion of all files into an identical raw format (see the answer by Andrew).
2) Develop Fourier analysis for an arbitrary fragment of audio; see http://en.wikipedia.org/wiki/Fourier_analysis and http://en.wikipedia.org/wiki/Fast_Fourier_transform.
3) Develop comparison criteria for Fourier images, with weights such as the length of the fragment.
4) Develop "tokenization" of the whole audio piece: a way to break up the whole audio stream into several "distinct" fragments, with the requirement that the sub-fragments within a "token" fragment be relatively "close" compared to the nearby fragments. Attention! This is the most difficult part. Prepare to learn fuzzy sets or genetic programming or something like that used in image recognition.
5) You have tokenized audio fragments. Try to match fragments from different audio streams, and score good matches with weights.
6) Compare the scores and present the result.
7) Develop optional criteria. One criterion may put more weight on high frequencies, another on tempo/duration, etc.
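
A heavily simplified sketch of steps 2, 3, 5 and 6 might look like the following Python. It sidesteps the hard tokenization step entirely by assuming fixed-size frames that are already aligned, uses a naive O(n²) DFT in place of an FFT to stay dependency-free, and scores frames by the cosine similarity of their magnitude spectra. All names (`magnitude_spectrum`, `cosine_similarity`, `spectral_similarity`) and the frame size are assumptions for this illustration.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..n/2 of a real-valued frame (naive DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def spectral_similarity(x, y, frame_size=64):
    """Average spectral similarity over corresponding fixed-size frames."""
    count = min(len(x), len(y)) // frame_size
    if count == 0:
        raise ValueError("signals are shorter than one frame")
    total = 0.0
    for i in range(count):
        start = i * frame_size
        spectrum_x = magnitude_spectrum(x[start:start + frame_size])
        spectrum_y = magnitude_spectrum(y[start:start + frame_size])
        total += cosine_similarity(spectrum_x, spectrum_y)
    return total / count
```

Because only magnitudes are compared, two tones of the same frequency but different phase score close to 1, which a sample-by-sample comparison would not give. The optional criteria in step 7 could be approximated by weighting the bins before taking the cosine similarity.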

Even in the most successful case, I would not expect good results for different performances of the same piece, for example the same musical opus played on different instruments or sung by different singers.

Even if you achieve limited success, prepare yourself for major awards in computer science. :)

Now, in real life this is a pretty important problem. I remember an announcement on Boston Craigslist. A "patient" offered a considerable fee to anyone who would sort tons of his music records, eliminating duplicates (important!), merging tags, descriptions, etc. He did not care whether it was done manually or through programming, but hoped for development of the technology (a pretty naive hope, but...). Now imagine you love music (like I do): that would mean real torture, listening to a lot of... well, different records. If you don't care, chances are you would fail to recognize the tunes... Makes sense, right?



Please pay attention to the answer by Espen Harlinn: he was able to go much deeper than I did.
See also my comment on that answer.

