Compare two spectrograms to find the offset where they match


Problem description

I record a daily 2-minute radio broadcast from the Internet. It always has the same starting and ending jingle. Since the exact broadcast time may vary by about 6 minutes either way, I have to record around 15 minutes of radio.

I want to identify the exact times at which those jingles occur in the 15-minute recording, so I can extract the portion of audio I want.

I have already started a C# application in which I decode an MP3 to PCM data and convert the PCM data to a spectrogram, based on http://www.codeproject.com/KB/audio-video/SoundCatcher.aspx

I tried a cross-correlation algorithm on the PCM data, but it is very slow (around 6 minutes with a step of 10 ms), and on some occasions it fails to find the jingle start time.
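To make the cost of that approach concrete, here is a minimal sketch of a sliding normalized cross-correlation in plain Python. It is not the asker's C# code; the function name `best_offset` and the `step` parameter are hypothetical, and the inputs are assumed to be plain lists of PCM sample values. Every candidate offset compares the whole template against a window, so the work grows with (number of offsets) x (template length), which is consistent with the slowness described.

```python
import math

def best_offset(signal, template, step=1):
    """Return (offset, score) of the window in `signal` most similar to `template`.

    The score is the normalized cross-correlation, a value in [-1, 1].
    """
    n = len(template)
    t_mean = sum(template) / n
    t_centered = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(x * x for x in t_centered))

    best = (0, -1.0)
    for offset in range(0, len(signal) - n + 1, step):
        window = signal[offset:offset + n]
        w_mean = sum(window) / n
        w_centered = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w_centered))
        if t_norm == 0 or w_norm == 0:
            continue  # flat template or window: correlation is undefined
        dot = sum(a * b for a, b in zip(w_centered, t_centered))
        score = dot / (t_norm * w_norm)
        if score > best[1]:
            best = (offset, score)
    return best

# Toy data: the template [1, 3, 2] occurs at offset 2.
offset, score = best_offset([0, 0, 1, 3, 2, 0, 0, 5], [1, 3, 2])
```

A larger `step` reduces the number of offsets tested but can skip past the true alignment, which may be one reason a coarse step occasionally misses the jingle.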

Any ideas for algorithms to compare two spectrograms for a match? Or a better way to find the jingle start time?

Thanks.

Update, sorry for the delay

First, thanks for all the answers; most of them were relevant and/or interesting ideas.

I tried to implement the Shazam algorithm proposed by fonzo, but failed to detect the peaks in the spectrogram. Here are three spectrograms of the starting jingle from three different recordings. I tried AForge.NET with the blob filter (but it failed to identify peaks), blurring the image and checking for differences in height, the Laplace convolution, and slope analysis to detect the series of vertical bars (but there were too many false positives)...
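For readers unfamiliar with what "detecting peaks" means here, the following is a minimal sketch of a generic local-maximum picker over a spectrogram stored as a 2D list of magnitudes. It is neither the actual Shazam implementation nor one of the AForge.NET attempts listed above; the function name `local_peaks` and the parameters `radius` and `min_value` are hypothetical choices for illustration.

```python
def local_peaks(spec, radius=1, min_value=0.0):
    """Return (row, col) cells strictly greater than every neighbour within `radius`."""
    rows, cols = len(spec), len(spec[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = spec[r][c]
            if v < min_value:
                continue  # too quiet to be considered a landmark
            is_peak = True
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    if (rr, cc) != (r, c) and spec[rr][cc] >= v:
                        is_peak = False
                        break
                if not is_peak:
                    break
            if is_peak:
                peaks.append((r, c))
    return peaks

# Toy spectrogram with two isolated maxima.
spec = [
    [0, 1, 0, 0],
    [1, 5, 1, 0],
    [0, 1, 0, 3],
    [0, 0, 0, 0],
]
peaks = local_peaks(spec)
```

A picker like this is sensitive to noise and to level differences between recordings, so the peaks it returns can shift from file to file unless the spectrogram is smoothed or normalized first.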

In the meantime, I tried the Hough algorithm proposed by Dave Aaron Smith, calculating the RMS of each column. Yes, every column: it is O(N*M), but M << N (note that a column is around 8k samples). So overall it's not that bad; the algorithm still takes about 3 minutes, but it has never failed.
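The per-column RMS metric can be sketched as follows. This is an illustration under assumptions, not the asker's code: it assumes a "column" is a consecutive, non-overlapping block of PCM samples, and the names `column_rms` and `column_size` are hypothetical.

```python
import math

def column_rms(samples, column_size):
    """Split `samples` into consecutive columns and return the RMS of each one.

    A trailing partial column is dropped.
    """
    values = []
    for start in range(0, len(samples) - column_size + 1, column_size):
        column = samples[start:start + column_size]
        values.append(math.sqrt(sum(x * x for x in column) / column_size))
    return values

# Three full columns of 4 samples; the last 2 samples are ignored.
rms = column_rms([1, -1, 1, -1, 0, 0, 0, 0, 2, 2, 2, 2, 9, 9], 4)
```

Reducing each column of roughly 8k samples to one number shrinks the data that the matching step has to scan, which is where the M << N observation comes from.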

I could go with that solution, but if possible I would prefer Shazam, because it's O(N) and probably much faster (and cooler too). So if any of you has an idea for an algorithm that always detects the same points in those spectrograms (they don't have to be peaks), please add a comment.

New update

Finally, I went with the algorithm explained above. I tried to implement the Shazam algorithm but failed to find proper peaks in the spectrogram; the identified points were not constant from one sound file to another. In theory, the Shazam algorithm is the solution for this kind of problem. The Hough algorithm proposed by Dave Aaron Smith was more stable and effective. I split around 400 files, and only 20 of them failed to split properly. Disk usage went from 8GB to 1GB.

Thanks for your help.

Recommended answer

I wonder if you could use a Hough transform. You would start by cataloging each step of the opening sequence. Let's say you use 10 ms steps and the opening sequence is 50 ms long. You compute some metric on each step and get

1 10 1 17 5

Now go through your audio and compute the same metric for each 10 ms step. Call this array have_audio

8 10 8 7 5 1 10 1 17 6 2 10...

Now create a new empty array that's the same length as have_audio. Call it start_votes. It will contain "votes" for the start of the opening sequence. If you see a 1, you may be in the 1st or 3rd step of the opening sequence, so you have 1 vote for the opening sequence starting 1 step ago (that is, at the current step) and 1 vote for it starting 3 steps ago. If you see a 10, you have 1 vote for the opening sequence starting 2 steps ago; if you see a 17, 1 vote for 4 steps ago; and so on.

So for that example have_audio, your votes will look like

2 0 0 1 0 4 0 1 0 0 1 ...

You have a lot of votes at position 6, so there's a good chance the opening sequence starts there.
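The voting scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes the metric has been quantized so that equal values can be matched exactly, and the helper name `count_start_votes` is hypothetical. The arrays are the ones from the example.

```python
def count_start_votes(have_audio, opening):
    """For each position, count the steps that vote for the opening starting there."""
    # Map each metric value to the step indices where it occurs in the opening.
    steps_by_value = {}
    for j, value in enumerate(opening):
        steps_by_value.setdefault(value, []).append(j)

    start_votes = [0] * len(have_audio)
    for i, value in enumerate(have_audio):
        for j in steps_by_value.get(value, []):
            start = i - j  # the opening would have begun j steps earlier
            if start >= 0:
                start_votes[start] += 1
    return start_votes

opening = [1, 10, 1, 17, 5]
have_audio = [8, 10, 8, 7, 5, 1, 10, 1, 17, 6, 2, 10]

start_votes = count_start_votes(have_audio, opening)
best = max(range(len(start_votes)), key=start_votes.__getitem__)
print(best + 1, start_votes[best])  # 1-based position and its votes; prints "6 4"
```

Note that the last step of the opening (5) does not match the audio (6), yet the correct start still collects 4 of 5 possible votes. That tolerance to a few mismatched steps is the main appeal of voting over exact sequence matching. With a real, continuous metric such as RMS, values would need to be bucketed (or matched within a tolerance) before voting.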

You could improve performance by not bothering to analyze the entire opening sequence. If the opening sequence is 10 seconds long, you could just search for the first 5 seconds.
