Algorithm to compare two spectrograms and find the offset where they match

Problem description

I record a daily 2-minute radio broadcast from the Internet. It always has the same starting and ending jingle. The exact broadcast time can vary by roughly 6 minutes, so I have to record around 15 minutes of radio.

I want to find the exact times at which those jingles occur in the 15-minute recording, so I can extract the portion of audio I want.

I have already started a C# application that decodes an MP3 to PCM data and converts the PCM data to a spectrogram, based on http://www.codeproject.com/KB/audio-video/SoundCatcher.aspx

I tried a cross-correlation algorithm on the PCM data, but it is very slow (around 6 minutes with a 10 ms step), and on some occasions it fails to find the jingle start time.
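For reference, plain time-domain cross-correlation slides the jingle across the recording and scores every lag. That is why it is slow: the cost grows roughly with (recording length) × (jingle length). Computing the same correlation with an FFT is the usual speed-up.

Below is a minimal brute-force sketch. It assumes both signals are already lists of floats at the same sample rate or feature step. `best_lag` is a hypothetical name. It uses the normalized form (cosine similarity per window) so that loud passages do not win just by being loud.

```python
import math

def best_lag(recording, jingle):
    """Return (lag, score) maximizing normalized cross-correlation.

    Brute force: O(len(recording) * len(jingle)).
    """
    m = len(jingle)
    if m == 0 or m > len(recording):
        raise ValueError("jingle must be non-empty and no longer than recording")
    j_norm = math.sqrt(sum(x * x for x in jingle))
    best = (0, float("-inf"))
    for lag in range(len(recording) - m + 1):
        window = recording[lag:lag + m]
        dot = sum(a * b for a, b in zip(window, jingle))
        w_norm = math.sqrt(sum(a * a for a in window))
        # Cosine similarity between this window and the jingle; silent windows score 0.
        score = dot / (w_norm * j_norm) if w_norm > 0 and j_norm > 0 else 0.0
        if score > best[1]:
            best = (lag, score)
    return best

recording = [0.0, 0.1, -0.2, 0.5, -1.0, 0.8, 0.0, 0.1]
jingle = [0.5, -1.0, 0.8]
lag, score = best_lag(recording, jingle)
print(lag, round(score, 6))  # 3 1.0
```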

Any ideas for an algorithm that compares two spectrograms for a match? Or a better way to find the jingle start time?

Thanks,

Update, sorry for the delay

First, thanks for all the answers; most of them were relevant and/or interesting ideas.

I tried to implement the Shazam algorithm proposed by fonzo, but failed to detect the peaks in the spectrogram. Here are three spectrograms of the starting jingle from three different recordings. I tried several methods:

- AForge.NET with the blob filter (it failed to identify peaks);
- blurring the image and checking for differences in height;
- Laplace convolution;
- slope analysis, to detect the series of vertical bars (there were too many false positives)...
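For context, the peak-picking step that the Shazam approach depends on is usually some form of 2-D local-maximum detection. A spectrogram cell counts as a peak if it is above a minimum magnitude and is the largest value in a neighborhood around it.

Below is a minimal sketch. It assumes the spectrogram is a list of time frames, each a list of magnitudes. `find_peaks`, `radius` and `threshold` are hypothetical names, and their values would need tuning. This sketch does not by itself fix the inconsistency between recordings described above.

```python
def find_peaks(spec, radius=1, threshold=0.0):
    """Return (frame, bin) pairs that are strict local maxima of `spec`.

    spec[t][f] is the magnitude at time frame t and frequency bin f.
    A cell is a peak if it exceeds `threshold` and is strictly greater
    than every other cell within `radius` frames and `radius` bins.
    """
    peaks = []
    n_t = len(spec)
    for t in range(n_t):
        for f in range(len(spec[t])):
            v = spec[t][f]
            if v <= threshold:
                continue
            is_peak = True
            for dt in range(-radius, radius + 1):
                for df in range(-radius, radius + 1):
                    if dt == 0 and df == 0:
                        continue
                    tt, ff = t + dt, f + df
                    # Any neighbor at least as large disqualifies this cell.
                    if 0 <= tt < n_t and 0 <= ff < len(spec[tt]) and spec[tt][ff] >= v:
                        is_peak = False
                        break
                if not is_peak:
                    break
            if is_peak:
                peaks.append((t, f))
    return peaks

spec = [
    [0, 1, 0, 0],
    [1, 5, 1, 0],
    [0, 1, 0, 2],
    [0, 0, 0, 0],
]
print(find_peaks(spec, radius=1, threshold=0.5))  # [(1, 1), (2, 3)]
```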

In the meantime, I tried the Hough algorithm proposed by Dave Aaron Smith, where I calculate the RMS of each column. Yes, each column: it's O(N*M), but M << N (note that a column is around 8k samples). So overall it's not that bad. The algorithm still takes about 3 minutes, but it has never failed.
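The per-column RMS can be sketched like this: split the PCM samples into fixed-size frames and take the root mean square of each. The OP's columns are about 8k samples. The answer below uses 10 ms steps, which is 441 samples at 44.1 kHz.

Quantizing the result to integer levels is an added assumption, not something the OP states. It turns float RMS values into discrete labels that can be compared exactly, like the small integers in the answer's example. `frame_rms`, `quantize` and `level_step` are hypothetical names, and a usable `level_step` would have to be found by experiment.

```python
import math

def frame_rms(samples, frame_len):
    """RMS of each consecutive, non-overlapping frame of `frame_len` samples.

    A trailing partial frame is dropped.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    out = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        out.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return out

def quantize(values, level_step):
    """Round each value to the nearest multiple of `level_step`, as an integer level."""
    return [int(round(v / level_step)) for v in values]

samples = [1, -1, 1, -1, 2, 2, 2, 2, 0, 0, 0, 0, 5]
rms = frame_rms(samples, 4)  # [1.0, 2.0, 0.0]; the trailing 5 is dropped
print(quantize(rms, 0.5))    # [2, 4, 0]
```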

I could go with that solution, but if possible I would prefer the Shazam approach because it's O(N) and probably much faster (and also cooler). So if any of you has an idea for an algorithm that always detects the same points in those spectrograms (they don't have to be peaks), please add a comment.

New Update

Finally, I went with the Hough-transform algorithm explained below. I tried to implement the Shazam algorithm but failed to find proper peaks in the spectrogram: the identified points were not consistent from one sound file to another. In theory, the Shazam algorithm is the solution for this kind of problem. In practice, the Hough algorithm proposed by Dave Aaron Smith was more stable and effective. I split around 400 files, and only 20 of them failed to split properly. Disk usage went from 8 GB to 1 GB.

Thanks for your help.

Solution

I wonder if you could use a Hough transform. You would start by cataloging each step of the opening sequence. Let's say you use 10 ms steps and the opening sequence is 50 ms long. You compute some metric on each step and get

1 10 1 17 5

Now go through your audio and compute the same metric for each 10 ms step. Call this array have_audio:

8 10 8 7 5 1 10 1 17 6 2 10...

Now create a new empty array that's the same length as have_audio. Call it start_votes. It will contain "votes" for the start of the opening sequence.

If you see a 1, you may be at the 1st or 3rd step of the opening sequence. So you cast 1 vote for the sequence starting at the current step and 1 vote for it starting 2 steps earlier. A 10 casts 1 vote for the sequence starting 1 step earlier, a 17 for 3 steps earlier, a 5 for 4 steps earlier, and so on.

So for that example have_audio, counting only the values shown, your votes will look like this:

2 0 0 1 0 4 0 1 0 0 1 ...

The most votes (4) are at position 6 (counting from 1), so there's a good chance the opening sequence starts there.
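The voting step can be sketched in Python. This minimal sketch assumes the metric values are discrete labels that can be compared for exact equality (for example, quantized RMS levels). It uses 0-based indices. `start_votes` is written here as a hypothetical function version of the answer's array.

The dictionary from metric value to template steps means each audio step only visits the template steps that share its value.

```python
from collections import defaultdict

def start_votes(template, have_audio):
    """Hough-style voting for where `template` starts inside `have_audio`.

    If have_audio[i] equals template[j], the template may start at i - j,
    so that candidate start gets one vote. Indices are 0-based.
    """
    # Index the template once: metric value -> steps where it occurs.
    steps_by_value = defaultdict(list)
    for j, value in enumerate(template):
        steps_by_value[value].append(j)

    votes = [0] * len(have_audio)
    for i, value in enumerate(have_audio):
        for j in steps_by_value.get(value, ()):
            start = i - j
            if start >= 0:  # a start before the recording began cannot be located
                votes[start] += 1
    return votes

template = [1, 10, 1, 17, 5]
have_audio = [8, 10, 8, 7, 5, 1, 10, 1, 17, 6, 2, 10]
votes = start_votes(template, have_audio)
best = max(range(len(votes)), key=votes.__getitem__)
print(votes)  # [2, 0, 0, 1, 0, 4, 0, 1, 0, 0, 1, 0]
print(best)   # 5
```

The largest count, 4, is at index 5, which is position 6 counting from 1. With real audio, the winning count would probably also need to be compared against the template length before accepting a match.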

You could improve performance by not bothering to analyze the entire opening sequence. If the opening sequence is 10 seconds long, you could just search for the first 5 seconds.
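Searching only the start of the opening sequence amounts to voting with a prefix of the template. This minimal sketch assumes 10 ms steps as in the example above, so the first 5 seconds of a 10-second sequence is 500 steps. `steps_for` and `prefix_votes` are hypothetical helpers, and the voting loop is repeated so the snippet runs on its own.

```python
from collections import defaultdict

STEP_MS = 10  # assumed analysis step, as in the answer's example

def steps_for(seconds, step_ms=STEP_MS):
    """Number of analysis steps covering `seconds` of audio."""
    return int(round(seconds * 1000 / step_ms))

def prefix_votes(template, have_audio, prefix_steps):
    """Vote for start positions using only the first `prefix_steps` template steps."""
    steps_by_value = defaultdict(list)
    for j, value in enumerate(template[:prefix_steps]):
        steps_by_value[value].append(j)
    votes = [0] * len(have_audio)
    for i, value in enumerate(have_audio):
        for j in steps_by_value.get(value, ()):
            if i - j >= 0:
                votes[i - j] += 1
    return votes

print(steps_for(5))  # 500
votes = prefix_votes([1, 10, 1, 17, 5], [8, 10, 8, 7, 5, 1, 10, 1, 17, 6, 2, 10], 3)
print(votes.index(max(votes)))  # 5
```

On the answer's example, the first 3 of the 5 steps still put the peak at index 5, with 3 votes instead of 4. A shorter prefix is cheaper, but it leaves a smaller margin over chance matches.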
