Use convolution to find a reference audio sample in a continuous stream of sound

Problem description

In my previous question (http://stackoverflow.com/questions/5843713/find-audio-sample-in-audio-file-spectrogram-already-exists) on finding a reference audio sample in a bigger audio sample, it was proposed that I should use convolution.
Using DSPUtil, I was able to do this. I played with it a little and tried different combinations of audio samples to see what the result was. To visualize the data, I just dumped the raw audio as numbers into Excel and created a chart from these numbers. A peak is visible, but I don't really know how this helps me. I have these problems:

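  • I don't know how to infer, from the position of the peak, the starting position of the match in the original audio sample.
  • I don't know how I should apply this to a continuous stream of audio, so that I can react as soon as the reference audio sample occurs.
  • I don't understand why picture 2 and picture 4 (see below) differ so much, although both represent an audio sample convolved with itself...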

Any help is highly appreciated.

The following pictures are the result of the analysis using Excel:

  1. A longer audio sample with the reference audio (a beep) near the end:
  2. The beep convolved with itself:
  3. A longer audio sample without the beep convolved with the beep:
  4. The longer audio sample of point 3 convolved with itself:

UPDATE and solution:
Thanks to the extensive help of Han, I was able to achieve my goal.
After I rolled my own slow implementation without FFT, I found alglib which provides a fast implementation. There is one basic assumption to my problem: One of the audio samples is contained completely within the other.
So, the following code returns the offset in samples in the larger of the two audio samples and the normalized cross-correlation value at that offset. 1 means complete correlation, 0 means no correlation at all and -1 means complete negative correlation:

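// Requires: using System; using System.Collections.Generic; using System.Linq;
// plus a reference to the ALGLIB assembly (alglib.corrr1d).
private void CalcCrossCorrelation(IEnumerable<double> data1, 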
                                  IEnumerable<double> data2, 
                                  out int offset, 
                                  out double maximumNormalizedCrossCorrelation)
{
    var data1Array = data1.ToArray();
    var data2Array = data2.ToArray();
    double[] result;
    alglib.corrr1d(data1Array, data1Array.Length, 
                   data2Array, data2Array.Length, out result);

    var max = double.MinValue;
    var index = 0;
    var i = 0;
    // Find the maximum cross correlation value and its index
    foreach (var d in result)
    {
        if (d > max)
        {
            index = i;
            max = d;
        }
        ++i;
    }
    // if the index is bigger than the length of the first array, it has to be
    // interpreted as a negative index
    if (index >= data1Array.Length)
    {
        index *= -1;
    }

    var matchingData1 = data1;
    var matchingData2 = data2;
    var biggerSequenceCount = Math.Max(data1Array.Length, data2Array.Length);
    var smallerSequenceCount = Math.Min(data1Array.Length, data2Array.Length);
    offset = index;
    if (index > 0)
        matchingData1 = data1.Skip(offset).Take(smallerSequenceCount).ToList();
    else if (index < 0)
    {
        offset = biggerSequenceCount + smallerSequenceCount + index;
        matchingData2 = data2.Skip(offset).Take(smallerSequenceCount).ToList();
        matchingData1 = data1.Take(smallerSequenceCount).ToList();
    }
    var mx = matchingData1.Average();
    var my = matchingData2.Average();
    var denom1 = Math.Sqrt(matchingData1.Sum(x => (x - mx) * (x - mx)));
    var denom2 = Math.Sqrt(matchingData2.Sum(y => (y - my) * (y - my)));
    maximumNormalizedCrossCorrelation = max / (denom1 * denom2);
}

BOUNTY:
No new answers required! I started the bounty to award it to Han for his continued effort with this question!

Recommended answer

Here we go for the bounty :)

To find a particular reference signal in a larger audio fragment, you need to use a cross-correlation algorithm. The basic formulae can be found in this Wikipedia article.

Cross-correlation is a process by which two signals are compared. This is done by multiplying both signals sample by sample and summing the products over all samples. Then one of the signals is shifted (usually by 1 sample), and the calculation is repeated. If you try to visualize this for very simple signals such as a single impulse (e.g. 1 sample has a certain value while the remaining samples are zero), or a pure sine wave, you will see that the result of the cross-correlation is indeed a measure of how much both signals are alike and of the delay between them. Another article that may provide more insight can be found here.
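
As a rough illustration of the multiply, sum and shift procedure described above, here is a minimal time-domain sketch. The class and method names (CrossCorrelationSketch, FindBestLag) are made up for this example, and it assumes the reference is no longer than the signal and lies entirely inside it. It is not the ALGLIB or DSPUtil implementation.

```csharp
using System;

public static class CrossCorrelationSketch
{
    // Hypothetical helper: returns the lag (in samples) at which `reference`
    // best matches `signal`, and the raw (unnormalized) peak value.
    // Assumes reference.Length <= signal.Length.
    public static int FindBestLag(double[] signal, double[] reference, out double peak)
    {
        if (reference.Length == 0 || reference.Length > signal.Length)
            throw new ArgumentException("reference must be non-empty and no longer than signal");

        int bestLag = 0;
        peak = double.NegativeInfinity;

        // Shift the reference across the signal one sample at a time.
        for (int lag = 0; lag <= signal.Length - reference.Length; lag++)
        {
            // Multiply the overlapping samples and sum the products.
            double sum = 0.0;
            for (int i = 0; i < reference.Length; i++)
                sum += signal[lag + i] * reference[i];

            // Remember the shift with the largest sum.
            if (sum > peak)
            {
                peak = sum;
                bestLag = lag;
            }
        }
        return bestLag;
    }
}
```

Because only shifts where the reference fits completely inside the signal are tried, the returned lag is directly the index of the first matching sample in the longer signal. The raw peak still depends on the amplitude of the samples, and the nested loops cost on the order of signal length times reference length, which is why FFT-based implementations are preferred for long recordings.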

This article by Paul Bourke also contains source code for a straightforward time-domain implementation. Note that the article is written for a general signal. Audio has the special property that its long-term average is usually 0. This means that the averages used in Paul Bourke's formula (mx and my) can be left out. There are also fast implementations of the cross-correlation based on the FFT (see ALGLIB).
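
For orientation, the normalized cross-correlation at delay d is commonly written as shown below. The notation (x, y, d and the index i) is chosen here for illustration and may differ from the article's; mx and my are the averages mentioned above. The second line is what remains once both averages are taken to be 0, as is usually reasonable for audio.

```latex
r(d) = \frac{\sum_i (x_i - m_x)\,(y_{i-d} - m_y)}
            {\sqrt{\sum_i (x_i - m_x)^2}\;\sqrt{\sum_i (y_{i-d} - m_y)^2}}

% With m_x = m_y = 0:
r(d) = \frac{\sum_i x_i\, y_{i-d}}
            {\sqrt{\sum_i x_i^2}\;\sqrt{\sum_i y_{i-d}^2}}
```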

The (maximum) value of the correlation depends on the sample values in the audio signals. In Paul Bourke's algorithm, however, the maximum is scaled to 1.0. In cases where one of the signals is contained entirely within the other, the maximum value will reach 1. In the more general case the maximum will be lower, and a threshold value will have to be determined to decide whether the signals are sufficiently alike.
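
One way to apply such a threshold is sketched below, assuming zero-mean audio so that the averages are not subtracted. The names ThresholdDetectorSketch, NormalizedCorrelation and FindMatches are hypothetical, and a sensible threshold has to be found by experiment.

```csharp
using System;
using System.Collections.Generic;

public static class ThresholdDetectorSketch
{
    // Normalized correlation between `reference` and the window of `signal`
    // that starts at `lag`. Assumption: both signals have a mean of about 0,
    // so no averages are subtracted.
    public static double NormalizedCorrelation(double[] signal, double[] reference, int lag)
    {
        double dot = 0.0, energySignal = 0.0, energyReference = 0.0;
        for (int i = 0; i < reference.Length; i++)
        {
            double s = signal[lag + i];
            double r = reference[i];
            dot += s * r;
            energySignal += s * s;
            energyReference += r * r;
        }
        double denominator = Math.Sqrt(energySignal * energyReference);
        // A silent window has no defined correlation; report 0 for it.
        return denominator == 0.0 ? 0.0 : dot / denominator;
    }

    // Returns every lag whose normalized correlation reaches `threshold`.
    public static List<int> FindMatches(double[] signal, double[] reference, double threshold)
    {
        var matches = new List<int>();
        for (int lag = 0; lag + reference.Length <= signal.Length; lag++)
        {
            if (NormalizedCorrelation(signal, reference, lag) >= threshold)
                matches.Add(lag);
        }
        return matches;
    }
}
```

Dividing by the energy of the current window makes the result independent of volume: a quieter copy of the reference still scores 1, while an unrelated loud passage does not. For example, FindMatches(signal, reference, 0.9) returns the start positions of all windows that correlate at 0.9 or higher.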
