Decoding DTMF from a WAV file

Question

Following on from my earlier question, my goal is to detect DTMF tones in a WAV file from C#. However, I'm really struggling to understand how this can be done.

I understand that DTMF uses a combination of frequencies, and that a Goertzel algorithm can be used ... somehow. I've grabbed a Goertzel code snippet and tried shoving a .WAV file into it (using NAudio to read the file, which is an 8 kHz mono 16-bit PCM WAV):

using (WaveFileReader reader = new WaveFileReader(@"dtmftest_w.wav"))
{
    byte[] buffer = new byte[reader.Length];

    int read = reader.Read(buffer, 0, buffer.Length);
    short[] sampleBuffer = new short[read/2];
    Buffer.BlockCopy(buffer, 0, sampleBuffer, 0, read/2);
    Console.WriteLine(CalculateGoertzel(sampleBuffer,8000,16));
}

public static double CalculateGoertzel(short[] sample, double frequency, int samplerate)
{
    double Skn, Skn1, Skn2;
    Skn = Skn1 = Skn2 = 0;
    for (int i = 0; i < sample.Length; i++)
    {
        Skn2 = Skn1;
        Skn1 = Skn;
        Skn = 2 * Math.Cos(2 * Math.PI * frequency / samplerate) * Skn1 - Skn2 + sample[i];
    }
    double WNk = Math.Exp(-2 * Math.PI * frequency / samplerate);
    return 20 * Math.Log10(Math.Abs((Skn - WNk * Skn1)));
}

I know what I'm doing is wrong: I assume that I should iterate through the buffer, and only calculate the Goertzel value for a small chunk at a time - is this correct?

Secondly, I don't really understand what the output of the Goertzel method is telling me: I get a double (example: 210.985812) returned, but I don't know how to equate that to the presence and value of a DTMF tone in the audio file.

I've searched everywhere for an answer, including the libraries referenced in this answer; unfortunately, the code here doesn't appear to work (as noted in the comments on the site). There is a commercial library offered by TAPIEx; I've tried their evaluation library and it does exactly what I need - but they're not responding to emails, which makes me wary about actually purchasing their product.

I'm very conscious that I'm looking for an answer when perhaps I don't know the exact question, but ultimately all I need is a way to find DTMF tones in a .WAV file. Am I on the right lines, and if not, can anyone point me in the right direction?

EDIT: Using @Abbondanza's code as a basis, and on the (probably fundamentally wrong) assumption that I need to drip-feed small sections of the audio file in, I now have this (very rough, proof-of-concept only) code:

const short sampleSize = 160;

using (WaveFileReader reader = new WaveFileReader(@"\\mac\home\dtmftest.wav"))
        {           
            byte[] buffer = new byte[reader.Length];

            reader.Read(buffer, 0, buffer.Length);

            int bufferPos = 0;

            while (bufferPos < buffer.Length-(sampleSize*2))
            {
                short[] sampleBuffer = new short[sampleSize];
                Buffer.BlockCopy(buffer, bufferPos, sampleBuffer, 0, sampleSize*2);


                var frequencies = new[] {697.0, 770.0, 852.0, 941.0, 1209.0, 1336.0, 1477.0};

                var powers = frequencies.Select(f => new
                {
                    Frequency = f,
                   Power = CalculateGoertzel(sampleBuffer, f, 8000)              
                });

                const double AdjustmentFactor = 1.05;
                var adjustedMeanPower = AdjustmentFactor*powers.Average(result => result.Power);

                var sortedPowers = powers.OrderByDescending(result => result.Power);
                var highestPowers = sortedPowers.Take(2).ToList();

                float seconds = bufferPos / (float)16000;

                if (highestPowers.All(result => result.Power > adjustedMeanPower))
                {
                    // Use highestPowers[0].Frequency and highestPowers[1].Frequency to 
                    // classify the detected DTMF tone.

                    switch (Convert.ToInt32(highestPowers[0].Frequency))
                    {
                        case 1209:
                            switch (Convert.ToInt32(highestPowers[1].Frequency))
                            {
                                case 697:
                                    Console.WriteLine("1 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                                case 770:
                                    Console.WriteLine("4 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                                case 852:
                                    Console.WriteLine("7 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                                case 941:
                                    Console.WriteLine("* pressed at " + bufferPos);
                                    break;
                            }
                            break;
                        case 1336:
                            switch (Convert.ToInt32(highestPowers[1].Frequency))
                            {
                                case 697:
                                    Console.WriteLine("2 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                                case 770:
                                    Console.WriteLine("5 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                                case 852:
                                    Console.WriteLine("8 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                                case 941:
                                    Console.WriteLine("0 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                            }
                            break;
                        case 1477:
                            switch (Convert.ToInt32(highestPowers[1].Frequency))
                            {
                                case 697:
                                    Console.WriteLine("3 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                                case 770:
                                    Console.WriteLine("6 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                                case 852:
                                    Console.WriteLine("9 pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                                case 941:
                                    Console.WriteLine("# pressed at " + bufferPos + " (" + seconds + "s)");
                                    break;
                            }
                            break;
                    }
                }
                else
                {
                    Console.WriteLine("No DTMF at " + bufferPos + " (" + seconds + "s)");
                }
                bufferPos = bufferPos + (sampleSize*2);
            }
        }

This is the sample file as viewed in Audacity; I've added in the DTMF keypresses that were pressed.

and ... it almost works. From the file above, I shouldn't see any DTMF until almost exactly 3 seconds in, however, my code reports:

9 pressed at 1920 (0.12s)
1 pressed at 2880 (0.18s)
* pressed at 3200
1 pressed at 5120 (0.32s)
1 pressed at 5440 (0.34s)
7 pressed at 5760 (0.36s)
7 pressed at 6080 (0.38s)
7 pressed at 6720 (0.42s)
5 pressed at 7040 (0.44s)
7 pressed at 7360 (0.46s)
7 pressed at 7680 (0.48s)
1 pressed at 8000 (0.5s)
7 pressed at 8320 (0.52s)

... until it gets to 3 seconds, and THEN it starts to settle down to the correct answer: that 1 was pressed:

7 pressed at 40000 (2.5s)
# pressed at 43840 (2.74s)
No DTMF at 44800 (2.8s)
1 pressed at 45120 (2.82s)
1 pressed at 45440 (2.84s)
1 pressed at 46080 (2.88s)
1 pressed at 46720 (2.92s)
4 pressed at 47040 (2.94s)
1 pressed at 47360 (2.96s)
1 pressed at 47680 (2.98s)
1 pressed at 48000 (3s)
1 pressed at 48960 (3.06s)
4 pressed at 49600 (3.1s)
1 pressed at 49920 (3.12s)
1 pressed at 50560 (3.16s)
1 pressed at 51520 (3.22s)
1 pressed at 52160 (3.26s)
4 pressed at 52480 (3.28s)

If I bump up the AdjustmentFactor beyond 1.2, I get very little detection at all.

I sense that I'm almost there, but can anyone see what it is I'm missing?

EDIT2: The test file above is available here. The adjustedMeanPower in the example above is 47.6660450354638, and the powers are:

Recommended answer

CalculateGoertzel() returns the power of the selected frequency within the provided sample.

Calculate this power for each of the DTMF frequencies (697, 770, 852, 941, 1209, 1336, and 1477 Hz), sort the resulting powers and pick the highest two. If both are above a certain threshold then a DTMF tone has been detected.
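
To make "the power of the selected frequency" more concrete, here is a minimal, self-contained sketch of a Goertzel power computation run on a synthetic tone. It is not the code from the question or the answer: the helper name GoertzelPower, the variable names, and the generated test signal (697 Hz plus 1209 Hz, i.e. the key "1") are assumptions made for illustration only.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not from the original post): squared magnitude of the
// signal's response at `frequency`, computed with the Goertzel recurrence.
static double GoertzelPower(double[] samples, double frequency, int sampleRate)
{
    double omega = 2.0 * Math.PI * frequency / sampleRate;
    double coeff = 2.0 * Math.Cos(omega);
    double s1 = 0.0, s2 = 0.0;
    foreach (double x in samples)
    {
        double s0 = x + coeff * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    // |X|^2 expressed with the last two states of the recurrence.
    return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

const int rate = 8000;   // assumed sample rate, matching the question's file
const int count = 205;   // assumed block length

// Synthetic key "1": a 697 Hz tone plus a 1209 Hz tone.
double[] tone = Enumerable.Range(0, count)
    .Select(n => Math.Sin(2 * Math.PI * 697 * n / rate)
               + Math.Sin(2 * Math.PI * 1209 * n / rate))
    .ToArray();

double[] dtmf = { 697, 770, 852, 941, 1209, 1336, 1477 };
double[] powers = dtmf.Select(f => GoertzelPower(tone, f, rate)).ToArray();

for (int i = 0; i < dtmf.Length; i++)
    Console.WriteLine($"{dtmf[i]} Hz: {powers[i]:F1}");
```

On a clean signal like this one, the two frequencies that are actually present should stand out by a wide margin; on real recordings the gap is smaller, which is why a threshold is needed.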

What you use as a threshold depends on the signal-to-noise ratio (SNR) of your sample. For a start it should be sufficient to calculate the mean of all Goertzel values, multiply the mean by a factor (e.g. 2 or 3), and check whether the two highest Goertzel values are above that value.

Here is a code snippet to express what I mean in a more formal way:

var frequencies = new[] {697.0, 770.0, 852.0, 941.0, 1209.0, 1336.0, 1477.0};

var powers = frequencies.Select(f => new
{
    Frequency = f,
    Power = CalculateGoertzel(sample, f, samplerate)
});

const double AdjustmentFactor = 1.0;
var adjustedMeanPower = AdjustmentFactor * powers.Average(result => result.Power);

var sortedPowers = powers.OrderByDescending(result => result.Power);
var highestPowers = sortedPowers.Take(2).ToList();

if (highestPowers.All(result => result.Power > adjustedMeanPower))
{
    // Use highestPowers[0].Frequency and highestPowers[1].Frequency to 
    // classify the detected DTMF tone.
}

Start with an AdjustmentFactor of 1.0. If you get false positives from your test data (i.e. you detect DTMF tones in samples where there shouldn't be any DTMF tones), keep increasing it until the false positives stop.

Update #1

I tried your code on the wave file and adjusted a few things:

I materialized the enumerable after the Goertzel calculation (important for performance):

var powers = frequencies.Select(f => new
{
    Frequency = f,
    Power = CalculateGoertzel(sampleBuffer, f, 8000)
// Materialize enumerable to avoid multiple calculations.
}).ToList();



I did not use the adjusted mean as the threshold. I simply used 100.0 as the threshold:

if (highestPowers.All(result => result.Power > 100.0))
{
     ...
}

I doubled the sample size (I believe you used 160):

int sampleSize = 160 * 2;
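
One plausible reason a larger block helps, offered here as an assumption rather than something stated in the answer: the block length sets both how much time each decision covers and how finely frequencies can be told apart. The closest pair of DTMF frequencies (697 Hz and 770 Hz) is only 73 Hz apart. The helper names below are hypothetical.

```csharp
using System;

// Hypothetical helper: time covered by one block, in milliseconds.
static double BlockDurationMs(int sampleRate, int blockSize) =>
    1000.0 * blockSize / sampleRate;

// Hypothetical helper: spacing between neighbouring DFT bins, in Hz.
static double BinSpacingHz(int sampleRate, int blockSize) =>
    (double)sampleRate / blockSize;

const int rate = 8000; // sample rate of the question's WAV file

foreach (int size in new[] { 160, 320 })
{
    Console.WriteLine(size + " samples: "
        + BlockDurationMs(rate, size) + " ms per block, "
        + BinSpacingHz(rate, size) + " Hz between bins");
}
```

With 160 samples the bin spacing is 50 Hz, which leaves little margin against the 73 Hz gap; with 320 samples it drops to 25 Hz, at the cost of each block covering 40 ms instead of 20 ms.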



I fixed the DTMF classification. I used a nested dictionary to capture all possible cases:

var phoneKeyOf = new Dictionary<int, Dictionary<int, string>>
{
    {1209, new Dictionary<int, string> {{1477, "?"}, {1336, "?"}, {1209, "?"}, {941, "*"}, {852, "7"}, {770, "4"}, {697, "1"}}},
    {1336, new Dictionary<int, string> {{1477, "?"}, {1336, "?"}, {1209, "?"}, {941, "0"}, {852, "8"}, {770, "5"}, {697, "2"}}},
    {1477, new Dictionary<int, string> {{1477, "?"}, {1336, "?"}, {1209, "?"}, {941, "#"}, {852, "9"}, {770, "6"}, {697, "3"}}},
    { 941, new Dictionary<int, string> {{1477, "#"}, {1336, "0"}, {1209, "*"}, {941, "?"}, {852, "?"}, {770, "?"}, {697, "?"}}},
    { 852, new Dictionary<int, string> {{1477, "9"}, {1336, "8"}, {1209, "7"}, {941, "?"}, {852, "?"}, {770, "?"}, {697, "?"}}},
    { 770, new Dictionary<int, string> {{1477, "6"}, {1336, "5"}, {1209, "4"}, {941, "?"}, {852, "?"}, {770, "?"}, {697, "?"}}},
    { 697, new Dictionary<int, string> {{1477, "3"}, {1336, "2"}, {1209, "1"}, {941, "?"}, {852, "?"}, {770, "?"}, {697, "?"}}}
};

The phone key is then retrieved with:

var key = phoneKeyOf[(int)highestPowers[0].Frequency][(int)highestPowers[1].Frequency];



The results are not perfect, but somewhat reliable.
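
As an aside, a different way to classify a block (an alternative sketched here, not the method used in this answer) is to pick the strongest row frequency and the strongest column frequency separately, then look the key up in the standard keypad layout. That avoids the "?" combinations where both of the top two frequencies fall in the same group. The helper names ArgMax and KeyFor and the example power values are hypothetical.

```csharp
using System;

// Standard DTMF keypad: rows are the low-group frequencies
// (697, 770, 852, 941 Hz), columns the high-group ones (1209, 1336, 1477 Hz).
string[] keypad = { "123", "456", "789", "*0#" };

// Hypothetical helper: index of the largest value in the array.
static int ArgMax(double[] values)
{
    int best = 0;
    for (int i = 1; i < values.Length; i++)
        if (values[i] > values[best]) best = i;
    return best;
}

// Hypothetical helper: rowPowers holds the powers at 697/770/852/941 Hz,
// colPowers the powers at 1209/1336/1477 Hz, in that order.
char KeyFor(double[] rowPowers, double[] colPowers) =>
    keypad[ArgMax(rowPowers)][ArgMax(colPowers)];

// Made-up powers for illustration: strongest row is 697 Hz, strongest column is 1336 Hz.
Console.WriteLine(KeyFor(new[] { 9000.0, 40.0, 3.0, 1.0 }, new[] { 5.0, 8500.0, 20.0 })); // prints 2
```

A threshold check on the two selected powers is still needed to reject blocks that contain no tone at all.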

Update #2

I think I figured out the problem, but can't try it out myself right now. You cannot pass the target frequency directly to CalculateGoertzel(). It has to be normalized to be centered over the DFT bins. When calculating the powers, try this approach:

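var powers = frequencies.Select(f => new
{
    Frequency = f,
    // Snap the target frequency to the center of the nearest DFT bin:
    // k = round(f * N / fs), then convert bin k back to Hz (k * fs / N),
    // because CalculateGoertzel() divides the frequency by the sample rate.
    Power = CalculateGoertzel(sampleBuffer, Math.Round(f*sampleSize/8000.0)*8000.0/sampleSize, 8000)
}).ToList();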



Also, you have to use 205 as sampleSize in order to minimize the error.
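
To see what centering over the DFT bins means in numbers, the sketch below computes, for each DTMF frequency, the nearest bin index k = round(f * N / fs) and the frequency that bin is centered on, using N = 205 and fs = 8000. The helper names NearestBin and BinCenterHz are hypothetical, and this is only an illustration of the arithmetic, not part of the original answer.

```csharp
using System;

// Hypothetical helper: index of the DFT bin closest to `frequency`.
static int NearestBin(double frequency, int blockSize, int sampleRate) =>
    (int)Math.Round(frequency * blockSize / sampleRate);

// Hypothetical helper: center frequency of bin k, in Hz.
static double BinCenterHz(int k, int blockSize, int sampleRate) =>
    (double)k * sampleRate / blockSize;

const int fs = 8000; // sample rate from the question
const int N = 205;   // block size suggested above

foreach (double f in new[] { 697.0, 770.0, 852.0, 941.0, 1209.0, 1336.0, 1477.0 })
{
    int k = NearestBin(f, N, fs);
    double center = BinCenterHz(k, N, fs);
    Console.WriteLine($"{f} Hz -> bin {k}, centered at {center:F2} Hz (off by {center - f:F2} Hz)");
}
```

Running the same loop with other values of N shows how the offsets change with the block size.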

Update #3

I rewrote the prototype to use NAudio's ISampleProvider interface, which returns normalized sample values (floats in the range [-1.0; 1.0]). I also rewrote CalculateGoertzel() from scratch. It's still not performance-optimized, but it gives much, much more pronounced power differences between frequencies. There are no more false positives when I run it on your test data. I highly recommend you take a look at it: http://pastebin.com/serxw5nG
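
For readers who want to see what normalized sample values look like without pulling in NAudio, here is a minimal sketch that converts raw 16-bit PCM bytes to floats by dividing by 32768. It assumes little-endian, mono, 16-bit data with the WAV header already stripped; the helper name ToNormalizedSamples is hypothetical and this is not the code from the pastebin.

```csharp
using System;

// Hypothetical helper: little-endian 16-bit PCM bytes -> floats in [-1.0, 1.0).
static float[] ToNormalizedSamples(byte[] pcm)
{
    var samples = new float[pcm.Length / 2]; // two bytes per sample
    for (int i = 0; i < samples.Length; i++)
    {
        // Combine low and high byte into a signed 16-bit value.
        short value = unchecked((short)(pcm[2 * i] | (pcm[2 * i + 1] << 8)));
        samples[i] = value / 32768f;
    }
    return samples;
}

// Three samples: 0, the largest positive value (32767), the most negative (-32768).
byte[] raw = { 0x00, 0x00, 0xFF, 0x7F, 0x00, 0x80 };
float[] normalized = ToNormalizedSamples(raw);
Console.WriteLine(string.Join(", ", normalized));
```

Note that byte counts and sample counts differ by a factor of two here, which is easy to get wrong when copying between byte[] and short[] buffers.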

Update #4

I created a GitHub project (https://github.com/bert2/DtmfDetection) and two NuGet packages to detect DTMF tones in live (captured) audio and pre-recorded audio files.
