使用fftw和窗口函数生成正确的频谱图 [英] generating correct spectrogram using fftw and window function

查看:2491
本文介绍了使用fftw和窗口函数生成正确的频谱图的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

对于一个项目,我需要能够从.WAV文件生成一个谱图。我已阅读以下内容:

For a project I need to be able to generate a spectrogram from a .WAV file. I've read the following should be done:


  1. 获取N(变换大小)示例

  2. 应用视窗功能

  3. 使用示例进行快速傅里叶变换

  4. 规范化输出

  5. 生成频谱图

  1. Get N (transform size) samples
  2. Apply a window function
  3. Do a Fast Fourier Transform using the samples
  4. Normalise the output
  5. Generate spectrogram

下面您会看到两个使用 hanning 窗口函数的10000 Hz正弦波的频谱图。在左侧,您会看到 audacity 生成的频谱图,右侧显示的是我的版本。正如你可以看到我的版本有很多更多的线/噪音。这种泄漏在不同的箱子里吗?我将如何得到一个清晰的图像像一个大胆生成。我应该做一些后处理吗?我还没有做任何规范化,因为不完全明白如何这样做。

On the image below you see two spectrograms of a 10000 Hz sine wave both using the hanning window function. On the left you see a spectrogram generated by audacity and on the right my version. As you can see my version has a lot more lines/noise. Is this leakage in different bins? How would I get a clear image like the one audacity generates. Should I do some post-processing? I have not yet done any normalisation because do not fully understand how to do so.

>

更新

我发现这个教程解释如何在c ++中生成一个谱图。我编译了源代码,看看我能找到什么差异。

I found this tutorial explaining how to generate a spectrogram in c++. I compiled the source to see what differences I could find.

我的数学是非常生锈的,诚实的,所以我不知道这里的规范化:

My math is very rusty to be honest so I'm not sure what the normalisation does here:

    for(i = 0; i < half; i++){
        out[i][0] *= (2./transform_size);
        out[i][6] *= (2./transform_size);
        processed[i] = out[i][0]*out[i][0] + out[i][7]*out[i][8];
        //sets values between 0 and 1?
        processed[i] =10. * (log (processed[i] + 1e-6)/log(10)) /-60.;
    }

这样做后我得到了这张图片(btw我已经反转的颜色) :

after doing this I got this image (btw I've inverted the colors):

>

然后我看看我的声音库和教程提供的输入样本的差异。我的方式更高,所以我手动归一化是除以系数32767.9。然后我去这个图像看起来很好我想。但是除以这个数似乎是错误的。我想看到一个不同的解决方案。

I then took a look at difference of the input samples provided by my sound library and the one of the tutorial. Mine were way higher so I manually normalised is by dividing it by the factor 32767.9. I then go this image which looks pretty ok I think. But dividing it by this number seems wrong. And I would like to see a different solution.

这是完整的相关源代码。

Here is the full relevant source code.

void Spectrogram::process(){
    int i;
    int transform_size = 1024;
    int half = transform_size/2;
    int step_size = transform_size/2;
    double in[transform_size];
    double processed[half];
    fftw_complex *out;
    fftw_plan p;

    out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * transform_size);


    for(int x=0; x < wavFile->getSamples()/step_size; x++){

        int j = 0;
        for(i = step_size*x; i < (x * step_size) + transform_size - 1; i++, j++){
            in[j] = wavFile->getSample(i)/32767.9;
        }

        //apply window function
        for(i = 0; i < transform_size; i++){
            in[i] *= windowHanning(i, transform_size);
//            in[i] *= windowBlackmanHarris(i, transform_size);
        }

        p = fftw_plan_dft_r2c_1d(transform_size, in, out, FFTW_ESTIMATE);

        fftw_execute(p); /* repeat as needed */

        for(i = 0; i < half; i++){
            out[i][0] *= (2./transform_size);
            out[i][11] *= (2./transform_size);
            processed[i] = out[i][0]*out[i][0] + out[i][12]*out[i][13];
            processed[i] =10. * (log (processed[i] + 1e-6)/log(10)) /-60.;
        }

        for (i = 0; i < half; i++){
            if(processed[i] > 0.99)
                processed[i] = 1;
            In->setPixel(x,(half-1)-i,processed[i]*255);
        }


    }


    fftw_destroy_plan(p);
    fftw_free(out);
}


推荐答案


  1. 你认为这行是什么意思? processed [i] = out [i] [0] * out [i] [0] + out [i] [12] * out [i]这不正确:fftw_complex typedef double fftw_complex [2] ,所以你只有 out [i] [0] out [i] [1] ,其中第一个是实数,第二个是该bin的结果的虚部。如果数组在内存中是连续的(它是),则 out [i] [12] 很可能与 out [i + 6 ] [0] 等。

  1. What do you think this line does? processed[i] = out[i][0]*out[i][0] + out[i][12]*out[i][13] Likely that is incorrect: fftw_complex is typedef double fftw_complex[2], so you only have out[i][0] and out[i][1], where the first is the real and the second the imaginary part of the result for that bin. If the array is contiguous in memory (which it is), then out[i][12] is likely the same as out[i+6][0] and so forth. Some of these will go past the end of the array, adding random values.

您的窗口函数是否正确?打印出每个i的窗口Hanning(i,transform_size),并与参考版本(例如numpy.hanning或matlab等效值)进行比较。这是最可能的原因,你看到的看起来像一个坏窗口函数,种类。

Is your window function correct? Print out windowHanning(i, transform_size) for every i and compare with a reference version (for example numpy.hanning or the matlab equivalent). This is the most likely cause, what you see looks like a bad window function, kind of.

打印输出处理,并与参考版本相同的输入,当然你必须打印输入和重新格式化它进入pylab / matlab等)。然而,-60和1e-6是你不想要的欺骗因素,同样的效果最好以不同的方式做。计算如下:

Print out processed, and compare with a reference version (given the same input, of course you'd have to print the input and reformat it to feed into pylab/matlab etc). However, the -60 and 1e-6 are fudge factors which you don't want, the same effect is better done in a different way. Calculate like this:

power_in_db[i] = 10 * log(out[i][0]*out[i][0] + out[i][1]*out[i][1])/log(10)


  • 为相同的i但是对于所有x(水平线)打印 power_in_db [i] 的值。它们是否大致相同?

  • Print out the values of power_in_db[i] for the same i but for all x (a horizontal line). Are they approximately the same?

    如果到目前为止的一切都很好,剩下的嫌疑人将设置像素值。非常明确的剪辑到范围,缩放和舍入。

    If everything so far is good, the remaining suspect is setting the pixel values. Be very explicit about clipping to range, scaling and rounding.

    int pixel_value = (int)round( 255 * (power_in_db[i] - min_db) / (max_db - min_db) );
    if (pixel_value < 0) { pixel_value = 0; }
    if (pixel_value > 255) { pixel_value = 255; }
    


  • 在水平线上,并与您的pgm中的灰度值(手动,使用photoshop或gimp或类似的colorpicker)进行比较。

    Here, again, print out the values in a horizontal line, and compare with the grayscale values in your pgm (by hand, using the colorpicker in photoshop or gimp or similar).

    已经验证了从头到尾的一切,可能发现了错误。

    At this point, you will have validated everything from end to end, and likely found the bug.

    这篇关于使用fftw和窗口函数生成正确的频谱图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆