How to filter FFT data (for audio visualisation)?

Question

I was looking at this Web Audio API demo, part of this nice book (http://chimera.labs.oreilly.com/books/1234000001552/ch05.html#s05_3).

If you look at the demo, the FFT peaks fall smoothly. I'm trying to do the same with Processing in Java mode using the Minim library. I've looked at how this is done with the Web Audio API in the doFFTAnalysis() method (https://chromium.googlesource.com/chromium/blink/+/e27c9b6ef3132802bc9a8ea2df0227dd0effbc7b/Source/modules/webaudio/RealtimeAnalyser.cpp#L200) and tried to replicate this with Minim. I also tried to port how abs() works with the complex type:

// 26.2.7/3 abs(__z):  Returns the magnitude of __z.
template<typename _Tp>
  inline _Tp
  __complex_abs(const complex<_Tp>& __z)
  {
    _Tp __x = __z.real();
    _Tp __y = __z.imag();
    const _Tp __s = std::max(abs(__x), abs(__y));
    if (__s == _Tp())  // well ...
      return __s;
    __x /= __s;
    __y /= __s;
    return __s * sqrt(__x * __x + __y * __y);
  }
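
For comparison, here is a minimal Java sketch of the same routine. The class name ComplexAbs is made up for illustration and is not part of Minim or Processing. Note that the scale factor s is taken from the absolute values of both parts, the zero case is guarded, and s is multiplied back in at the end, so the result is the ordinary magnitude sqrt(re^2 + im^2) rather than a value normalised to 0..1.

```java
final class ComplexAbs {
    /** Magnitude of (re + i*im), scaled by the larger component to avoid overflow. */
    static float abs(float re, float im) {
        float s = Math.max(Math.abs(re), Math.abs(im));
        if (s == 0.0f) {
            return 0.0f; // both parts are zero, so there is nothing to divide by
        }
        re /= s;
        im /= s;
        // Multiplying by s undoes the scaling applied above.
        return s * (float) Math.sqrt(re * re + im * im);
    }

    public static void main(String[] args) {
        System.out.println(abs(3.0f, -4.0f));
        System.out.println(abs(0.0f, 0.0f));
    }
}
```

Up to rounding, this should agree with (float) Math.hypot(re, im), which is probably the simpler choice when overflow is not a concern.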

I'm currently doing a quick prototype using Processing (a Java framework/library). My code looks like this:

import ddf.minim.*;
import ddf.minim.analysis.*;

private int blockSize = 512;
private Minim minim;
private AudioInput in;
private FFT         mfft;
private float[]    time = new float[blockSize];//time domain
private float[]    real = new float[blockSize];
private float[]    imag = new float[blockSize];
private float[]    freq = new float[blockSize];//smoothed freq. domain

public void setup() {
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, blockSize);
  mfft = new FFT( in.bufferSize(), in.sampleRate() );
}
public void draw() {
  background(255);
  for (int i = 0; i < blockSize; i++) time[i] = in.left.get(i);
  mfft.forward( time);
  real = mfft.getSpectrumReal();
  imag = mfft.getSpectrumImaginary();

  final float magnitudeScale = 1.0 / mfft.specSize();
  final float k = (float)mouseX/width;

  for (int i = 0; i < blockSize; i++)
  {      
      float creal = real[i];
      float cimag = imag[i];
      float s = Math.max(creal,cimag);
      creal /= s;
      cimag /= s; 
      float absComplex = (float)(s * Math.sqrt(creal*creal + cimag*cimag));
      float scalarMagnitude = absComplex * magnitudeScale;        
      freq[i] = (k * mfft.getBand(i) + (1 - k) * scalarMagnitude);

      line( i, height, i, height - freq[i]*8 );
  }
  fill(0);
  text("smoothing: " + k,10,10);
}

I'm not getting errors, which is good, but I'm not getting the expected behaviour, which is bad. I expected the peaks to fall more slowly when smoothing (k) is close to 1, but as far as I can tell my code only scales the bands.

Unfortunately maths and sound aren't my strong point, so I'm stabbing in the dark. How can I replicate the nice visualisation from the Web Audio API demo?

I would be tempted to say this can be language agnostic, but using JavaScript, for example, wouldn't apply :). However, I'm happy to try any other Java library that does FFT analysis.

Update

I've got a simple solution for smoothing (continuously diminish the value of each previous FFT band if the current FFT band is not higher):

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
AudioInput  in;
FFT         fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
int specSize;
void setup(){
  size(640, 360, P3D);
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftReal   = new float[specSize];
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.left);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {
    float band = fft.getBand(i);

    fftSmooth[i] *= smoothing;
    if(fftSmooth[i] < band) fftSmooth[i] = band;
    stroke(i,100,50);
    line( i, height, i, height - fftSmooth[i]*8 );
    stroke(i,100,100);
    line( i, height, i, height - band*8 );


  }
  text("smoothing: " + (int)(smoothing*100),10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
}

The faded graph is the smoothed one and the fully saturated one is the live one.

I am however still missing something, in comparison to the Web Audio API demo:

I think the Web Audio API might take into account that the medium and higher frequencies will need to be scaled to be closer to what we perceive, but I'm not sure how to tackle that.

I was trying to read more on how the RealtimeAnalyser class does this for the Web Audio API, but it seems the FFTFrame class's doFFT method (https://github.com/WebKit/webkit/blob/master/Source/WebCore/platform/audio/FFTFrame.cpp) might do the logarithmic scaling. I haven't figured out how doFFT works yet.

How can I scale a raw FFT graph with a logarithmic scale to account for perception? My goal is to make a decent-looking visualisation, and my guess is that I will need to:

  • smooth values, otherwise elements will animate too fast/twitchy
  • scale the FFT bins/bands to get better data for medium/high frequencies
  • map processed FFT values to visual elements (find the maximum values/bounds)
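
The third item in the list above can be sketched as a plain min/max mapping. This is only an illustration: the class BarMapping and its parameters are hypothetical names, and the bounds are assumed to be tracked elsewhere (for example as the smallest and largest values seen so far).

```java
final class BarMapping {
    /**
     * Maps a value from [min, max] to a bar height in [0, maxHeight].
     * Values outside the bounds are clamped so a bar never leaves the viewport.
     */
    static float toBarHeight(float value, float min, float max, float maxHeight) {
        float range = max - min;
        if (range <= 0.0f) {
            return 0.0f; // bounds not established yet, so draw nothing
        }
        float t = (value - min) / range;       // 0..1 when value is inside the bounds
        t = Math.max(0.0f, Math.min(1.0f, t)); // clamp out-of-range input
        return t * maxHeight;
    }
}
```

For instance, with bounds of -60 and 0 and a 200 pixel viewport, a value of -30 maps to a 100 pixel bar.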

Any hints on how I can achieve this?

Update 2

I'm guessing this part does the smoothing and scaling I'm after in the Web Audio API:

// Normalize so than an input sine wave at 0dBfs registers as 0dBfs (undo FFT scaling factor).
const double magnitudeScale = 1.0 / DefaultFFTSize;

// A value of 0 does no averaging with the previous result.  Larger values produce slower, but smoother changes.
double k = m_smoothingTimeConstant;
k = max(0.0, k);
k = min(1.0, k);    

// Convert the analysis data from complex to magnitude and average with the previous result.
float* destination = magnitudeBuffer().data();
size_t n = magnitudeBuffer().size();
for (size_t i = 0; i < n; ++i) {
    Complex c(realP[i], imagP[i]);
    double scalarMagnitude = abs(c) * magnitudeScale;        
    destination[i] = float(k * destination[i] + (1 - k) * scalarMagnitude);
}

It seems the scaling is done by taking the absolute value of the complex number. This post points in the same direction. I've tried using the abs of the complex number with Minim and with various window functions, but it still doesn't look like what I'm aiming for (the Web Audio API demo):

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
AudioInput  in;
FFT         fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
int specSize;

WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;

void setup(){
  size(640, 360, P3D);
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  fft.window(window[windex]);
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftReal   = new float[specSize];
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {
    float band = fft.getBand(i);

    //Sw = abs(Sw(1:(1+N/2))); %# abs is sqrt(real^2 + imag^2)
    float abs = sqrt(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);

    fftSmooth[i] *= smoothing;
    if(fftSmooth[i] < abs) fftSmooth[i] = abs;

    stroke(i,100,50);
    line( i, height, i, height - fftSmooth[i]*8 );
    stroke(i,100,100);
    line( i, height, i, height - band*8 );


  }
  text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex],10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
  if(key == 'W' && windex < window.length-1) windex++;
  if(key == 'w' && windex > 0) windex--;
  if(key == 'w' || key == 'W') fft.window(window[windex]);
}

I'm not sure I'm using the window functions correctly, because I don't notice a huge difference between them. Is the abs of the complex value correct? How can I get a visualisation closer to my aim?

Update 3

I've tried to apply @wakjah's helpful tips like so:

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
AudioInput  in;
FFT         fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
float[] fftPrev;
float[] fftCurr;
int specSize;

WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;

int scale = 10;

void setup(){
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  fft.window(window[windex]);
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftPrev   = new float[specSize];
  fftCurr   = new float[specSize];
  size(specSize, specSize/2);
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {
    //float band = fft.getBand(i);
    //Sw = abs(Sw(1:(1+N/2))); %# abs is sqrt(real^2 + imag^2)
    //float abs = sqrt(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
    //fftSmooth[i] *= smoothing;
    //if(fftSmooth[i] < abs) fftSmooth[i] = abs;

    //x_dB = 10 * log10(real(x) ^ 2 + imag(x) ^ 2);
    fftCurr[i] = scale * (float)Math.log10(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
    //Y[k] = alpha * Y_(t-1)[k] + (1 - alpha) * X[k]
    fftSmooth[i] = smoothing * fftPrev[i] + ((1 - smoothing) * fftCurr[i]);

    fftPrev[i] = fftCurr[i];//

    stroke(i,100,100);
    line( i, height, i, height - fftSmooth[i]);

  }
  text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex]+"\nscale:"+scale,10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
  if(key == 'W' && windex < window.length-1) windex++;
  if(key == 'w' && windex > 0) windex--;
  if(key == 'w' || key == 'W') fft.window(window[windex]);
  if(keyCode == LEFT && scale > 1) scale--;
  if(keyCode == RIGHT) scale++;
}

I'm not sure I've applied the hints as intended. Here's how my output looks:

but I don't think I'm there yet if I compare this with visualisations I'm aiming for:

Spectrum in Windows Media Player

Spectrum in VLC player

I'm not sure I've applied the log scale correctly. My assumption was that I would get a plot similar to what I'm aiming for after using log10 (ignoring smoothing for now).

Update 4:

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
AudioInput  in;
FFT         fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
float[] fftPrev;
float[] fftCurr;
int specSize;

WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;

int scale = 10;

void setup(){
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  fft.window(window[windex]);
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftPrev   = new float[specSize];
  fftCurr   = new float[specSize];
  size(specSize, specSize/2);
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {    
    float maxVal = Math.max(Math.abs(fftReal[i]), Math.abs(fftImag[i]));
    if (maxVal != 0.0f) { // prevent divide-by-zero
        // Normalize
        fftReal[i] = fftReal[i] / maxVal;
        fftImag[i] = fftImag[i] / maxVal;
    }

    fftCurr[i] = -scale * (float)Math.log10(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
    fftSmooth[i] = smoothing * fftSmooth[i] + ((1 - smoothing) * fftCurr[i]);

    stroke(i,100,100);
    line( i, height/2, i, height/2 - (mousePressed ? fftSmooth[i] : fftCurr[i]));

  }
  text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex]+"\nscale:"+scale,10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
  if(key == 'W' && windex < window.length-1) windex++;
  if(key == 'w' && windex > 0) windex--;
  if(key == 'w' || key == 'W') fft.window(window[windex]);
  if(keyCode == LEFT && scale > 1) scale--;
  if(keyCode == RIGHT) scale++;
}

This produces:

In the draw loop I'm drawing from the centre since scale is now negative. If I scale the values up, the result starts to look random.

Update 6

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
AudioInput  in;
FFT         fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
float[] fftPrev;
float[] fftCurr;
int specSize;

WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;

int scale = 10;

void setup(){
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  fft.window(window[windex]);
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftPrev   = new float[specSize];
  fftCurr   = new float[specSize];
  size(specSize, specSize/2);
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {
    fftCurr[i] = scale * (float)Math.log10(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
    fftSmooth[i] = smoothing * fftSmooth[i] + ((1 - smoothing) * fftCurr[i]);

    stroke(i,100,100);
    line( i, height/2, i, height/2 - (mousePressed ? fftSmooth[i] : fftCurr[i]));

  }
  text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex]+"\nscale:"+scale,10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
  if(key == 'W' && windex < window.length-1) windex++;
  if(key == 'w' && windex > 0) windex--;
  if(key == 'w' || key == 'W') fft.window(window[windex]);
  if(keyCode == LEFT && scale > 1) scale--;
  if(keyCode == RIGHT) scale++;
  if(key == 's') saveFrame("fftmod.png");
}

This feels so close:

This looks much better than the previous version, but some values on the lower/left side of the spectrum look a bit off and the shape seems to change very fast. (The smoothed values plot as zeroes.)

Recommended answer

I'm a little unclear on exactly what kind of smoothing you want to do, but I will try to provide some information that might help you.

Generally, when you take the Fourier transform and you want to display a graph of it, you need (as you mention) to scale it logarithmically. This is because the magnitude of the values will vary over a huge range - many orders of magnitude - and compressing this into the small space observable on a graph will cause the main peaks to dwarf the rest of the information.

To actually do this scaling, we convert the values to decibels. It is important to note that decibels is a scale and not a unit - it represents a ratio between two numbers: usually a measured value and some reference. The general formula for decibels is

x_dB = 10 * log10((x ^ 2) / (x_ref ^ 2))

where log10 is the logarithm to base 10, ^ is the power operator, and x_ref is your chosen reference value. Since FFT'd values from an audio file don't (usually) have any meaningful units, x_ref is commonly chosen to be simply 1 for this application. Further, since x is complex, you need to take the absolute value. So the formula will be

x_dB = 10 * log10(abs(x) ^ 2)

There is a small (numerical and speed) optimisation possible here, since you're squaring the result of a square-root:

x_dB = 10 * log10(real(x) ^ 2 + imag(x) ^ 2)
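
As a rough Java sketch of that last formula (reference value 1): the class name Decibels and the -100 dB floor are assumptions made for illustration, the floor being one possible way to keep log10(0) from producing negative infinity for silent bins.

```java
final class Decibels {
    /** Hypothetical lower bound used for silent bins; pick whatever suits the display. */
    static final float FLOOR_DB = -100.0f;

    /** 10 * log10(re^2 + im^2) with reference 1, clamped to FLOOR_DB. */
    static float powerDb(float re, float im) {
        // No square root is needed: abs(x)^2 is just re^2 + im^2.
        double power = (double) re * re + (double) im * im;
        if (power <= 0.0) {
            return FLOOR_DB; // log10(0) would be -Infinity
        }
        return Math.max(FLOOR_DB, (float) (10.0 * Math.log10(power)));
    }
}
```

A bin with value 1 + 0i comes out at 0 dB, and each tenfold increase in magnitude adds 20 dB.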

Perceptual weighting

Scaling of frequency-domain measurements is commonly done when measuring sound pressure and power levels: a specific measurement type is chosen for the given application (I won't go into the types here), and a recording of sound is made according to this measurement type. The result is FFT'd and then multiplied by a given weighting at each frequency depending on what the result will be used for and what type of sound has been recorded. There are two weightings in common use: A, and C. C is generally used only for extremely high amplitude sounds.

Note that this kind of weighting is not really necessary if you just want to display a nice-looking graph: it is used to make sure everyone in the world can make measurements (and measurement equipment) that follow the same standard. If you do decide to include this, it must be performed as a multiplication before conversion to decibels (or as an addition of the decibel value of the weighting - which is mathematically equivalent).
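
The equivalence mentioned in parentheses can be checked with a small sketch. The class and method names are made up for illustration, and the weighting is assumed to be a power ratio, because the dB formula used here is 10 * log10 of a squared magnitude.

```java
final class WeightingEquivalence {
    static double toDb(double power) {
        return 10.0 * Math.log10(power);
    }

    static double fromDb(double db) {
        return Math.pow(10.0, db / 10.0);
    }

    /** Multiply the linear power by the linear weight, then convert to dB. */
    static double weightThenConvert(double power, double weightDb) {
        return toDb(power * fromDb(weightDb));
    }

    /** Convert to dB first, then add the weight in dB. */
    static double convertThenAdd(double power, double weightDb) {
        return toDb(power) + weightDb;
    }
}
```

Both routes give the same number up to rounding, since log10(a * b) = log10(a) + log10(b). Adding the table value in dB is usually the more convenient of the two when the table is already in dB.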

More information on A-weighting is available on Wikipedia.

Windowing is performed primarily to reduce the effect of the Gibbs phenomenon. We can never get rid of it completely but windowing does help. Unfortunately it has other effects: sharp peaks are broadened and "side-lobes" introduced; there is always a compromise between peak sharpness and side-lobe height. I am not going to go into all the details here unless you specifically ask for it; there is a fairly lengthy explanation of windowing in this free online book.
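
To make the idea concrete, here is a minimal sketch of a Hann window applied to a block of samples before the FFT. Minim applies its windows internally when fft.window(FFT.HANN) is set, so this hand-rolled version (the class name HannWindow is hypothetical) is only meant to show what such a call does in principle and may differ from Minim's exact coefficients.

```java
final class HannWindow {
    /** Hann coefficients w[i] = 0.5 * (1 - cos(2*pi*i / (n - 1))) for i = 0..n-1. */
    static float[] coefficients(int n) {
        if (n < 2) {
            throw new IllegalArgumentException("window length must be at least 2");
        }
        float[] w = new float[n];
        for (int i = 0; i < n; i++) {
            w[i] = (float) (0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / (n - 1))));
        }
        return w;
    }

    /** Returns a windowed copy of the samples; the input array is left untouched. */
    static float[] apply(float[] samples) {
        float[] w = coefficients(samples.length);
        float[] out = new float[samples.length];
        for (int i = 0; i < samples.length; i++) {
            out[i] = samples[i] * w[i];
        }
        return out;
    }
}
```

The window tapers both ends of the block towards zero, which is what trades peak sharpness for lower side-lobes.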

As for making the line in each frequency bin decay slowly, here's a simple idea that might do the trick: in each frequency bin, apply a simple exponential moving average. Say your FFT results are stored in X[k], where k is the frequency index. Let your display value be Y[k] such that

Y[k] = alpha * Y_(t-1)[k] + (1 - alpha) * X[k]

where 0 < alpha < 1 is your smoothing factor, and Y_(t-1)[k] is the value of Y[k] at the last time step (t-1). This is actually a simple low-pass IIR (infinite impulse response) filter, and hopefully should do basically what you want (perhaps with a little tweaking). The closer alpha is to zero, the more quickly new observations (input X[k]) will affect the result. The closer it is to one, the more slowly the result will decay, but the input will also affect the result more slowly, so it may appear "sluggish". You may want to add a conditional around it to take the new value immediately if it's higher than the current value.

Note that, again, this should be performed prior to conversion to decibels.
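
The recurrence above, together with the optional "take the new value immediately if it is higher" conditional, might look like this in Java. SpectrumSmoother and the fastAttack flag are illustrative names rather than part of any library; the important detail is that the previous smoothed output, not the previous raw input, is fed back in.

```java
final class SpectrumSmoother {
    private final float[] smoothed; // Y_(t-1)[k], updated in place to Y[k]
    private final float alpha;      // smoothing factor, 0 <= alpha < 1
    private final boolean fastAttack;

    SpectrumSmoother(int bands, float alpha, boolean fastAttack) {
        if (alpha < 0.0f || alpha >= 1.0f) {
            throw new IllegalArgumentException("alpha must be in [0, 1)");
        }
        this.smoothed = new float[bands];
        this.alpha = alpha;
        this.fastAttack = fastAttack;
    }

    /** Applies Y[k] = alpha * Y_(t-1)[k] + (1 - alpha) * X[k] to every band. */
    float[] update(float[] current) {
        if (current.length != smoothed.length) {
            throw new IllegalArgumentException("unexpected number of bands");
        }
        for (int k = 0; k < smoothed.length; k++) {
            if (fastAttack && current[k] > smoothed[k]) {
                smoothed[k] = current[k]; // rising peaks are shown immediately
            } else {
                smoothed[k] = alpha * smoothed[k] + (1.0f - alpha) * current[k];
            }
        }
        return smoothed;
    }
}
```

With fastAttack enabled, peaks jump up at once and then fall back gradually, which is close to the behaviour of the Web Audio API demo; with it disabled, rises are smoothed as well.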

(edit) Having looked at the code you posted a little more clearly, this does appear to be the method used in the example you're trying to reproduce. Your initial attempt was close, but note that the first term is the smoothing coefficient multiplied by the last display value, not the current input.

(edit 2) Your third update is, again, close, but there is a slight mistranslation of the formula in the following lines

fftSmooth[i] = smoothing * fftPrev[i] + ((1 - smoothing) * fftCurr[i]);

fftPrev[i] = fftCurr[i];//

Instead of the previous value of the FFT coefficients before smoothing, you want to take the value after smoothing. (note that this means you don't actually need another array to store the previous value)

fftSmooth[i] = smoothing * fftSmooth[i] + ((1 - smoothing) * fftCurr[i]);

If smoothing == 0, this line should have little effect other than to multiply the result by a scalar.

Looking more closely at the way they compute the absolute value, they have a normalization in there, so that whichever of the two complex values is the maximum, becomes 1, and the other is scaled accordingly. This means you will always get an absolute value between 0 and 1, and is probably their alternative to decibel conversion. Really, this is not quite what the documentation of their abs function suggests, which is a little annoying... but anyway, if you replicate this it will guarantee that your values are always in a sensible range.

To do this simply in your code, you could do something like

float maxVal = Math.max(Math.abs(fftReal[i]), Math.abs(fftImag[i]));
if (maxVal != 0.0f) { // prevent divide-by-zero
    // Normalize
    fftReal[i] = fftReal[i] / maxVal;
    fftImag[i] = fftImag[i] / maxVal;
}

fftCurr[i] = scale * (float)Math.log10(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
// ...

Putting it all together: Some code

Having messed around with it for a while in Processing 2.1, I have a solution that I believe you will be happy with:

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
//AudioInput  in;
AudioPlayer in;
FFT         fft;

float smoothing = 0.60;
final boolean useDB = true;
final int minBandwidthPerOctave = 200;
final int bandsPerOctave = 10;
float[] fftSmooth;
int avgSize;

float minVal = 0.0;
float maxVal = 0.0;
boolean firstMinDone = false;

void setup(){
  minim = new Minim(this);
  //in = minim.getLineIn(Minim.STEREO, 512);
  in = minim.loadFile("C:\\path\\to\\some\\audio\\file.ext", 2048);

  in.loop();

  fft = new FFT(in.bufferSize(), in.sampleRate());

  // Use logarithmically-spaced averaging
  fft.logAverages(minBandwidthPerOctave, bandsPerOctave);

  avgSize = fft.avgSize();
  fftSmooth = new float[avgSize];

  int myWidth = 500;
  int myHeight = 250;
  size(myWidth, myHeight);
  colorMode(HSB,avgSize,100,100);

}

float dB(float x) {
  if (x == 0) {
    return 0;
  }
  else {
    return 10 * (float)Math.log10(x);
  }
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);

  final int weight = width / avgSize;
  final float maxHeight = (height / 2) * 0.75;

  for (int i = 0; i < avgSize; i++) {
    // Get spectrum value (using dB conversion or not, as desired)
    float fftCurr;
    if (useDB) {
      fftCurr = dB(fft.getAvg(i));
    }
    else {
      fftCurr = fft.getAvg(i);
    }

    // Smooth using exponential moving average
    fftSmooth[i] = (smoothing) * fftSmooth[i] + ((1 - smoothing) * fftCurr);

    // Find max and min values ever displayed across whole spectrum
    if (fftSmooth[i] > maxVal) {
      maxVal = fftSmooth[i];
    }
    if (!firstMinDone || (fftSmooth[i] < minVal)) {
      minVal = fftSmooth[i];
      firstMinDone = true; // otherwise minVal is overwritten on every band
    }
  }

  // Calculate the total range of smoothed spectrum; this will be used to scale all values to range 0...1
  final float range = maxVal - minVal;
  final float scaleFactor = range + 0.00001; // avoid div. by zero

  for(int i = 0; i < avgSize; i++)
  {
    stroke(i,100,100);
    strokeWeight(weight);

    // Y-coord of display line; fftSmooth is scaled to range 0...1; this is then multiplied by maxHeight
    // to make it within display port range
    float fftSmoothDisplay = maxHeight * ((fftSmooth[i] - minVal) / scaleFactor);

    // X-coord of display line
    float x = i * weight;

    line(x, height / 2, x, height / 2 - fftSmoothDisplay);
  }
  text("smoothing: " + (int)(smoothing*100)+"\n",10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
}

The above uses a slightly different approach - averaging the spectrum in a series of bins that is smaller than the total spectrum size - that produces a result closer to WMP's than your original.
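
To illustrate what averaging into logarithmically spaced bands means, here is a self-contained sketch. It is not Minim's logAverages() algorithm: the class LogBands, its parameters and the way bin edges are handled are all assumptions made for the example. The input is assumed to hold magnitudes for bins 0..N/2 of an N-point FFT.

```java
import java.util.ArrayList;
import java.util.List;

final class LogBands {
    /** Averages linear FFT bins into bands whose edges double once per octave. */
    static float[] average(float[] magnitudes, float sampleRate, float minFreq, int bandsPerOctave) {
        if (magnitudes.length < 2 || minFreq <= 0.0f || bandsPerOctave < 1) {
            throw new IllegalArgumentException("invalid arguments");
        }
        int fftSize = (magnitudes.length - 1) * 2;
        double binWidth = (double) sampleRate / fftSize; // Hz covered by one bin
        double nyquist = sampleRate / 2.0;
        List<Float> bands = new ArrayList<>();
        for (int b = 0; ; b++) {
            double lo = minFreq * Math.pow(2.0, (double) b / bandsPerOctave);
            if (lo >= nyquist) {
                break;
            }
            double hi = Math.min(nyquist, minFreq * Math.pow(2.0, (double) (b + 1) / bandsPerOctave));
            // Bins whose centre frequency lies in [lo, hi).
            int first = (int) Math.ceil(lo / binWidth);
            int last = (int) Math.ceil(hi / binWidth) - 1;
            if (last < first) {
                // Band narrower than one bin: fall back to the nearest bin.
                first = last = (int) Math.round((lo + hi) / 2.0 / binWidth);
            }
            last = Math.min(last, magnitudes.length - 1);
            first = Math.min(first, last);
            float sum = 0.0f;
            for (int i = first; i <= last; i++) {
                sum += magnitudes[i];
            }
            bands.add(sum / (last - first + 1));
        }
        float[] out = new float[bands.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = bands.get(i);
        }
        return out;
    }
}
```

Low bands then cover only a bin or two while high bands average many bins, which is why the bars look more evenly spread than a bin-per-pixel plot.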

I have an updated version of the code that applies the A-weighting in each frequency band (though only when dB mode is on, because the table I had was in dB :). Turn A-weighting on for a result closer to WMP's, or off for one closer to VLC's.

There are also some minor tweaks to the way it is displayed: it is now centred in the display and it will display only up to a maximum band centre frequency.

Here's the code - enjoy!

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
//AudioInput  in;
AudioPlayer in;
FFT         fft;

float smoothing = 0.73;
final boolean useDB = true;
final boolean useAWeighting = true; // only used in dB mode, because the table I found was in dB 
final boolean resetBoundsAtEachStep = false;
final float maxViewportUsage = 0.85;
final int minBandwidthPerOctave = 200;
final int bandsPerOctave = 10;
final float maxCentreFrequency = 18000;
float[] fftSmooth;
int avgSize;

float minVal = 0.0;
float maxVal = 0.0;
boolean firstMinDone = false;

final float[] aWeightFrequency = { 
  10, 12.5, 16, 20, 
  25, 31.5, 40, 50, 
  63, 80, 100, 125, 
  160, 200, 250, 315, 
  400, 500, 630, 800, 
  1000, 1250, 1600, 2000, 
  2500, 3150, 4000, 5000,
  6300, 8000, 10000, 12500, 
  16000, 20000 
};

final float[] aWeightDecibels = {
  -70.4, -63.4, -56.7, -50.5, 
  -44.7, -39.4, -34.6, -30.2, 
  -26.2, -22.5, -19.1, -16.1, 
  -13.4, -10.9, -8.6, -6.6, 
  -4.8, -3.2, -1.9, -0.8, 
  0.0, 0.6, 1.0, 1.2, 
  1.3, 1.2, 1.0, 0.5, 
  -0.1, -1.1, -2.5, -4.3, 
  -6.6, -9.3 
};

float[] aWeightDBAtBandCentreFreqs;

void setup(){
  minim = new Minim(this);
  //in = minim.getLineIn(Minim.STEREO, 512);
  in = minim.loadFile("D:\\Music\\Arthur Brown\\The Crazy World Of Arthur Brown\\1-09 Fire.mp3", 2048);

  in.loop();

  fft = new FFT(in.bufferSize(), in.sampleRate());

  // Use logarithmically-spaced averaging
  fft.logAverages(minBandwidthPerOctave, bandsPerOctave);
  aWeightDBAtBandCentreFreqs = calculateAWeightingDBForFFTAverages(fft);

  avgSize = fft.avgSize();
  // Only use freqs up to maxCentreFrequency - ones above this may have
  // values too small that will skew our range calculation for all time
  while (fft.getAverageCenterFrequency(avgSize-1) > maxCentreFrequency) {
    avgSize--;
  }

  fftSmooth = new float[avgSize];

  int myWidth = 500;
  int myHeight = 250;
  size(myWidth, myHeight);
  colorMode(HSB,avgSize,100,100);

}

float[] calculateAWeightingDBForFFTAverages(FFT fft) {
  float[] result = new float[fft.avgSize()];
  for (int i = 0; i < result.length; i++) {
    result[i] = calculateAWeightingDBAtFrequency(fft.getAverageCenterFrequency(i));
  }
  return result;    
}

float calculateAWeightingDBAtFrequency(float frequency) {
  return linterp(aWeightFrequency, aWeightDecibels, frequency);    
}

float dB(float x) {
  if (x == 0) {
    return 0;
  }
  else {
    return 10 * (float)Math.log10(x);
  }
}

float linterp(float[] x, float[] y, float xx) {
  assert(x.length > 1);
  assert(x.length == y.length);

  float result = 0.0;
  boolean found = false;

  if (x[0] > xx) {
    result = y[0];
    found = true;
  }

  if (!found) {
    for (int i = 1; i < x.length; i++) {
      if (x[i] > xx) {
        result = y[i-1] + ((xx - x[i-1]) / (x[i] - x[i-1])) * (y[i] - y[i-1]);
        found = true;
        break;
      }
    }
  }

  if (!found) {
    result = y[y.length-1];
  }

  return result;     
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);

  final int weight = width / avgSize;
  final float maxHeight = height * maxViewportUsage;
  final float xOffset = weight / 2 + (width - avgSize * weight) / 2;

  if (resetBoundsAtEachStep) {
    minVal = 0.0;
    maxVal = 0.0;
    firstMinDone = false;
  }

  for (int i = 0; i < avgSize; i++) {
    // Get spectrum value (using dB conversion or not, as desired)
    float fftCurr;
    if (useDB) {
      fftCurr = dB(fft.getAvg(i));
      if (useAWeighting) {
        fftCurr += aWeightDBAtBandCentreFreqs[i];
      }
    }
    else {
      fftCurr = fft.getAvg(i);
    }

    // Smooth using exponential moving average
    fftSmooth[i] = (smoothing) * fftSmooth[i] + ((1 - smoothing) * fftCurr);

    // Find max and min values ever displayed across whole spectrum
    if (fftSmooth[i] > maxVal) {
      maxVal = fftSmooth[i];
    }
    if (!firstMinDone || (fftSmooth[i] < minVal)) {
      minVal = fftSmooth[i];
      firstMinDone = true; // otherwise minVal is overwritten on every band
    }
  }

  // Calculate the total range of smoothed spectrum; this will be used to scale all values to range 0...1
  final float range = maxVal - minVal;
  final float scaleFactor = range + 0.00001; // avoid div. by zero

  for(int i = 0; i < avgSize; i++)
  {
    stroke(i,100,100);
    strokeWeight(weight);

    // Y-coord of display line; fftSmooth is scaled to range 0...1; this is then multiplied by maxHeight
    // to make it within display port range
    float fftSmoothDisplay = maxHeight * ((fftSmooth[i] - minVal) / scaleFactor);
    // Artificially impose a minimum of zero (this is mathematically bogus, but whatever)
    fftSmoothDisplay = max(0.0, fftSmoothDisplay);

    // X-coord of display line
    float x = xOffset + i * weight;

    line(x, height, x, height - fftSmoothDisplay);
  }
  text("smoothing: " + (int)(smoothing*100)+"\n",10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
}
