How to filter FFT data (for audio visualisation)?

Question

I was looking at this Web Audio API demo, part of this nice book

If you look at the demo, the FFT peaks fall smoothly. I'm trying to do the same with Processing in Java mode using the Minim library. I've looked at how this is done with the Web Audio API in the doFFTAnalysis() method and tried to replicate this with Minim. I also tried to port how abs() works with the complex type:

// 26.2.7/3 abs(__z):  Returns the magnitude of __z.
template<typename _Tp>
  inline _Tp
  __complex_abs(const complex<_Tp>& __z)
  {
    _Tp __x = __z.real();
    _Tp __y = __z.imag();
    const _Tp __s = std::max(abs(__x), abs(__y));
    if (__s == _Tp())  // well ...
      return __s;
    __x /= __s;
    __y /= __s;
    return __s * sqrt(__x * __x + __y * __y);
  }
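
For comparison, here is a direct Java port of that libstdc++ routine (the class and method names are my own). Two details are easy to miss. First, the scale factor s is the larger of the absolute values of the two parts. Second, s is multiplied back in at the end. The function therefore returns the ordinary magnitude sqrt(re^2 + im^2), the same value as Math.hypot, and the division by s only protects the squaring step from overflow and underflow.

```java
final class ComplexAbs {
    /**
     * Magnitude of re + i*im, computed the way the quoted libstdc++ routine does:
     * divide both parts by the larger absolute component, square and sum,
     * then multiply the scale back in.
     */
    static double complexAbs(double re, double im) {
        double s = Math.max(Math.abs(re), Math.abs(im));
        if (s == 0.0) {
            return 0.0; // both parts are zero (the "well ..." branch in the C++ code)
        }
        double x = re / s;
        double y = im / s;
        return s * Math.sqrt(x * x + y * y);
    }

    public static void main(String[] args) {
        System.out.println(complexAbs(3, -4)); // 5.0
        System.out.println(Math.hypot(3, -4)); // 5.0, the standard-library equivalent
    }
}
```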

I'm currently doing a quick prototype using Processing (a Java framework/library). My code looks like this:

import ddf.minim.*;
import ddf.minim.analysis.*;

private int blockSize = 512;
private Minim minim;
private AudioInput in;
private FFT         mfft;
private float[]    time = new float[blockSize];//time domain
private float[]    real = new float[blockSize];
private float[]    imag = new float[blockSize];
private float[]    freq = new float[blockSize];//smoothed freq. domain

public void setup() {
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, blockSize);
  mfft = new FFT( in.bufferSize(), in.sampleRate() );
}
public void draw() {
  background(255);
  for (int i = 0; i < blockSize; i++) time[i] = in.left.get(i);
  mfft.forward( time);
  real = mfft.getSpectrumReal();
  imag = mfft.getSpectrumImaginary();

  final float magnitudeScale = 1.0 / mfft.specSize();
  final float k = (float)mouseX/width;

  for (int i = 0; i < blockSize; i++)
  {      
      float creal = real[i];
      float cimag = imag[i];
      float s = Math.max(creal,cimag);
      creal /= s;
      cimag /= s; 
      float absComplex = (float)(s * Math.sqrt(creal*creal + cimag*cimag));
      float scalarMagnitude = absComplex * magnitudeScale;        
      freq[i] = (k * mfft.getBand(i) + (1 - k) * scalarMagnitude);

      line( i, height, i, height - freq[i]*8 );
  }
  fill(0);
  text("smoothing: " + k,10,10);
}

I'm not getting errors, which is good, but I'm not getting the expected behaviour, which is bad. I expected the peaks to fall more slowly when smoothing (k) is close to 1, but as far as I can tell my code only scales the bands.

Unfortunately math and sound aren't my strong points, so I'm stabbing in the dark. How can I replicate the nice visualisation from the Web Audio API demo?

I would be tempted to say this can be language-agnostic, but using JavaScript, for example, wouldn't apply :). However, I'm happy to try any other Java library that does FFT analysis.

UPDATE

I've got a simple solution for smoothing (continuously diminish the value of each previous FFT band if the current FFT band is not higher):

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
AudioInput  in;
FFT         fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
int specSize;
void setup(){
  size(640, 360, P3D);
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftReal   = new float[specSize];
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.left);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {
    float band = fft.getBand(i);

    fftSmooth[i] *= smoothing;
    if(fftSmooth[i] < band) fftSmooth[i] = band;
    stroke(i,100,50);
    line( i, height, i, height - fftSmooth[i]*8 );
    stroke(i,100,100);
    line( i, height, i, height - band*8 );


  }
  text("smoothing: " + (int)(smoothing*100),10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
}

The faded graph is the smoothed one and the fully saturated one is the live one.

I am however still missing something, in comparison to the Web Audio API demo:

I think the Web Audio API might take into account that the medium and higher frequencies will need to be scaled to be closer to what we perceive, but I'm not sure how to tackle that.

I was trying to read more on how the RealtimeAnalyser class does this for the WebAudioAPI, but it seems FFTFrame class's doFFT method might do the logarithmic scaling. I haven't figured out how doFFT works yet.

How can I scale a raw FFT graph with a logarithmic scale to account for perception? My goal is to do a decent-looking visualisation and my guess is I will need to:

  • smooth values, otherwise elements will animate too fast/twitchy
  • scale the FFT bins/bands to get better data for medium/high frequencies
  • map processed FFT values to visual elements (find the maximum values/bounds)

Any hints on how I can achieve this?
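
The second and third points can be sketched in plain Java, outside Processing, so that the example runs standalone. `logBands` and `normalize` are hypothetical helpers, not Minim API. `logBands` averages linear FFT magnitudes into bands whose edges grow by a factor of 2^(1/bandsPerOctave), so low frequencies get narrow bands and high frequencies get wide ones. `normalize` rescales a set of values to 0..1 using their own minimum and maximum. Minim's FFT also has built-in log-spaced averaging (`fft.logAverages(minBandwidth, bandsPerOctave)` together with `avgSize()` and `getAvg(i)`), which is probably the more convenient option inside a sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LogBands {
    /** Centre frequency (Hz) of linear FFT bin k. */
    static double binFrequency(int k, double sampleRate, int fftSize) {
        return k * sampleRate / fftSize;
    }

    /**
     * Averages linear magnitudes (bins 0..fftSize/2) into log-spaced bands.
     * Band i covers [minFreq * r^i, minFreq * r^(i+1)) with r = 2^(1/bandsPerOctave).
     */
    static float[] logBands(float[] mags, double sampleRate, int fftSize,
                            double minFreq, int bandsPerOctave) {
        if (minFreq <= 0) {
            throw new IllegalArgumentException("minFreq must be > 0");
        }
        double nyquist = sampleRate / 2.0;
        double ratio = Math.pow(2.0, 1.0 / bandsPerOctave);
        List<Float> bands = new ArrayList<>();
        for (double lo = minFreq; lo < nyquist; lo *= ratio) {
            double hi = Math.min(lo * ratio, nyquist);
            double sum = 0.0;
            int count = 0;
            for (int k = 0; k < mags.length; k++) {
                double f = binFrequency(k, sampleRate, fftSize);
                if (f >= lo && f < hi) {
                    sum += mags[k];
                    count++;
                }
            }
            if (count == 0) {
                // Band narrower than one bin (typical at low frequencies):
                // fall back to the bin nearest the band's geometric centre.
                int k = (int) Math.round(Math.sqrt(lo * hi) * fftSize / sampleRate);
                sum = mags[Math.min(k, mags.length - 1)];
                count = 1;
            }
            bands.add((float) (sum / count));
        }
        float[] out = new float[bands.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = bands.get(i);
        }
        return out;
    }

    /** Maps values linearly onto 0..1 using their own min and max ("find the bounds"). */
    static float[] normalize(float[] values) {
        float min = Float.POSITIVE_INFINITY;
        float max = Float.NEGATIVE_INFINITY;
        for (float v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        float range = max - min;
        float[] out = new float[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = range > 0f ? (values[i] - min) / range : 0f;
        }
        return out;
    }

    public static void main(String[] args) {
        int fftSize = 1024;
        float[] mags = new float[fftSize / 2 + 1];
        mags[10] = 1.0f; // a single peak at about 431 Hz for a 44.1 kHz sample rate
        float[] bands = logBands(mags, 44100.0, fftSize, 20.0, 3);
        System.out.println(bands.length + " bands");
        System.out.println(Arrays.toString(normalize(bands)));
    }
}
```

Because the lowest bands can be narrower than one FFT bin, several of them may show the same value. That is an inherent limit of a short FFT, not of the band grouping.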

UPDATE 2

I'm guessing this part does the smoothing and scaling I'm after in the Web Audio API:

// Normalize so than an input sine wave at 0dBfs registers as 0dBfs (undo FFT scaling factor).
const double magnitudeScale = 1.0 / DefaultFFTSize;

// A value of 0 does no averaging with the previous result.  Larger values produce slower, but smoother changes.
double k = m_smoothingTimeConstant;
k = max(0.0, k);
k = min(1.0, k);    

// Convert the analysis data from complex to magnitude and average with the previous result.
float* destination = magnitudeBuffer().data();
size_t n = magnitudeBuffer().size();
for (size_t i = 0; i < n; ++i) {
    Complex c(realP[i], imagP[i]);
    double scalarMagnitude = abs(c) * magnitudeScale;        
    destination[i] = float(k * destination[i] + (1 - k) * scalarMagnitude);
}
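
Transliterated into plain Java, that loop comes down to the sketch below. `smoothFrame` is a hypothetical name, and `fftSize` stands in for WebKit's `DefaultFFTSize`. I have not checked whether Minim's `getSpectrumReal()`/`getSpectrumImaginary()` use the same FFT scaling as WebKit's FFTFrame, so `magnitudeScale` may need adjusting. The key point is that `destination` keeps the previous output and is blended in place.

```java
final class AnalyserSmoothing {
    /**
     * One analysis frame, mirroring the quoted RealtimeAnalyser loop.
     * destination holds the previous frame's output and is updated in place.
     */
    static void smoothFrame(float[] real, float[] imag, float[] destination,
                            int fftSize, double smoothingTimeConstant) {
        // Undo the FFT scaling factor (the C++ code divides by DefaultFFTSize).
        final double magnitudeScale = 1.0 / fftSize;

        // 0 = no averaging with the previous result; larger = slower, smoother changes.
        double k = Math.min(1.0, Math.max(0.0, smoothingTimeConstant));

        for (int i = 0; i < destination.length; i++) {
            double scalarMagnitude = Math.hypot(real[i], imag[i]) * magnitudeScale;
            // Blend with the previous *output*, not with the current input.
            destination[i] = (float) (k * destination[i] + (1 - k) * scalarMagnitude);
        }
    }

    public static void main(String[] args) {
        float[] out = new float[2];
        smoothFrame(new float[] {4f, 0f}, new float[] {0f, 0f}, out, 4, 0.5); // bin 0 has magnitude 4
        System.out.println(out[0]); // 0.5
        smoothFrame(new float[2], new float[2], out, 4, 0.5);                 // silence: value decays
        System.out.println(out[0]); // 0.25
    }
}
```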

It seems the scaling is done by taking the absolute value (magnitude) of the complex value. This post points in the same direction. I've tried using the abs of the complex number with Minim and various window functions, but it still doesn't look like what I'm aiming for (the Web Audio API demo):

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
AudioInput  in;
FFT         fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
int specSize;

WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;

void setup(){
  size(640, 360, P3D);
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  fft.window(window[windex]);
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftReal   = new float[specSize];
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {
    float band = fft.getBand(i);

    //Sw = abs(Sw(1:(1+N/2))); %# abs is sqrt(real^2 + imag^2)
    float abs = sqrt(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);

    fftSmooth[i] *= smoothing;
    if(fftSmooth[i] < abs) fftSmooth[i] = abs;

    stroke(i,100,50);
    line( i, height, i, height - fftSmooth[i]*8 );
    stroke(i,100,100);
    line( i, height, i, height - band*8 );


  }
  text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex],10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
  if(key == 'W' && windex < window.length-1) windex++;
  if(key == 'w' && windex > 0) windex--;
  if(key == 'w' || key == 'W') fft.window(window[windex]);
}

I'm not sure I'm using the window functions correctly, because I don't notice a huge difference between them. Is the abs of the complex value correct? How can I get a visualisation closer to my aim?

UPDATE 3

I've tried to apply @wakjah's helpful tips like so:

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
AudioInput  in;
FFT         fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
float[] fftPrev;
float[] fftCurr;
int specSize;

WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;

int scale = 10;

void setup(){
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  fft.window(window[windex]);
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftPrev   = new float[specSize];
  fftCurr   = new float[specSize];
  size(specSize, specSize/2);
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {
    //float band = fft.getBand(i);
    //Sw = abs(Sw(1:(1+N/2))); %# abs is sqrt(real^2 + imag^2)
    //float abs = sqrt(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
    //fftSmooth[i] *= smoothing;
    //if(fftSmooth[i] < abs) fftSmooth[i] = abs;

    //x_dB = 10 * log10(real(x) ^ 2 + imag(x) ^ 2);
    fftCurr[i] = scale * (float)Math.log10(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
    //Y[k] = alpha * Y_(t-1)[k] + (1 - alpha) * X[k]
    fftSmooth[i] = smoothing * fftPrev[i] + ((1 - smoothing) * fftCurr[i]);

    fftPrev[i] = fftCurr[i];//

    stroke(i,100,100);
    line( i, height, i, height - fftSmooth[i]);

  }
  text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex]+"\nscale:"+scale,10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
  if(key == 'W' && windex < window.length-1) windex++;
  if(key == 'w' && windex > 0) windex--;
  if(key == 'w' || key == 'W') fft.window(window[windex]);
  if(keyCode == LEFT && scale > 1) scale--;
  if(keyCode == RIGHT) scale++;
}

I'm not sure I've applied the hints as intended. Here's how my output looks:

but I don't think I'm there yet if I compare this with visualisations I'm aiming for:

Spectrum in Windows Media Player

Spectrum in VLC player

I'm not sure I've applied the log scale correctly. My assumption was that I would get a plot similar to what I'm aiming for after using log10 (ignoring smoothing for now).

UPDATE 4:

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
AudioInput  in;
FFT         fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
float[] fftPrev;
float[] fftCurr;
int specSize;

WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;

int scale = 10;

void setup(){
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  fft.window(window[windex]);
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftPrev   = new float[specSize];
  fftCurr   = new float[specSize];
  size(specSize, specSize/2);
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {    
    float maxVal = Math.max(Math.abs(fftReal[i]), Math.abs(fftImag[i]));
    if (maxVal != 0.0f) { // prevent divide-by-zero
        // Normalize
        fftReal[i] = fftReal[i] / maxVal;
        fftImag[i] = fftImag[i] / maxVal;
    }

    fftCurr[i] = -scale * (float)Math.log10(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
    fftSmooth[i] = smoothing * fftSmooth[i] + ((1 - smoothing) * fftCurr[i]);

    stroke(i,100,100);
    line( i, height/2, i, height/2 - (mousePressed ? fftSmooth[i] : fftCurr[i]));

  }
  text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex]+"\nscale:"+scale,10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
  if(key == 'W' && windex < window.length-1) windex++;
  if(key == 'w' && windex > 0) windex--;
  if(key == 'w' || key == 'W') fft.window(window[windex]);
  if(keyCode == LEFT && scale > 1) scale--;
  if(keyCode == RIGHT) scale++;
}

produces this:

In the draw loop I'm drawing from the centre since scale is now negative. If I scale the values up the result starts to look random.

UPDATE 6

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
AudioInput  in;
FFT         fft;

float smoothing = 0;
float[] fftReal;
float[] fftImag;
float[] fftSmooth;
float[] fftPrev;
float[] fftCurr;
int specSize;

WindowFunction[] window = {FFT.NONE,FFT.HAMMING,FFT.HANN,FFT.COSINE,FFT.TRIANGULAR,FFT.BARTLETT,FFT.BARTLETTHANN,FFT.LANCZOS,FFT.BLACKMAN,FFT.GAUSS};
String[] wlabel = {"NONE","HAMMING","HANN","COSINE","TRIANGULAR","BARTLETT","BARTLETTHANN","LANCZOS","BLACKMAN","GAUSS"};
int windex = 0;

int scale = 10;

void setup(){
  minim = new Minim(this);
  in = minim.getLineIn(Minim.STEREO, 512);
  fft = new FFT(in.bufferSize(), in.sampleRate());
  fft.window(window[windex]);
  specSize = fft.specSize();
  fftSmooth = new float[specSize];
  fftPrev   = new float[specSize];
  fftCurr   = new float[specSize];
  size(specSize, specSize/2);
  colorMode(HSB,specSize,100,100);
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);
  fftReal = fft.getSpectrumReal();
  fftImag = fft.getSpectrumImaginary();
  for(int i = 0; i < specSize; i++)
  {
    fftCurr[i] = scale * (float)Math.log10(fftReal[i]*fftReal[i] + fftImag[i]*fftImag[i]);
    fftSmooth[i] = smoothing * fftSmooth[i] + ((1 - smoothing) * fftCurr[i]);

    stroke(i,100,100);
    line( i, height/2, i, height/2 - (mousePressed ? fftSmooth[i] : fftCurr[i]));

  }
  text("smoothing: " + (int)(smoothing*100)+"\nwindow:"+wlabel[windex]+"\nscale:"+scale,10,10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
  if(key == 'W' && windex < window.length-1) windex++;
  if(key == 'w' && windex > 0) windex--;
  if(key == 'w' || key == 'W') fft.window(window[windex]);
  if(keyCode == LEFT && scale > 1) scale--;
  if(keyCode == RIGHT) scale++;
  if(key == 's') saveFrame("fftmod.png");
}

This feels so close:

This looks much better than the previous version, but some values on the lower/left side of the spectrum look a bit off and the shape seems to change very fast. (smoothed values plot zeroes)

Solution

I'm a little unclear on exactly what kind of smoothing you want to do, but I will try to provide some information that might help you.

Scaling FFT results for display

Generally, when you take the Fourier transform and you want to display a graph of it, you need (as you mention) to scale it logarithmically. This is because the magnitude of the values will vary over a huge range - many orders of magnitude - and compressing this into the small space observable on a graph will cause the main peaks to dwarf the rest of the information.

To actually do this scaling, we convert the values to decibels. It is important to note that the decibel is a scale and not a unit: it represents a ratio between two numbers, usually a measured value and some reference. The general formula for decibels is

x_dB = 10 * log10((x ^ 2) / (x_ref ^ 2))

where log10 is the logarithm to base 10, ^ is the power operator, and x_ref is your chosen reference value. Since FFT'd values from an audio file don't (usually) have any meaningful units, x_ref is commonly chosen to be simply 1 for this application. Further, since x is complex, you need to take the absolute value. So the formula will be

x_dB = 10 * log10(abs(x) ^ 2)

There is a small (numerical and speed) optimisation possible here, since you're squaring the result of a square-root:

x_dB = 10 * log10(real(x) ^ 2 + imag(x) ^ 2)
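
A direct Java translation of these formulas might look like the following sketch. The `-120` dB floor is an arbitrary choice of mine and is not part of the formula. Some floor (or a small epsilon) is worth having because `Math.log10(0.0)` returns `-Infinity`. In an exponential moving average, a smoothing factor of 0 multiplied by `-Infinity` gives `NaN`, and a `NaN` never goes away once it is in the running value.

```java
final class Decibels {
    /** Lowest value returned; an arbitrary floor so silent bins don't produce -Infinity. */
    static final double FLOOR_DB = -120.0;

    /**
     * Power of one FFT bin in dB relative to xRef:
     * 10 * log10((re^2 + im^2) / xRef^2), i.e. 10 * log10(abs(x)^2 / xRef^2)
     * without taking a square root first.
     */
    static double binToDb(double re, double im, double xRef) {
        double power = re * re + im * im;
        if (power == 0.0) {
            return FLOOR_DB; // Math.log10(0.0) would be -Infinity
        }
        return Math.max(FLOOR_DB, 10.0 * Math.log10(power / (xRef * xRef)));
    }

    public static void main(String[] args) {
        System.out.println(binToDb(1, 0, 1));  // magnitude equal to the reference: 0 dB
        System.out.println(binToDb(10, 0, 1)); // ten times the amplitude: +20 dB
        System.out.println(binToDb(3, 4, 1));  // |3 + 4i| = 5, so 10 * log10(25)
        System.out.println(binToDb(0, 0, 1));  // a silent bin hits the floor
    }
}
```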

Perceptual weighting

Scaling of frequency-domain measurements is commonly done when measuring sound pressure and power levels: a specific measurement type is chosen for the given application (I won't go into the types here), and a recording of sound is made according to this measurement type. The result is FFT'd and then multiplied by a given weighting at each frequency, depending on what the result will be used for and what type of sound has been recorded. There are two weightings in common use: A and C. C is generally used only for extremely high-amplitude sounds.

Note that this kind of weighting is not really necessary if you just want to display a nice-looking graph: it is used to make sure everyone in the world can make measurements (and measurement equipment) that follow the same standard. If you do decide to include this, it must be performed as a multiplication before conversion to decibels (or as an addition of the decibel value of the weighting - which is mathematically equivalent).

Info on the A-weighting is on wikipedia.
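If you would rather not use a lookup table, the A-weighting curve can also be computed directly from the closed-form expression given on that Wikipedia page (the IEC 61672 formula). The sketch below is plain Java with a hypothetical class name. It also demonstrates the equivalence mentioned above: multiplying the linear magnitude by the weighting gain before converting to dB gives the same result as adding the weighting in dB afterwards. Frequencies must be above 0 Hz, since the gain is zero at DC.

```java
// Sketch of the analytic A-weighting curve (IEC 61672 form).
final class AWeighting {
    private AWeighting() {}

    // Linear gain R_A(f) for a frequency f in Hz (f > 0).
    static double gain(double f) {
        double f2 = f * f;
        double num = 12194.0 * 12194.0 * f2 * f2;
        double den = (f2 + 20.6 * 20.6)
                * Math.sqrt((f2 + 107.7 * 107.7) * (f2 + 737.9 * 737.9))
                * (f2 + 12194.0 * 12194.0);
        return num / den;
    }

    // A(f) in dB; the +2.00 dB offset normalises the curve to about 0 dB at 1 kHz.
    static double db(double f) {
        return 20.0 * Math.log10(gain(f)) + 2.00;
    }

    // Weighting applied as a multiplication on the linear magnitude, then converted to dB...
    static double weightLinearThenDb(double magnitude, double f) {
        double weighted = magnitude * gain(f) * Math.pow(10.0, 2.00 / 20.0);
        return 20.0 * Math.log10(weighted);
    }

    // ...or, equivalently, the weighting in dB added after conversion.
    static double dbThenAddWeighting(double magnitude, double f) {
        return 20.0 * Math.log10(magnitude) + db(f);
    }
}
```

The computed values agree with the table used later in this answer to within rounding (about -19.1 dB at 100 Hz and 0 dB at 1 kHz).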

Windowing

Windowing is performed primarily to reduce spectral leakage, an effect closely related to the Gibbs phenomenon that comes from the discontinuity where each finite block of samples wraps around. We can never get rid of it completely, but windowing does help. Unfortunately it has other effects: it broadens the main lobe of each peak, and there is always a compromise between peak sharpness (main-lobe width) and side-lobe height. I am not going to go into all the details here unless you specifically ask for it; there is a fairly lengthy explanation of windowing in this free online book.
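For reference, this is roughly what a window function does to a block of samples. Minim already provides windows through fft.window(FFT.HANN), as in your code, so this sketch is only meant to illustrate the idea. It is plain Java with a hypothetical class name, and it uses the symmetric form of the Hann window. The samples are tapered towards zero at both ends, which reduces the discontinuity where the block wraps around.

```java
// Sketch: apply a (symmetric) Hann window to one frame of samples before the FFT.
final class HannWindow {
    private HannWindow() {}

    // w[i] = 0.5 * (1 - cos(2*pi*i / (n - 1))): 0 at both ends, 1 in the middle.
    static double[] coefficients(int n) {
        if (n < 2) throw new IllegalArgumentException("n must be >= 2");
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            w[i] = 0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / (n - 1)));
        }
        return w;
    }

    // Returns a new array with each sample multiplied by the matching window value.
    static double[] apply(double[] frame) {
        double[] w = coefficients(frame.length);
        double[] out = new double[frame.length];
        for (int i = 0; i < frame.length; i++) {
            out[i] = frame[i] * w[i];
        }
        return out;
    }
}
```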

Time-domain smoothing of individual frequency bins

As for making the line in each frequency bin decay slowly, here's a simple idea that might do the trick: in each frequency bin, apply a simple exponential moving average. Say your FFT results are stored in X[k], where k is the frequency index. Let your display value be Y[k] such that

Y_t[k] = alpha * Y_(t-1)[k] + (1 - alpha) * X[k]

where 0 < alpha < 1 is your smoothing factor, and Y_(t-1)[k] is the value of Y[k] at the last time step (t-1). This is actually a simple low-pass IIR (infinite impulse response) filter, and hopefully should do basically what you want (perhaps with a little tweaking). The closer alpha is to zero, the more quickly new observations (input X[k]) will affect the result. The closer it is to one, the more slowly the result will decay, but the input will also affect the result more slowly, so it may appear "sluggish". You may want to add a conditional around it to take the new value immediately if it's higher than the current value.
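A minimal sketch of this filter for a whole spectrum might look like the following. It includes the optional conditional that takes rising values immediately. It is plain Java; the class name, the instantAttack flag and starting the display values at zero are assumptions for illustration.

```java
// Sketch of per-bin exponential smoothing:
// Y_t[k] = alpha * Y_(t-1)[k] + (1 - alpha) * X[k],
// with an optional "instant attack" that takes rising values immediately.
final class BinSmoother {
    private final double[] y;            // Y_(t-1)[k], updated in place each frame
    private final double alpha;          // smoothing factor, 0 <= alpha < 1
    private final boolean instantAttack; // if true, values above Y jump straight up

    BinSmoother(int bins, double alpha, boolean instantAttack) {
        if (alpha < 0.0 || alpha >= 1.0) {
            throw new IllegalArgumentException("alpha must be in [0, 1)");
        }
        this.y = new double[bins];
        this.alpha = alpha;
        this.instantAttack = instantAttack;
    }

    // Feeds one frame X[k] and returns a copy of the smoothed display values Y_t[k].
    double[] update(double[] x) {
        if (x.length != y.length) throw new IllegalArgumentException("bin count mismatch");
        for (int k = 0; k < y.length; k++) {
            if (instantAttack && x[k] > y[k]) {
                y[k] = x[k];                                // rise immediately
            } else {
                y[k] = alpha * y[k] + (1.0 - alpha) * x[k]; // decay gradually
            }
        }
        return y.clone();
    }
}
```

With alpha = 0.5 and a constant input of 1, the plain filter outputs 0.5, then 0.75, and so on. With instantAttack it jumps to 1 at once and only the fall is smoothed.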

Note that, again, this should be performed prior to conversion to decibels.

(edit) Having looked at the code you posted more closely, this does appear to be the method used in the example you're trying to reproduce. Your initial attempt was close, but note that the first term is the smoothing coefficient multiplied by the last display value, not the current input.

(edit 2) Your third update is, again, close, but there is a slight mistranslation of the formula in the following lines:

fftSmooth[i] = smoothing * fftPrev[i] + ((1 - smoothing) * fftCurr[i]);

fftPrev[i] = fftCurr[i];

Instead of the previous value of the FFT coefficients before smoothing, you want to take the value after smoothing. (Note that this means you don't actually need another array to store the previous value.)

fftSmooth[i] = smoothing * fftSmooth[i] + ((1 - smoothing) * fftCurr[i]);

If smoothing == 0, this line simply sets fftSmooth[i] to fftCurr[i], so no smoothing is applied.
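Because the recurrence is applied once per frame, the same smoothing value decays faster at a higher frame rate. One common way to make it frame-rate independent is to pick a time constant tau in seconds and set smoothing = exp(-dt / tau), where dt is the duration of one frame. This is an assumption on my part, not something the Web Audio code does. The hypothetical helper below computes this value. It also checks that a step input reaches about 63% (1 - 1/e) of its final value after tau seconds.

```java
// Sketch: derive the smoothing coefficient from a time constant in seconds.
final class SmoothingFromTimeConstant {
    private SmoothingFromTimeConstant() {}

    // alpha = exp(-dt / tau), where dt = 1 / frameRate is the time between updates.
    static double alpha(double frameRateHz, double tauSeconds) {
        double dt = 1.0 / frameRateHz;
        return Math.exp(-dt / tauSeconds);
    }

    // Response of y = alpha*y + (1 - alpha)*x to a unit step (x = 1), starting at y = 0.
    static double stepResponse(double alpha, int frames) {
        double y = 0.0;
        for (int i = 0; i < frames; i++) {
            y = alpha * y + (1.0 - alpha) * 1.0;
        }
        return y;
    }
}
```

For example, at Processing's default 60 frames per second, a time constant of 0.5 s gives a smoothing value of about 0.967.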

Normalization in the absolute value computation

Looking more closely at the way they compute the absolute value, they have a normalization in there: both the real and imaginary parts are divided by whichever of the two is larger in magnitude, so that part becomes +/-1 and the other is scaled accordingly. Be careful with this, though: on its own it does not give an absolute value between 0 and 1. After normalizing, sqrt(real^2 + imag^2) always lies between 1 and sqrt(2), so the true absolute value is only recovered by multiplying back by the larger part. This is the usual overflow-safe way of computing a complex magnitude rather than an alternative to decibel conversion. If you replicate it, keep the multiply-back step; otherwise every bin ends up with almost the same value.

To do this simply in your code, you could do something like

float re = fftReal[i];
float im = fftImag[i];
float maxVal = Math.max(Math.abs(re), Math.abs(im));
float magnitude = 0.0f;
if (maxVal != 0.0f) { // prevent divide-by-zero
    // Normalize so the larger part is +/-1, then multiply back by maxVal
    float reN = re / maxVal;
    float imN = im / maxVal;
    magnitude = maxVal * (float)Math.sqrt(reN*reN + imN*imN);
}

// scale is your own display scale factor; convert to dB here if desired
fftCurr[i] = scale * magnitude;
// ...
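As a standalone check of the normalization step, the sketch below compares the normalized value with and without multiplying back by the larger part. It is plain Java with a hypothetical class name. Without the multiply-back, the result for any non-zero input stays between 1 and sqrt(2). With it, you get the true magnitude, and the intermediate squares cannot overflow even for very large inputs. That is the same goal as Java's Math.hypot.

```java
// Sketch: overflow-safe complex magnitude via normalization by the larger part.
final class ComplexMagnitude {
    private ComplexMagnitude() {}

    // Divide both parts by the larger one WITHOUT multiplying back:
    // for any non-zero input the result lies in [1, sqrt(2)].
    static double normalizedOnly(double re, double im) {
        double m = Math.max(Math.abs(re), Math.abs(im));
        if (m == 0.0) return 0.0;
        double a = re / m;
        double b = im / m;
        return Math.sqrt(a * a + b * b);
    }

    // True |x|: normalize, take the root, then multiply back by the larger part.
    static double magnitude(double re, double im) {
        double m = Math.max(Math.abs(re), Math.abs(im));
        if (m == 0.0) return 0.0;
        return m * normalizedOnly(re, im);
    }
}
```

For example, for real part 3 and imaginary part 4 the normalized value is 1.25, while the full magnitude is 5.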

Putting it all together: Some code

Having messed around with it for a while in Processing 2.1, I have a solution that I believe you will be happy with:

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
//AudioInput  in;
AudioPlayer in;
FFT         fft;

float smoothing = 0.60;
final boolean useDB = true;
final int minBandwidthPerOctave = 200;
final int bandsPerOctave = 10;
float[] fftSmooth;
int avgSize;

float minVal = 0.0;
float maxVal = 0.0;
boolean firstMinDone = false;

void setup(){
  // In Processing 2.x, size() should be the first call in setup()
  size(500, 250);

  minim = new Minim(this);
  //in = minim.getLineIn(Minim.STEREO, 512);
  // Use forward slashes (or doubled backslashes) in Java string literals
  in = minim.loadFile("C:/path/to/some/audio/file.ext", 2048);

  in.loop();

  fft = new FFT(in.bufferSize(), in.sampleRate());

  // Use logarithmically-spaced averaging
  fft.logAverages(minBandwidthPerOctave, bandsPerOctave);

  avgSize = fft.avgSize();
  fftSmooth = new float[avgSize];

  colorMode(HSB,avgSize,100,100);

}

float dB(float x) {
  if (x == 0) {
    return 0;
  }
  else {
    return 10 * (float)Math.log10(x);
  }
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);

  final int weight = width / avgSize;
  final float maxHeight = (height / 2) * 0.75;

  for (int i = 0; i < avgSize; i++) {
    // Get spectrum value (using dB conversion or not, as desired)
    float fftCurr;
    if (useDB) {
      fftCurr = dB(fft.getAvg(i));
    }
    else {
      fftCurr = fft.getAvg(i);
    }

    // Smooth using exponential moving average
    fftSmooth[i] = (smoothing) * fftSmooth[i] + ((1 - smoothing) * fftCurr);

    // Find max and min values ever displayed across whole spectrum
    if (fftSmooth[i] > maxVal) {
      maxVal = fftSmooth[i];
    }
    if (!firstMinDone || (fftSmooth[i] < minVal)) {
      minVal = fftSmooth[i];
      firstMinDone = true; // otherwise minVal is overwritten by every band
    }
  }

  // Calculate the total range of smoothed spectrum; this will be used to scale all values to range 0...1
  final float range = maxVal - minVal;
  final float scaleFactor = range + 0.00001; // avoid div. by zero

  for(int i = 0; i < avgSize; i++)
  {
    stroke(i,100,100);
    strokeWeight(weight);

    // Y-coord of display line; fftSmooth is scaled to range 0...1; this is then multiplied by maxHeight
    // to make it within display port range
    float fftSmoothDisplay = maxHeight * ((fftSmooth[i] - minVal) / scaleFactor);

    // X-coord of display line
    float x = i * weight;

    line(x, height / 2, x, height / 2 - fftSmoothDisplay);
  }
  text("smoothing: " + (int)(smoothing*100) + "\n", 10, 10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
}

The above uses a slightly different approach: it averages the spectrum into a smaller number of logarithmically spaced bands (via fft.logAverages()) instead of drawing every FFT bin, which produces a result closer to WMP's than your original.
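To make the idea of log-spaced averaging concrete, here is a rough sketch in plain Java. It is not Minim's actual logAverages() implementation. The band edges here are simply geometric, bandsPerOctave per octave, starting at a hypothetical lowest edge fMin. Each band is the mean of the FFT bins whose frequencies fall inside it, and a band narrower than one bin falls back to the nearest bin.

```java
// Rough sketch: average a magnitude spectrum (bins 0..fftSize/2) into log-spaced bands.
final class LogBands {
    private LogBands() {}

    static float[] average(float[] mags, float sampleRate, int fftSize,
                           float fMin, int bandsPerOctave) {
        float binWidth = sampleRate / fftSize; // Hz per FFT bin
        float nyquist = sampleRate / 2f;
        if (fMin <= 0f || fMin >= nyquist) throw new IllegalArgumentException("bad fMin");
        // Whole bands that fit between fMin and the Nyquist frequency.
        int numBands = (int) Math.floor(bandsPerOctave * (Math.log(nyquist / fMin) / Math.log(2)));
        float[] bands = new float[numBands];
        for (int b = 0; b < numBands; b++) {
            double lo = fMin * Math.pow(2, (double) b / bandsPerOctave);
            double hi = fMin * Math.pow(2, (double) (b + 1) / bandsPerOctave);
            int loBin = (int) Math.ceil(lo / binWidth);
            int hiBin = Math.min((int) Math.floor(hi / binWidth), mags.length - 1);
            if (hiBin < loBin) {
                // Band narrower than one bin: use the bin nearest its geometric centre.
                int idx = (int) Math.round(Math.sqrt(lo * hi) / binWidth);
                bands[b] = mags[Math.min(idx, mags.length - 1)];
                continue;
            }
            float sum = 0f;
            for (int k = loBin; k <= hiBin; k++) sum += mags[k];
            bands[b] = sum / (hiBin - loBin + 1);
        }
        return bands;
    }
}
```

For a 2048-point FFT at 44100 Hz with fMin = 100 Hz and 3 bands per octave, this gives 23 bands. Low bands cover one or two bins each, and the top band spans dozens of bins, which is why the display gives more room to the low and mid frequencies.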

Enhancement: Now with A-weighting

I have an updated version of the code that applies the A-weighting in each frequency band (though only when dB mode is on, because the table I had was in dB :). Turn A-weighting on for a result closer to WMP's, or off for one closer to VLC's.

There are also some minor tweaks to the way it is displayed: it is now centred in the display and it will display only up to a maximum band centre frequency.

Here's the code - enjoy!

import ddf.minim.analysis.*;
import ddf.minim.*;

Minim       minim;
//AudioInput  in;
AudioPlayer in;
FFT         fft;

float smoothing = 0.73;
final boolean useDB = true;
final boolean useAWeighting = true; // only used in dB mode, because the table I found was in dB 
final boolean resetBoundsAtEachStep = false;
final float maxViewportUsage = 0.85;
final int minBandwidthPerOctave = 200;
final int bandsPerOctave = 10;
final float maxCentreFrequency = 18000;
float[] fftSmooth;
int avgSize;

float minVal = 0.0;
float maxVal = 0.0;
boolean firstMinDone = false;

final float[] aWeightFrequency = { 
  10, 12.5, 16, 20, 
  25, 31.5, 40, 50, 
  63, 80, 100, 125, 
  160, 200, 250, 315, 
  400, 500, 630, 800, 
  1000, 1250, 1600, 2000, 
  2500, 3150, 4000, 5000,
  6300, 8000, 10000, 12500, 
  16000, 20000 
};

final float[] aWeightDecibels = {
  -70.4, -63.4, -56.7, -50.5, 
  -44.7, -39.4, -34.6, -30.2, 
  -26.2, -22.5, -19.1, -16.1, 
  -13.4, -10.9, -8.6, -6.6, 
  -4.8, -3.2, -1.9, -0.8, 
  0.0, 0.6, 1.0, 1.2, 
  1.3, 1.2, 1.0, 0.5, 
  -0.1, -1.1, -2.5, -4.3, 
  -6.6, -9.3 
};

float[] aWeightDBAtBandCentreFreqs;

void setup(){
  // In Processing 2.x, size() should be the first call in setup()
  size(500, 250);

  minim = new Minim(this);
  //in = minim.getLineIn(Minim.STEREO, 512);
  // Forward slashes avoid invalid escape sequences such as "\M" in the string literal
  in = minim.loadFile("D:/Music/Arthur Brown/The Crazy World Of Arthur Brown/1-09 Fire.mp3", 2048);

  in.loop();

  fft = new FFT(in.bufferSize(), in.sampleRate());

  // Use logarithmically-spaced averaging
  fft.logAverages(minBandwidthPerOctave, bandsPerOctave);
  aWeightDBAtBandCentreFreqs = calculateAWeightingDBForFFTAverages(fft);

  avgSize = fft.avgSize();
  // Only use freqs up to maxCentreFrequency - ones above this may have
  // values too small that will skew our range calculation for all time
  while (fft.getAverageCenterFrequency(avgSize-1) > maxCentreFrequency) {
    avgSize--;
  }

  fftSmooth = new float[avgSize];

  colorMode(HSB,avgSize,100,100);

}

float[] calculateAWeightingDBForFFTAverages(FFT fft) {
  float[] result = new float[fft.avgSize()];
  for (int i = 0; i < result.length; i++) {
    result[i] = calculateAWeightingDBAtFrequency(fft.getAverageCenterFrequency(i));
  }
  return result;    
}

float calculateAWeightingDBAtFrequency(float frequency) {
  return linterp(aWeightFrequency, aWeightDecibels, frequency);    
}

float dB(float x) {
  if (x == 0) {
    return 0;
  }
  else {
    return 10 * (float)Math.log10(x);
  }
}

float linterp(float[] x, float[] y, float xx) {
  assert(x.length > 1);
  assert(x.length == y.length);

  float result = 0.0;
  boolean found = false;

  if (x[0] > xx) {
    result = y[0];
    found = true;
  }

  if (!found) {
    for (int i = 1; i < x.length; i++) {
      if (x[i] > xx) {
        result = y[i-1] + ((xx - x[i-1]) / (x[i] - x[i-1])) * (y[i] - y[i-1]);
        found = true;
        break;
      }
    }
  }

  if (!found) {
    result = y[y.length-1];
  }

  return result;     
}

void draw(){
  background(0);
  stroke(255);

  fft.forward( in.mix);

  final int weight = width / avgSize;
  final float maxHeight = height * maxViewportUsage;
  final float xOffset = weight / 2 + (width - avgSize * weight) / 2;

  if (resetBoundsAtEachStep) {
    minVal = 0.0;
    maxVal = 0.0;
    firstMinDone = false;
  }

  for (int i = 0; i < avgSize; i++) {
    // Get spectrum value (using dB conversion or not, as desired)
    float fftCurr;
    if (useDB) {
      fftCurr = dB(fft.getAvg(i));
      if (useAWeighting) {
        fftCurr += aWeightDBAtBandCentreFreqs[i];
      }
    }
    else {
      fftCurr = fft.getAvg(i);
    }

    // Smooth using exponential moving average
    fftSmooth[i] = (smoothing) * fftSmooth[i] + ((1 - smoothing) * fftCurr);

    // Find max and min values ever displayed across whole spectrum
    if (fftSmooth[i] > maxVal) {
      maxVal = fftSmooth[i];
    }
    if (!firstMinDone || (fftSmooth[i] < minVal)) {
      minVal = fftSmooth[i];
      firstMinDone = true; // otherwise minVal is overwritten by every band
    }
  }

  // Calculate the total range of smoothed spectrum; this will be used to scale all values to range 0...1
  final float range = maxVal - minVal;
  final float scaleFactor = range + 0.00001; // avoid div. by zero

  for(int i = 0; i < avgSize; i++)
  {
    stroke(i,100,100);
    strokeWeight(weight);

    // Y-coord of display line; fftSmooth is scaled to range 0...1; this is then multiplied by maxHeight
    // to make it within display port range
    float fftSmoothDisplay = maxHeight * ((fftSmooth[i] - minVal) / scaleFactor);
    // Artificially impose a minimum of zero (this is mathematically bogus, but whatever)
    fftSmoothDisplay = max(0.0, fftSmoothDisplay);

    // X-coord of display line
    float x = xOffset + i * weight;

    line(x, height, x, height - fftSmoothDisplay);
  }
  text("smoothing: " + (int)(smoothing*100) + "\n", 10, 10);
}
void keyPressed(){
  float inc = 0.01;
  if(keyCode == UP && smoothing < 1-inc) smoothing += inc;
  if(keyCode == DOWN && smoothing > inc) smoothing -= inc;
}
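Both programs scale the display by tracking the smallest and largest smoothed values seen so far and mapping each value into 0...1. The class below is a hypothetical, standalone Java version of that step. Unlike the code above, it initialises both bounds from the first value it sees instead of starting maxVal at 0. It also clamps the result to 0...1; the second program does the lower clamp with max(0.0, ...).

```java
// Sketch of the running min/max display scaling used in both programs.
final class RangeTracker {
    private float minVal;
    private float maxVal;
    private boolean initialised = false;

    // Widens the running bounds to include one frame of smoothed values.
    void update(float[] values) {
        for (float v : values) {
            if (!initialised) {
                minVal = v;
                maxVal = v;
                initialised = true;
            } else {
                minVal = Math.min(minVal, v);
                maxVal = Math.max(maxVal, v);
            }
        }
    }

    // Maps v into 0...1 using the running bounds; the small epsilon avoids division by zero.
    float normalise(float v) {
        float range = maxVal - minVal;
        float t = (v - minVal) / (range + 0.00001f);
        return Math.max(0f, Math.min(1f, t));
    }

    // Equivalent of resetBoundsAtEachStep: forget the bounds before the next frame.
    void reset() {
        initialised = false;
    }
}
```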
