librosa producing "undetailed" MFCC spectrogram


Problem description

I am trying to create an MFCC plot with librosa, but the plot doesn't appear to be very detailed. The goal is to feed this MFCC spectrogram to a neural network. The audio file I am testing with is about 1 second long and comes from the Google Speech Commands dataset. My code is:

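import librosa
import librosa.display
import matplotlib.pyplot as plt

WINDOW_SIZE = 20  # analysis window length in milliseconds
NFFT = int((WINDOW_SIZE / 1000) * 16000)  # 320 samples at 16 kHz

# f is the path to the audio file
samples, _ = librosa.load(f, sr=16000)

# Compute 40 MFCCs over the first second of audio
mfccs = librosa.feature.mfcc(y=samples[:16000], sr=16000, n_fft=NFFT, n_mfcc=40)

plt.figure(figsize=(10, 4))
# Pass sr so the time axis matches the 16 kHz audio (specshow assumes 22050 Hz otherwise)
librosa.display.specshow(mfccs, sr=16000, x_axis='time')
plt.colorbar()
plt.title('MFCC')
plt.tight_layout()
plt.show()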

This is the MFCC spectrogram being produced:

Recommended answer

The 0th coefficient has much more energy than the rest, so it dominates the shared color scale and differences among the other coefficients don't show very well in the plot.

You may want to normalize the MFCCs so that all coefficients are on the same scale. Compute the mean and standard deviation of each coefficient, then standardize by subtracting the mean and dividing by the standard deviation. This can be done per clip, or with statistics computed across the whole training set.
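The standardization described above could look roughly like the sketch below. It uses only NumPy and assumes the MFCC array has shape (n_mfcc, n_frames), as returned by librosa.feature.mfcc. The function names, the eps guard against division by zero, and the synthetic fake_mfccs array are illustrative assumptions, not part of the original question or of librosa.

```python
import numpy as np


def standardize_per_clip(mfccs, eps=1e-8):
    """Standardize each coefficient (row) of one clip over its own time frames."""
    mean = mfccs.mean(axis=1, keepdims=True)
    std = mfccs.std(axis=1, keepdims=True)
    return (mfccs - mean) / (std + eps)


def fit_dataset_stats(clips):
    """Per-coefficient mean and std over all frames of all clips (hypothetical helper)."""
    frames = np.concatenate(clips, axis=1)  # shape: (n_mfcc, total_frames)
    return frames.mean(axis=1, keepdims=True), frames.std(axis=1, keepdims=True)


def standardize_with_stats(mfccs, mean, std, eps=1e-8):
    """Standardize a clip using precomputed (e.g. training-set) statistics."""
    return (mfccs - mean) / (std + eps)


# Synthetic stand-in for an MFCC matrix: 40 coefficients x 32 frames,
# with row 0 given a much larger magnitude than the other rows.
rng = np.random.default_rng(0)
fake_mfccs = rng.normal(size=(40, 32))
fake_mfccs[0] = fake_mfccs[0] * 50.0 - 400.0

# Per-clip normalization: every row ends up with roughly zero mean and unit std.
normalized = standardize_per_clip(fake_mfccs)
```

With the code from the question, normalized would be computed from mfccs and passed to librosa.display.specshow in place of the raw array. When the features feed a neural network, one reasonable choice is to fit the statistics once on the training set with fit_dataset_stats and reuse them for validation and inference, so all inputs share the same scale.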
