Spectrograms generated using Librosa don't look consistent with Kaldi?

Question

I generated a spectrogram of a "seven" utterance using the "egs/tidigits" code from Kaldi, with 23 bins, a 20 kHz sampling rate, a 25 ms window, and a 10 ms shift. The spectrogram appears as below, visualized via the MATLAB imagesc function:

I am experimenting with using Librosa as an alternative to Kaldi. I set up my code as below using the same number of bins, sampling rate, and window length / shift as above.

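import librosa
import numpy as np
# At 20 kHz, a 25 ms window is 500 samples and a 10 ms shift is 200 samples
time_series, sample_rate = librosa.load("7a.wav", sr=20000)
spectrogram = librosa.feature.melspectrogram(y=time_series, sr=sample_rate, n_mels=23, n_fft=500, hop_length=200)
# power_to_db replaces the deprecated librosa.core.logamplitude
log_S = librosa.power_to_db(spectrogram)
np.savetxt("7a.txt", log_S.T)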
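
The n_fft and hop_length values in the script come from converting the 25 ms window and 10 ms shift to samples at 20 kHz. A minimal sketch of that arithmetic, where ms_to_samples is a hypothetical helper name and not part of Librosa or Kaldi:

```python
def ms_to_samples(duration_ms, sample_rate):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(duration_ms * sample_rate / 1000))


sample_rate = 20000  # Hz, as in the question
n_fft = ms_to_samples(25, sample_rate)       # 25 ms analysis window
hop_length = ms_to_samples(10, sample_rate)  # 10 ms frame shift

print(n_fft, hop_length)  # prints "500 200"
```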

However, when I visualize the resulting Librosa spectrogram of the same WAV file, it looks different:

Can someone please help me understand why these look so different? Across the other WAV files I've tried, I notice that with my Librosa script above, my fricatives (like the /s/ in "seven" in the example above) are being cut off, and this is greatly affecting my digit classification accuracy. Thank you!

Recommended answer

Kaldi applies a lifter to the DCT output by default, which is why the upper coefficients are attenuated. See details here.
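
To illustrate what a lifter does to DCT output, here is a rough sketch of the common sinusoidal lifter, which scales coefficient i by 1 + (Q/2)·sin(πi/Q). It assumes this is the form Kaldi uses and that Q = 22 is the default; both are assumptions that should be checked against your Kaldi configuration. The function names lifter_weights and apply_lifter are hypothetical and are not Kaldi or Librosa APIs.

```python
import math


def lifter_weights(num_ceps, q=22.0):
    """Sinusoidal lifter weights: 1 + (Q/2) * sin(pi * i / Q) for i = 0..num_ceps-1.

    q=22.0 is an assumed default, not taken from the original post.
    """
    return [1.0 + 0.5 * q * math.sin(math.pi * i / q) for i in range(num_ceps)]


def apply_lifter(cepstra, q=22.0):
    """Scale each coefficient of one frame of DCT output by its lifter weight."""
    weights = lifter_weights(len(cepstra), q)
    return [c * w for c, w in zip(cepstra, weights)]


# One frame of 13 made-up coefficients, all equal, so the output shows
# how the weight changes from one coefficient index to the next.
frame = [1.0] * 13
liftered = apply_lifter(frame)
for i, value in enumerate(liftered):
    print(i, round(value, 3))
```

Because each coefficient gets a different weight, liftered output will not match unliftered Librosa features coefficient for coefficient. For a like-for-like comparison, either apply the same weighting to the Librosa features or turn liftering off on the Kaldi side.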
