Why do MFCC extraction libs return different values?
Question
I am extracting MFCC features using two different libraries:
- the python_speech_features library
- the Bob library
However, the outputs of the two are different, and even the shapes are not the same. Is that normal? Or is there a parameter that I am missing?
The relevant section of my code is the following:
import bob.ap
import numpy as np
from scipy.io.wavfile import read
from sklearn import preprocessing
from python_speech_features import mfcc, delta, logfbank


def bob_extract_features(audio, rate):
    # get MFCC
    rate = 8000                # rate (note: this overrides the argument)
    win_length_ms = 30         # The window length of the cepstral analysis in milliseconds
    win_shift_ms = 10          # The window shift of the cepstral analysis in milliseconds
    n_filters = 26             # The number of filter bands
    n_ceps = 13                # The number of cepstral coefficients
    f_min = 0.                 # The minimal frequency of the filter bank
    f_max = 4000.              # The maximal frequency of the filter bank
    delta_win = 2              # The integer delta value used for computing the first and second order derivatives
    pre_emphasis_coef = 0.97   # The coefficient used for the pre-emphasis
    dct_norm = True            # A factor by which the cepstral coefficients are multiplied
    mel_scale = True           # Tell whether cepstral features are extracted on a linear (LFCC) or Mel (MFCC) scale

    c = bob.ap.Ceps(rate, win_length_ms, win_shift_ms, n_filters, n_ceps, f_min,
                    f_max, delta_win, pre_emphasis_coef, mel_scale, dct_norm)
    c.with_delta = False
    c.with_delta_delta = False
    c.with_energy = False

    signal = np.asarray(audio, dtype=float)  # vector should be in **float**
    example_mfcc = c(signal)                 # MFCCs only: deltas and energy are disabled above
    return example_mfcc


def psf_extract_features(audio, rate):
    signal = np.asarray(audio, dtype=float)  # vector should be in **float**
    mfcc_feature = mfcc(signal, rate, winlen=0.03, winstep=0.01, numcep=13,
                        nfilt=26, nfft=512, appendEnergy=False)
    # mfcc_feature = preprocessing.scale(mfcc_feature)
    deltas = delta(mfcc_feature, 2)
    fbank_feat = logfbank(audio, rate)
    combined = np.hstack((mfcc_feature, deltas))
    return mfcc_feature  # only the plain MFCCs are returned


track = 'test-sample.wav'
rate, audio = read(track)
features1 = psf_extract_features(audio, rate)
features2 = bob_extract_features(audio, rate)

print("--------------------------------------------")
t = (features1 == features2)
print(t)
Recommended answer
However the output of the two is different and even the shapes are not the same. Is that normal?
Yes. There are different varieties of the algorithm, and each implementation chooses its own flavor.
or is there a parameter that I am missing?
It is not just about parameters. There are algorithmic differences too, such as the window shape (Hamming vs. Hann), the shape of the mel filters, where the mel filters start, the normalization of the mel filters, liftering, the DCT flavor, and so on.
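To make this concrete, here is a minimal NumPy-only sketch of a single-frame MFCC computation in which three of those choices (window shape, mel filter normalization, DCT scaling) can be toggled. It is not the code of python_speech_features or Bob; the helper names (`mel_filterbank`, `dct2`, `mfcc_frame`) and the random test frame are made up for illustration, and which variant each library actually uses is something you would have to check in its source.

```python
import numpy as np

RATE = 8000       # Hz, as in the question
FRAME_LEN = 240   # 30 ms at 8 kHz
NFFT = 512
N_FILTERS = 26
N_CEPS = 13


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(unit_area):
    """Triangular filters equally spaced on the mel scale from 0 Hz to RATE / 2."""
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(RATE / 2.0), N_FILTERS + 2))
    freqs = np.arange(NFFT // 2 + 1) * RATE / NFFT
    bank = np.zeros((N_FILTERS, freqs.size))
    for i in range(N_FILTERS):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
        if unit_area:
            bank[i] *= 2.0 / (hi - lo)  # rescale from peak 1 to unit area
    return bank


def dct2(x, orthonormal):
    """First N_CEPS coefficients of a DCT-II, with or without orthonormal scaling."""
    n = x.size
    k = np.arange(N_CEPS)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    c = basis @ x
    if orthonormal:
        c = c * np.sqrt(2.0 / n)
        c[0] /= np.sqrt(2.0)
    return c


def mfcc_frame(frame, window, unit_area, orthonormal):
    power = np.abs(np.fft.rfft(frame * window, NFFT)) ** 2
    energies = mel_filterbank(unit_area) @ power
    return dct2(np.log(np.maximum(energies, 1e-12)), orthonormal)


# Hypothetical test frame: white noise with a fixed seed.
frame = np.random.default_rng(0).standard_normal(FRAME_LEN)

baseline = mfcc_frame(frame, np.hamming(FRAME_LEN), unit_area=False, orthonormal=True)
other_window = mfcc_frame(frame, np.hanning(FRAME_LEN), unit_area=False, orthonormal=True)
other_norm = mfcc_frame(frame, np.hamming(FRAME_LEN), unit_area=True, orthonormal=True)
other_dct = mfcc_frame(frame, np.hamming(FRAME_LEN), unit_area=False, orthonormal=False)

for name, variant in [("window", other_window), ("filter norm", other_norm), ("dct", other_dct)]:
    print(name, "max abs difference:", np.max(np.abs(variant - baseline)))
```

Each toggle changes the coefficients even though frame length, filter count and number of cepstra are identical, which is why matching the visible parameters alone is not enough.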
If you want the same results, just use a single library for extraction; trying to sync them is pretty hopeless.
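As for the shapes, one plausible source of a mismatch is how the last, incomplete frame is handled. The sketch below contrasts two common framing conventions: zero-padding the tail (rounding the frame count up) and dropping it (rounding down). The function names and the signal length are hypothetical, and which convention each of the two libraries follows is an assumption to verify against its source.

```python
import math

FRAME_LEN = 240   # 30 ms at 8 kHz, as in the question
FRAME_STEP = 80   # 10 ms at 8 kHz


def frames_zero_padded(n_samples, frame_len, frame_step):
    """Frame count when a trailing partial frame is zero-padded and kept."""
    if n_samples <= frame_len:
        return 1
    return 1 + math.ceil((n_samples - frame_len) / frame_step)


def frames_truncated(n_samples, frame_len, frame_step):
    """Frame count when a trailing partial frame is dropped."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // frame_step


n_samples = 24050  # hypothetical signal length (about 3 s at 8 kHz)
padded = frames_zero_padded(n_samples, FRAME_LEN, FRAME_STEP)
truncated = frames_truncated(n_samples, FRAME_LEN, FRAME_STEP)
print(padded, truncated)
```

With such an off-by-one in the number of rows, an element-wise `==` between the two feature matrices is not meaningful. Printing `features1.shape` and `features2.shape` first, and comparing with a tolerance (for example `np.allclose` on the common rows), gives a clearer picture of how far apart the two outputs really are.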