Why do MFCC extraction libs return different values?


Question

I am extracting MFCC features using two different libraries:

  • The python_speech_features library
  • The Bob library (bob.ap)

However, the outputs of the two are different, and even the shapes are not the same. Is that normal, or is there a parameter that I am missing?

The relevant section of my code is the following:

import bob.ap
import numpy as np
from scipy.io.wavfile import read
from sklearn import preprocessing
from python_speech_features import mfcc, delta, logfbank

def bob_extract_features(audio, rate):
    #get MFCC
    rate              = 8000  # NOTE: overrides the `rate` argument; assumes 8 kHz audio
    win_length_ms     = 30    # The window length of the cepstral analysis in milliseconds
    win_shift_ms      = 10    # The window shift of the cepstral analysis in milliseconds
    n_filters         = 26    # The number of filter bands
    n_ceps            = 13    # The number of cepstral coefficients
    f_min             = 0.    # The minimal frequency of the filter bank
    f_max             = 4000. # The maximal frequency of the filter bank
    delta_win         = 2     # The integer delta value used for computing the first and second order derivatives
    pre_emphasis_coef = 0.97  # The coefficient used for the pre-emphasis
    dct_norm          = True  # A factor by which the cepstral coefficients are multiplied
    mel_scale         = True  # Tell whether cepstral features are extracted on a linear (LFCC) or Mel (MFCC) scale

    c = bob.ap.Ceps(rate, win_length_ms, win_shift_ms, n_filters, n_ceps, f_min,
                    f_max, delta_win, pre_emphasis_coef, mel_scale, dct_norm)
    c.with_delta       = False
    c.with_delta_delta = False
    c.with_energy      = False

    signal = np.asarray(audio, dtype=float)    # input vector must be float
    example_mfcc = c(signal)                   # MFCCs only: deltas and energy are disabled above
    return example_mfcc


def psf_extract_features(audio, rate):
    signal = np.asarray(audio, dtype=float)  # input vector must be float
    mfcc_feature = mfcc(signal, rate, winlen = 0.03, winstep = 0.01, numcep = 13,
                        nfilt = 26, nfft = 512,appendEnergy = False)

    #mfcc_feature = preprocessing.scale(mfcc_feature)
    deltas       = delta(mfcc_feature, 2)
    fbank_feat   = logfbank(audio, rate)
    combined     = np.hstack((mfcc_feature, deltas))
    return mfcc_feature



track = 'test-sample.wav'
rate, audio = read(track)

features1 = psf_extract_features(audio, rate)
features2 = bob_extract_features(audio, rate)

print("--------------------------------------------")
t = (features1 == features2)
print(t)

Recommended answer

However, the outputs of the two are different, and even the shapes are not the same. Is that normal?

Yes. There are several variants of the algorithm, and each implementation chooses its own flavor.
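One place where flavors can diverge is framing, which directly affects the output shape. The sketch below is only an illustration with hypothetical helper names: one convention zero-pads the signal so the trailing partial frame is kept (python_speech_features frames this way), while another common convention drops the incomplete frame. Which convention a given library follows is an assumption you should check against its documentation.

```python
import math

def n_frames_padded(n_samples, frame_len, frame_step):
    """Frame count when the last partial frame is zero-padded and kept."""
    if n_samples <= frame_len:
        return 1
    return 1 + math.ceil((n_samples - frame_len) / frame_step)

def n_frames_truncated(n_samples, frame_len, frame_step):
    """Frame count when an incomplete trailing frame is dropped."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // frame_step

# 30 ms windows and 10 ms shifts at 8 kHz, as in the question.
frame_len, frame_step = 240, 80
n_samples = 8100  # hypothetical signal length

padded = n_frames_padded(n_samples, frame_len, frame_step)
truncated = n_frames_truncated(n_samples, frame_len, frame_step)
print(padded, truncated)  # 100 99
```

With identical window settings, the two conventions already disagree by one row for this signal length, so a shape mismatch alone does not mean a parameter is wrong.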

Or is there a parameter that I am missing?

It is not just about parameters; there are algorithmic differences too, such as the window shape (Hamming vs. Hann), the shape of the mel filters, where the mel filters start, the normalization of the mel filters, liftering, the DCT flavor, and so on.
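To get a feel for how much two of these choices matter, here is a minimal NumPy sketch. It is a toy cepstrum, not either library's pipeline: it deliberately skips the mel filter bank, pre-emphasis, and liftering, and the helper names `dct2` and `toy_cepstrum` are made up for this illustration. It only varies the window shape and the DCT scaling on the same frame.

```python
import numpy as np

def dct2(x, ortho):
    """DCT-II of a 1-D array; ortho=True applies orthonormal scaling."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    y = 2.0 * np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ x
    if ortho:
        y[0] *= np.sqrt(1.0 / (4 * n))
        y[1:] *= np.sqrt(1.0 / (2 * n))
    return y

def toy_cepstrum(frame, window, ortho, n_ceps=13):
    """Window -> power spectrum -> log -> DCT (no mel filter bank)."""
    spectrum = np.abs(np.fft.rfft(frame * window)) ** 2
    log_spec = np.log(spectrum + 1e-10)  # small offset avoids log(0)
    return dct2(log_spec, ortho)[:n_ceps]

rng = np.random.default_rng(0)
frame = rng.standard_normal(240)  # one 30 ms frame at 8 kHz (synthetic noise)

c_hamming = toy_cepstrum(frame, np.hamming(240), ortho=True)
c_hann = toy_cepstrum(frame, np.hanning(240), ortho=True)
c_unscaled = toy_cepstrum(frame, np.hamming(240), ortho=False)

print(np.allclose(c_hamming, c_hann))      # window shape alone changes the values
print(np.allclose(c_hamming, c_unscaled))  # so does the DCT scaling
```

Each of these switches changes every coefficient, and a real MFCC pipeline stacks several more such choices on top.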

If you want the same results, just use a single library for extraction; trying to make the two agree is pretty hopeless.
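If you still want to measure how far apart two extractors are, an element-wise `==` on arrays of different shapes is not informative. A rough sketch, assuming both outputs are (frames, coefficients) matrices, is to trim to the common number of frames and compare with a tolerance. `compare_features` is a hypothetical helper, and the arrays below are stand-ins rather than real library output.

```python
import numpy as np

def compare_features(a, b, atol=1e-6):
    """Compare two (frames, coefficients) matrices over their common frames.

    Returns (frames compared, max absolute difference, close within atol).
    """
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("at least one feature matrix is empty")
    a = np.asarray(a[:n], dtype=float)
    b = np.asarray(b[:n], dtype=float)
    if a.shape != b.shape:
        raise ValueError(f"coefficient counts differ: {a.shape} vs {b.shape}")
    max_abs_diff = float(np.max(np.abs(a - b)))
    return n, max_abs_diff, bool(np.allclose(a, b, atol=atol))

# Stand-in matrices with different frame counts (not real MFCCs).
feats_a = np.zeros((100, 13))
feats_b = np.full((99, 13), 0.5)

frames, diff, close = compare_features(feats_a, feats_b)
print(frames, diff, close)  # 99 0.5 False
```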
