How to get MFCC from an FFT on a signal?


Question

SHORT AND SIMPLE: Hi all, very simply: I just want to know the steps involved in getting an MFCC from an FFT.

DETAILED:

Hi all. I am working on a drum application where I want to classify sounds. It's just a matching application: it returns the name of the note that you play on the drum.

It's a simple, big, loud Indian drum. There are only a few notes that one can play on it.

I've implemented the FFT algorithm and successfully obtained a spectrum. I now want to take it one step further and compute the MFCC from the FFT.

This is what I understand so far: it's based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency.

It uses triangular filters to weight the frequencies and get the desired coefficients. http://instruct1.cit.cornell.edu/courses/ece576/FinalProjects/f2008/pae26_jsc59/pae26_jsc59/images/melfilt.png

So if the FFT algorithm returns around 1000 values (the spectrum of the sound), then ideally you get around 12 elements (i.e., coefficients). This 12-element vector is used to classify the instrument, including the drum played...

This is exactly what I want.

Could someone please help me with how to do something like this? My programming skills are alright. I'm currently creating an application for the iPhone with openFrameworks.

Any help would be greatly appreciated. Cheers

Recommended answer

First, split the signal into short frames of 10 to 30 ms, apply a windowing function (Hamming is recommended for sound applications), and compute the Fourier transform of each frame. Then, starting from the DFT, compute the Mel Frequency Cepstral Coefficients with these steps:

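  1. Take the power spectrum: |DFT|^2
  2. Apply a bank of triangular filters to convert the Hz scale into the mel scale
  3. Take the logarithm of the filtered spectrum
  4. Apply the discrete cosine transform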
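
Step 2 depends on the mapping between Hz and mel. As a minimal sketch, assuming the same 1127.01048 and 700 constants used in the answer's code, the conversion and the placement of the triangular filter edges might look like this. The names `hz_to_mel`, `mel_to_hz` and `mel_band_edges` are illustrative, and the 8000 Hz upper limit and 12 bands are assumed example values, not something the answer prescribes.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert frequency in Hz to mel."""
    return 1127.01048 * np.log(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Convert mel back to frequency in Hz (inverse of hz_to_mel)."""
    return 700.0 * (np.exp(np.asarray(mel, dtype=float) / 1127.01048) - 1.0)


def mel_band_edges(min_hz, max_hz, num_bands):
    """Return num_bands + 2 frequencies in Hz, equally spaced on the mel scale.

    Consecutive triples (start, centre, end) define one triangular filter each.
    """
    mels = np.linspace(hz_to_mel(min_hz), hz_to_mel(max_hz), num_bands + 2)
    return mel_to_hz(mels)


# Assumed example values: 12 bands between 0 Hz and 8000 Hz
edges = mel_band_edges(0.0, 8000.0, 12)
```

Because the points are equally spaced in mel, the edges sit close together at low frequencies and spread out at high frequencies, which is why the triangular filters get wider as frequency increases.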

A Python code example:

import math

import numpy
from scipy.fftpack import dct
from scipy.io import wavfile

sampleRate, signal = wavfile.read("file.wav")
if signal.ndim > 1:
    signal = signal.mean(axis=1)  # mix multi-channel audio down to mono
numCoefficients = 13  # choose the size of the MFCC array
minHz = 0.0
maxHz = sampleRate / 2.0  # Nyquist frequency (22050 Hz for 44.1 kHz audio)


def freqToMel(freq):
    return 1127.01048 * math.log(1 + freq / 700.0)


def melToFreq(mel):
    return 700 * (math.exp(mel / 1127.01048) - 1)


def melFilterBank(blockSize):
    # blockSize is the number of bins in the one-sided power spectrum
    numBands = int(numCoefficients)
    maxMel = freqToMel(maxHz)
    minMel = freqToMel(minHz)

    # Create a matrix for triangular filters, one row per filter
    filterMatrix = numpy.zeros((numBands, blockSize))

    # numBands + 2 points equally spaced on the mel scale
    melRange = numpy.arange(numBands + 2)
    melCenterFilters = melRange * (maxMel - minMel) / (numBands + 1) + minMel

    # Convert each mel point back to Hz and then to a bin index;
    # each array index represents the center of one triangular filter
    aux = numpy.log(1 + 1000.0 / 700.0) / 1000.0
    aux = (numpy.exp(melCenterFilters * aux) - 1) / maxHz
    aux = 0.5 + 700 * blockSize * aux
    aux = numpy.floor(aux)  # round down
    centerIndex = numpy.array(aux, int)  # get int values

    for i in range(numBands):
        start, centre, end = centerIndex[i:i + 3]
        k1 = numpy.float32(centre - start)
        k2 = numpy.float32(end - centre)
        up = (numpy.arange(start, centre) - start) / k1  # rising slope
        down = (end - numpy.arange(centre, end)) / k2  # falling slope

        filterMatrix[i][start:centre] = up
        filterMatrix[i][centre:end] = down

    return filterMatrix.transpose()


# One-sided FFT of the real signal (here the whole file, not a single frame)
complexSpectrum = numpy.fft.rfft(signal)
powerSpectrum = numpy.abs(complexSpectrum) ** 2
filteredSpectrum = numpy.dot(powerSpectrum, melFilterBank(len(powerSpectrum)))
logSpectrum = numpy.log(filteredSpectrum)
dctSpectrum = dct(logSpectrum, type=2)  # MFCC :)

This code is based on the MFCC Vamp example. I hope this helps!
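
The example above transforms the whole file in one go, while the answer starts by recommending short windowed frames. A rough sketch of that framing step could look like the following. The helper names `frame_signal` and `frame_power_spectra` are hypothetical, the 25 ms frame length is just one assumed value inside the 10 to 30 ms range, the 10 ms hop is an assumption the answer does not mention, and the input is a synthetic 440 Hz tone rather than a drum recording.

```python
import numpy as np


def frame_signal(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping frames and apply a Hamming window."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row i holds the sample indices that belong to frame i
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(num_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)


def frame_power_spectra(frames):
    """Power spectrum |DFT|^2 of every frame, one row per frame."""
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


# Synthetic input (assumption): one second of a 440 Hz tone sampled at 16 kHz
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

frames = frame_signal(tone, sample_rate)
power = frame_power_spectra(frames)

# Frequency of the strongest bin in the first frame
peak_hz = np.argmax(power[0]) * sample_rate / frames.shape[1]
```

Each row of `power` could then go through the mel filter bank, the logarithm and the DCT, giving one MFCC vector per frame instead of a single vector for the whole recording.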
