Simple word detector using MFCC
Question
I am implementing speech recognition software using Mel Frequency Cepstral Coefficients (MFCCs). In particular, the system must recognize a single specified word. From each audio file I get the MFCCs in a matrix with 12 rows (the MFCCs) and as many columns as there are voice frames. I average each row over its columns, so I get a vector with only 12 entries (the i-th entry is the average of the i-th MFCC over all frames). My question is: how do I train a classifier to detect the word? I have a training set with only positive samples, namely the MFCCs that I get from several audio files (several recordings of the same word).
Recommended answer
"I average each row, so I get a vector with only 12 entries (the i-th entry is the average of the i-th MFCC over all frames)."
This is a very bad idea: averaging over the frames throws away the temporal information that characterizes the word. You need to analyze the whole MFCC sequence, not a reduced part of it.
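To see why the averaged vector is a weak feature, here is a minimal sketch using made-up 3-coefficient frames (real MFCC frames would have 12 coefficients; the numbers and the helper name `mean_vector` are purely illustrative). Two utterances that contain the same frames in opposite order produce exactly the same mean vector, so no classifier trained on that vector can tell them apart.

```python
def mean_vector(frames):
    """Average each coefficient over all frames (the approach from the question)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]

# Toy "MFCC" sequences: one inner list per frame (made-up numbers).
word_a = [[1.0, 0.0, 2.0], [3.0, 1.0, 0.0], [5.0, 2.0, 1.0]]
word_b = list(reversed(word_a))  # same frames, opposite temporal order

sequences_differ = word_a != word_b
means_identical = mean_vector(word_a) == mean_vector(word_b)

print(sequences_differ)  # True
print(means_identical)   # True
```

Any model that is meant to separate such sequences has to see the frames themselves, or at least their order, rather than only the average.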
"My question is how to train a classifier to detect the word?"
The simplest form would be a GMM (Gaussian mixture model) classifier.
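To make the GMM idea concrete, here is a minimal sketch in plain Python, under several assumptions that are not part of the answer: the frames are synthetic random vectors standing in for real 12-dimensional MFCC frames, the mixture uses diagonal covariances and is fitted with a basic EM loop, and detection uses a hypothetical threshold derived from the training scores because only positive samples are available. In practice a tested library implementation of GMMs would probably be a better choice than this hand-written EM.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def component_logs(x, gmm):
    """log(weight * density) of frame x for every mixture component."""
    return [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]

def fit_gmm(frames, k, iters=25, seed=0, var_floor=1e-3):
    """Basic EM for a k-component diagonal GMM over pooled frames."""
    init_rng = random.Random(seed)
    dim, n = len(frames[0]), len(frames)
    gmm = [(1.0 / k, list(m), [1.0] * dim) for m in init_rng.sample(frames, k)]
    for _ in range(iters):
        # E-step: responsibility of each component for each frame.
        resp = []
        for x in frames:
            logs = component_logs(x, gmm)
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means and (floored) variances.
        new_gmm = []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-9
            mean = [sum(r[j] * x[d] for r, x in zip(resp, frames)) / nj
                    for d in range(dim)]
            var = [max(sum(r[j] * (x[d] - mean[d]) ** 2
                           for r, x in zip(resp, frames)) / nj, var_floor)
                   for d in range(dim)]
            new_gmm.append((nj / n, mean, var))
        gmm = new_gmm
    return gmm

def score_utterance(frames, gmm):
    """Average per-frame log-likelihood of one utterance under the GMM."""
    return sum(logsumexp(component_logs(x, gmm)) for x in frames) / len(frames)

# Synthetic stand-in for MFCC frames (assumption: real features come from audio).
rng = random.Random(1)
DIM = 12

def fake_utterance(centers, n_frames=20):
    frames = []
    for _ in range(n_frames):
        c = rng.choice(centers)
        frames.append([rng.gauss(c, 1.0) for _ in range(DIM)])
    return frames

train = [fake_utterance([0.0, 3.0]) for _ in range(10)]   # positive samples only
pooled = [frame for utt in train for frame in utt]
gmm = fit_gmm(pooled, k=2)

# Hypothetical decision rule: accept if the score is not far below training scores.
threshold = min(score_utterance(u, gmm) for u in train) - 2.0
pos_score = score_utterance(fake_utterance([0.0, 3.0]), gmm)  # same "word"
neg_score = score_utterance(fake_utterance([-5.0]), gmm)      # different "word"
print(pos_score, neg_score, threshold)
```

Note that the GMM scores every frame independently, so it uses the full set of frames but still ignores their order; the margin of 2.0 is arbitrary and would need tuning on real recordings.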
In a more complex form, you need to learn a more complex model such as an HMM (hidden Markov model). You can learn more about HMMs from a textbook like this one:
http://www.amazon.com/Fundamentals-Speech-Recognition-Lawrence-Rabiner/dp/0130151572
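As a rough illustration of what an HMM adds, the sketch below scores a frame sequence with the forward algorithm for a small left-to-right HMM with diagonal Gaussian emissions. Everything concrete here is an assumption made for the example: the three states, the hand-set means, variances and transition probabilities, and the 2-dimensional toy frames. A real system would estimate these parameters from the recordings (typically with Baum-Welch training, which is not shown) and would use 12-dimensional MFCC frames.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def safe_log(p):
    """log(p), with log(0) mapped to -inf for forbidden transitions."""
    return math.log(p) if p > 0 else -math.inf

def logsumexp(values):
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))

def forward_log_likelihood(frames, log_start, log_trans, means, variances):
    """Log-likelihood of the frame sequence under the HMM (forward algorithm)."""
    n_states = len(means)
    alpha = [log_start[s] + log_gauss_diag(frames[0], means[s], variances[s])
             for s in range(n_states)]
    for x in frames[1:]:
        alpha = [logsumexp([alpha[p] + log_trans[p][s] for p in range(n_states)])
                 + log_gauss_diag(x, means[s], variances[s])
                 for s in range(n_states)]
    return logsumexp(alpha)

# Hypothetical hand-set 3-state left-to-right model over 2-D toy features.
means = [[0.0, 0.0], [3.0, 3.0], [6.0, 6.0]]
variances = [[1.0, 1.0]] * 3
log_start = [safe_log(p) for p in [1.0, 0.0, 0.0]]
log_trans = [[safe_log(p) for p in row]
             for row in [[0.6, 0.4, 0.0],
                         [0.0, 0.6, 0.4],
                         [0.0, 0.0, 1.0]]]

in_order = [[0.1, -0.2], [0.0, 0.3], [2.9, 3.1], [3.2, 2.8], [6.1, 5.9], [5.8, 6.2]]
backwards = in_order[::-1]  # same frames (and same mean vector), reversed order

ll_in_order = forward_log_likelihood(in_order, log_start, log_trans, means, variances)
ll_backwards = forward_log_likelihood(backwards, log_start, log_trans, means, variances)
print(ll_in_order, ll_backwards)
```

The two sequences have identical averaged vectors, yet the model assigns the in-order one a much higher log-likelihood, which is the kind of temporal sensitivity a word detector needs.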