Applying neural network to MFCCs for variable-length speech segments
Problem description
I'm currently trying to create and train a neural network to perform simple speech classification using MFCCs.
At the moment, I'm using 26 coefficients for each sample, and a total of 5 different classes - these are five different words with varying numbers of syllables.
While each sample is 2 seconds long, I am unsure how to handle users who pronounce words very slowly or very quickly. For example, the word 'television' spoken in one second yields different coefficients than the same word spoken over two seconds.
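To make the issue concrete: MFCCs are computed per short analysis frame, so the number of frames a word occupies grows with how long it is spoken. The sketch below assumes common framing parameters (16 kHz audio, 25 ms window, 10 ms hop) that are not stated in the question, and it only counts frames; it does not compute MFCCs.

```python
# Assumed framing parameters (common defaults, not taken from the question).
SAMPLE_RATE = 16_000  # samples per second
WIN = 400             # 25 ms analysis window at 16 kHz
HOP = 160             # 10 ms hop at 16 kHz
N_COEFFS = 26         # coefficients per frame, as in the question


def num_frames(duration_s: float) -> int:
    """Number of full analysis frames covering a segment of the given duration."""
    n_samples = int(duration_s * SAMPLE_RATE)
    if n_samples < WIN:
        return 0
    return 1 + (n_samples - WIN) // HOP


for seconds in (1.0, 2.0):
    t = num_frames(seconds)
    print(f"{seconds} s -> {t} frames -> MFCC matrix of {t} x {N_COEFFS}")
```

Under these assumptions a one-second 'television' spans 98 frames and a two-second one spans 198, so even inside a fixed 2-second clip the same word is stretched differently along the time axis.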
Any advice on how I can solve this problem would be much appreciated!
Recommended answer
I'm currently trying to create and train a neural network to perform simple speech classification using MFCCs.
A simple feed-forward neural network expects a fixed-size input: it is not invariant to input length and cannot analyze time series directly.
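A minimal NumPy sketch of this point, using random untrained weights and random stand-in frames (all sizes are hypothetical): a dense layer sized for one flattened MFCC matrix rejects a shorter one, while an Elman-style recurrent loop reuses the same weights at every frame and ends in a fixed-size state whatever the number of frames.

```python
import numpy as np

rng = np.random.default_rng(0)
N_COEFFS, N_CLASSES, HIDDEN = 26, 5, 32

# Plain feed-forward layer sized for a flattened 198-frame MFCC matrix
# (198 frames is a hypothetical example length).
W_mlp = rng.standard_normal((198 * N_COEFFS, N_CLASSES))


def mlp_scores(mfcc: np.ndarray) -> np.ndarray:
    # Flattening ties the input dimension to the number of frames.
    return mfcc.reshape(-1) @ W_mlp


# Minimal Elman-style RNN: the same weights are applied at every frame,
# so any number of frames folds into a fixed-size hidden state.
W_xh = rng.standard_normal((N_COEFFS, HIDDEN)) * 0.1
W_hh = rng.standard_normal((HIDDEN, HIDDEN)) * 0.1
W_hy = rng.standard_normal((HIDDEN, N_CLASSES)) * 0.1


def rnn_scores(mfcc: np.ndarray) -> np.ndarray:
    h = np.zeros(HIDDEN)
    for frame in mfcc:  # mfcc has shape (T, N_COEFFS) for any T
        h = np.tanh(frame @ W_xh + h @ W_hh)
    return h @ W_hy     # shape (N_CLASSES,)


slow = rng.standard_normal((198, N_COEFFS))  # stand-in for a slow utterance
fast = rng.standard_normal((98, N_COEFFS))   # stand-in for a fast utterance

print(mlp_scores(slow).shape)  # (5,)
try:
    mlp_scores(fast)
except ValueError as err:
    print("MLP rejects the shorter input:", err)
print(rnn_scores(slow).shape, rnn_scores(fast).shape)  # (5,) (5,)
```

The weights here are untrained, so the scores themselves mean nothing; only the shapes matter for the comparison.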
To classify a time series such as a sequence of MFCC frames, you can use a classifier that tolerates variations in timing. For example, you can use a neural network combined with a hidden Markov model (ANN-HMM), a Gaussian mixture model combined with a hidden Markov model (GMM-HMM), or a recurrent neural network (RNN). A Matlab implementation of an RNN is here, and a Theano implementation is also available. Detailed descriptions of these architectures are easy to find online.
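As an illustration of the GMM-HMM idea, the sketch below scores a variable-length sequence against one left-to-right HMM per word with the forward algorithm and picks the word with the highest log-likelihood. It is a simplified toy under stated assumptions: each state uses a single diagonal Gaussian (a one-component GMM), the parameters are hand-set rather than trained (training, e.g. with Baum-Welch, is not shown), and the 2-dimensional words "A" and "B" are hypothetical stand-ins for 26-coefficient MFCC frames.

```python
import numpy as np


def log_gauss_diag(x, mean, var):
    """Log density of frames x (T, D) under each state's diagonal Gaussian.
    mean, var: (S, D). Returns an array of shape (T, S)."""
    diff = x[:, None, :] - mean[None, :, :]
    return -0.5 * (np.sum(np.log(2 * np.pi * var), axis=1)[None, :]
                   + np.sum(diff ** 2 / var[None, :, :], axis=2))


def log_forward(x, log_pi, log_A, mean, var):
    """log p(x | model) via the forward algorithm, computed in log space."""
    log_b = log_gauss_diag(x, mean, var)  # (T, S)
    alpha = log_pi + log_b[0]
    for t in range(1, len(x)):
        # alpha_j(t) = logsumexp_i(alpha_i(t-1) + log A_ij) + log b_j(x_t)
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)


def left_to_right(n_states, stay=0.6):
    """Left-to-right topology: each state either repeats or advances by one."""
    A = np.zeros((n_states, n_states))
    for s in range(n_states):
        if s + 1 < n_states:
            A[s, s], A[s, s + 1] = stay, 1 - stay
        else:
            A[s, s] = 1.0
    with np.errstate(divide="ignore"):
        return np.log(A)


def toy_word(means, frames_per_state, rng):
    """Hypothetical utterance: each state's mean repeated, plus small noise."""
    x = np.repeat(means, frames_per_state, axis=0)
    return x + 0.1 * rng.standard_normal(x.shape)


D = 2
means_a = np.array([[0.0] * D, [3.0] * D, [6.0] * D])  # hypothetical word "A"
means_b = means_a[::-1].copy()                          # hypothetical word "B"
var = np.ones((3, D))
log_pi = np.array([0.0, -np.inf, -np.inf])  # always start in state 0
log_A = left_to_right(3)
models = {"A": means_a, "B": means_b}


def classify(x):
    scores = {w: log_forward(x, log_pi, log_A, m, var) for w, m in models.items()}
    return max(scores, key=scores.get)


rng = np.random.default_rng(0)
fast = toy_word(means_a, 2, rng)  # 6 frames
slow = toy_word(means_a, 6, rng)  # 18 frames
print(classify(fast), classify(slow))  # A A
```

Because each state can repeat for as many frames as needed, the same model accepts both the 6-frame "fast" and the 18-frame "slow" versions of word A; these self-loops are what give HMM-based classifiers their tolerance to speaking rate.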
Speech recognition is not simple to implement, so it is often better to use existing software such as CMUSphinx.