Having different results every run with GMM Classifier

Problem description

我目前正在做一个语音识别和机器学习相关的项目. 我现在有两个班级,并且为每个班级创建两个GMM分类器,分别用于标签快乐"和悲伤"

I'm currently doing a speech recognition and machine learning related project. I have two classes now, and I create two GMM classifiers for each class, for labels 'happy' and 'sad'

I want to train the GMM classifiers with MFCC vectors.

I am using one GMM classifier per label, two in total. (Previously it was one GMM per file):

But every time I run the script I get different results. What might cause that, given the same test and train samples?

In the outputs below, note that I have 10 test samples, and each line corresponds to the results for the test samples in order.

Code:

classifiers = {'happy':[],'sad':[]}
probability = {'happy':0,'sad':0}

def createGMMClassifiers():
    for name, data in training.iteritems():
        #For every class: In our case it is two, happy and sad
        classifier = mixture.GMM(n_components = n_classes,n_iter=50)
        #two classifiers.
        for mfcc in data:
            classifier.fit(mfcc)
        addClassifier(name, classifier)
    for testData in testing['happy']:
        classify(testData)

def addClassifier(name,classifier):
    classifiers[name]=classifier

def classify(testMFCC):
    for name, classifier in classifiers.iteritems():
        prediction = classifier.predict_proba(testMFCC)
        for f, s in prediction:
            probability[name]+=f
    print 'happy ',probability['happy'],'sad ',probability['sad']

Sample output 1:

happy  154.300420496 sad  152.808941585
happy
happy  321.17737915 sad  318.621788517
happy
happy  465.294473363 sad  461.609246112
happy
happy  647.771003768 sad  640.451097035
happy
happy  792.420461416 sad  778.709674995
happy
happy  976.09526992 sad  961.337361541
happy
happy  1137.83592093 sad  1121.34722203
happy
happy  1297.14692405 sad  1278.51011583
happy
happy  1447.26926553 sad  1425.74595666
happy
happy  1593.00403707 sad  1569.85670672
happy

Sample output 2:

happy  51.699579504 sad  152.808941585
sad
happy  81.8226208497 sad  318.621788517
sad
happy  134.705526637 sad  461.609246112
sad
happy  167.228996232 sad  640.451097035
sad
happy  219.579538584 sad  778.709674995
sad
happy  248.90473008 sad  961.337361541
sad
happy  301.164079068 sad  1121.34722203
sad
happy  334.853075952 sad  1278.51011583
sad
happy  378.730734469 sad  1425.74595666
sad
happy  443.995962929 sad  1569.85670672
sad

Answer

But every time I run the script I get different results. What might cause that, given the same test and train samples?

scikit-learn uses a random initializer. If you want reproducible results, set the random_state argument:

random_state: RandomState or an int seed (None by default)
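A minimal sketch of the effect of random_state, using the newer scikit-learn API (mixture.GMM was renamed GaussianMixture, and n_iter became max_iter, in later releases) with synthetic data standing in for MFCC frames:

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # replaces mixture.GMM in newer scikit-learn

rng = np.random.RandomState(0)
X = rng.randn(200, 13)  # toy stand-in for MFCC frames (13 coefficients per frame)

# With a fixed random_state the EM initialization is deterministic,
# so fitting twice on the same data yields identical parameters.
gmm_a = GaussianMixture(n_components=2, max_iter=50, random_state=42).fit(X)
gmm_b = GaussianMixture(n_components=2, max_iter=50, random_state=42).fit(X)
print(np.allclose(gmm_a.means_, gmm_b.means_))  # prints True
```

Without random_state, each run initializes from a different seed, which explains why the script's probabilities change between runs.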

for name, data in training.iteritems():

This is not correct, because you end up training only on the last sample: each call to fit discards the result of the previous one. You need to concatenate the features for each label into a single array before you call fit. You can use np.concatenate for that.
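A small sketch of that fix, with toy arrays standing in for per-file MFCC matrices:

```python
import numpy as np

# Three toy "files", each an (n_frames, 13) MFCC matrix for the same label.
mfccs = [np.ones((10, 13)), np.zeros((5, 13)), np.full((8, 13), 2.0)]

# Concatenate along the frame axis so a single fit sees every frame for the
# label, instead of calling fit per file and keeping only the last file's model.
all_frames = np.concatenate(mfccs)
print(all_frames.shape)  # (23, 13)
```

In createGMMClassifiers, the inner loop over data would then collapse to a single classifier.fit(np.concatenate(data)).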
