Multiclass classification using Gaussian Mixture Models with scikit-learn

Question

I am trying to use sklearn.mixture.GaussianMixture to classify the pixels of a hyperspectral image. There are 15 classes (1-15). I followed the method shown at http://scikit-learn.org/stable/auto_examples/mixture/plot_gmm_covariances.html, where the means are initialized with means_init. I did the same, but my accuracy is poor (about 10%). I also tried changing the covariance type, the convergence threshold, the maximum number of iterations, and the number of initializations, but the results are the same.
Am I doing this correctly? Any input is appreciated.

import numpy as np
import scipy.io as sio
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

# Load the hyperspectral cube and the ground-truth label map.
uh_data = sio.loadmat('/Net/hico/data/users/nikhil/contest_uh_casi.mat')
data = uh_data['contest_uh_casi']

uh_labels = sio.loadmat('/Net/hico/data/users/nikhil/contest_gt_tr.mat')
labels = uh_labels['contest_gt_tr']

# Flatten the image to one row per pixel, one column per band.
reshaped_data = np.reshape(data, (data.shape[0] * data.shape[1], data.shape[2]))
print('reshaped data :', reshaped_data.shape)

reshaped_label = np.reshape(labels, (labels.shape[0] * labels.shape[1], -1))
print('reshaped label :', reshaped_label.shape)

# Append the label as column 144 and keep only the pixels with a label > 0.
con_data = np.hstack((reshaped_data, reshaped_label))
pre_data = con_data[con_data[:, 144] > 0]
total_data = pre_data[:, 0:144]
total_label = pre_data[:, 144]

train_data, test_data, train_label, test_label = train_test_split(
    total_data, total_label, test_size=0.30, random_state=42)

classifier = GaussianMixture(n_components=15, covariance_type='diag', max_iter=100,
                             random_state=42, tol=0.1, n_init=1)

# Initialize each component mean with the mean of one class (labels 1..15).
classifier.means_init = np.array([train_data[train_label == i].mean(axis=0)
                                  for i in range(1, 16)])
# The labels are not passed to fit(); the mixture is estimated without supervision.
classifier.fit(train_data)

# Note: predict() returns component indices 0..14, while the labels run 1..15,
# so the comparisons below are offset by one as written.
pred_lab_train = classifier.predict(train_data)
train_accuracy = np.mean(pred_lab_train.ravel() == train_label.ravel()) * 100
print('train accuracy:', train_accuracy)

pred_lab_test = classifier.predict(test_data)
test_accuracy = np.mean(pred_lab_test.ravel() == test_label.ravel()) * 100
print('test accuracy:', test_accuracy)

My data has 66485 pixels with 144 features each. I also tried this after applying feature reduction techniques such as PCA, LDA, and KPCA, but the results are still the same.

Recommended answer

Gaussian Mixture is not a classifier. It is a density estimation method, and expecting its components to magically align with your classes is not a good idea. Since you clearly have access to labels, you should try actual supervised techniques. Scikit-learn offers lots of these, including Random Forest, KNN, SVM, ... pick your favourite. GMM simply tries to fit a mixture of Gaussians to your data, and nothing forces it to place them according to the labeling (which is not even passed to the fit call). From time to time this will work, but only for trivial problems where the classes are so well separated that even Naive Bayes would work. In general, it is simply the wrong tool for the problem.
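
As a rough illustration of the point above, the sketch below trains a supervised classifier (a Random Forest) with the labels passed to fit. It also shows one way to keep using Gaussian mixtures while still using the labels: fit a separate GaussianMixture per class and pick the class with the highest log-likelihood plus log prior. The per-class variant is not part of the answer; it is an assumption about how one might adapt the approach. The data here is a small synthetic stand-in generated with make_blobs (three well-separated classes labelled 1..3), not the hyperspectral image, and all variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the pixel data (an assumption, not the real image):
# three well-separated Gaussian blobs in 4 dimensions.
centers = np.array([[0.0, 0.0, 0.0, 0.0],
                    [6.0, 6.0, 6.0, 6.0],
                    [-6.0, 6.0, -6.0, 6.0]])
X, y = make_blobs(n_samples=600, centers=centers, cluster_std=1.0, random_state=0)
y = y + 1  # mimic class labels that start at 1, as in the question

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)

# Option 1: an ordinary supervised classifier; the labels are passed to fit().
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
forest_accuracy = forest.score(X_test, y_test)

# Option 2: one density model per class, so the labels decide which
# samples each mixture is fitted on.
classes = np.unique(y_train)
models = []
log_priors = []
for c in classes:
    X_c = X_train[y_train == c]
    gmm = GaussianMixture(n_components=1, covariance_type='diag', random_state=42)
    models.append(gmm.fit(X_c))
    log_priors.append(np.log(len(X_c) / len(X_train)))

# score_samples() gives log p(x | class); adding the log prior gives an
# unnormalized log posterior for each class.
log_post = np.column_stack([m.score_samples(X_test) + lp
                            for m, lp in zip(models, log_priors)])
# Map the argmax column index back to the original label values.
gmm_pred = classes[np.argmax(log_post, axis=1)]
gmm_accuracy = np.mean(gmm_pred == y_test)

print('random forest accuracy:', forest_accuracy)
print('per-class GMM accuracy:', gmm_accuracy)
```

With n_components=1 and a diagonal covariance, the per-class model is essentially Gaussian Naive Bayes; raising n_components lets each class have a multi-modal density. Either way, the predicted column index is mapped back through classes, because the index positions start at 0 while the labels here start at 1.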
