Naive classifier in MATLAB

Problem description

When testing the naive Bayes classifier in MATLAB, I get results that do not match the class labels, even though I trained and tested on the same sample data. Is my code correct, and can someone explain why this happens?

%% dimensionality reduction
columns = 6;
[U,S,V] = svds(fulldata, columns);   % U has size(fulldata,1) rows and `columns` columns

%% randomly select dataset
rows = 1000;

%# pick random rows
indX = randperm( size(fulldata,1) );
indX = indX(1:rows)';

%# keep all reduced columns: U only has `columns` columns, so column
%# indices drawn from size(fulldata,2) would run past the end of U
indY = 1:columns;

%# filter data
data = U(indX,indY);

%% standardize each column (zero mean, unit variance)
data = zscore(data);

%create a training set the same as the data sample
training_data = data;

%match the class labels to the corresponding rows
target_class = classlabels(indX,:);

%classify the same data sample to check if naive Bayes works
%# 'diaglinear' fits a Gaussian per class with a pooled diagonal covariance
%# (named predicted_class to avoid shadowing the built-in function class)
predicted_class = classify(data, training_data, target_class, 'diaglinear');

%# rows = true class, columns = predicted class
confusionmat(target_class, predicted_class)

Here is an example:

Notice that it mixed up ipsweep, teardrop and back with normal traffic. I haven't reached the stage of classifying unseen data yet; I just wanted to test whether it would correctly classify the same data it was trained on.

Confusion matrix output:

ans =

   537     0     0     0     0     0     0     1     0
     0   224     0     0     0     1     0     1     0
     0     0    91    79     0    17    24     4     0
     0     0     0     8     0     0     2     0     0
     0     0     0     0     3     0     0     0     0
     0     0     0     0     0     1     0     0     0
     0     0     0     0     0     0     2     0     0
     0     0     0     0     0     0     0     3     0
     0     0     0     0     0     1     0     0     1

I'm not sure what this output actually means, and I may well have made a mistake in my code, but I thought I would just run it and see what it outputs.

Recommended answer

You are applying a classifier to data of reduced dimensionality. A classifier is expected to be somewhat imprecise, because it has to generalize. In the dimensionality-reduction stage you are also losing information, which further lowers classification performance.
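
One rough way to gauge how much the truncation discards is to compare the squared singular values that svds keeps with the squared Frobenius norm of the whole matrix (which equals the sum of all squared singular values). This is a minimal sketch: the random matrix is a hypothetical stand-in for `fulldata`, and the ratio measures reconstruction "energy" of the uncentered data, not how much class-discriminating information survives.

```matlab
% Hypothetical stand-in for `fulldata`: 500 samples, 20 numeric features.
rng(0);                          % make the random data reproducible
fulldata = randn(500, 20);
columns = 6;                     % number of singular vectors kept, as in the question

% svds returns the `columns` largest singular values on the diagonal of S.
[U, S, V] = svds(fulldata, columns);

% norm(...,'fro')^2 is the sum of ALL squared singular values, so this ratio
% is the share of the data's energy captured by the rank-`columns` approximation.
retained = sum(diag(S).^2) / norm(fulldata, 'fro')^2;
fprintf('Energy retained by %d components: %.1f%%\n', columns, 100 * retained);
```

A low percentage suggests the classifier is working from a heavily compressed view of the data; a high one does not guarantee that the discarded directions were irrelevant to the classes.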

Don't expect perfect performance even on the training set; a perfect score there would usually indicate a bad case of over-fitting.
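
Because training-set accuracy says little about generalization, a more informative check is to hold out part of the data and evaluate only on rows the classifier never saw. The sketch below assumes the Statistics and Machine Learning Toolbox (which `classify` already requires); the two-class Gaussian data and the 30% hold-out fraction are hypothetical choices for illustration, standing in for `data` and `target_class`.

```matlab
% Hypothetical stand-in for `data` and `target_class`: two Gaussian classes.
rng(1);
data = [randn(300, 6); randn(300, 6) + 1.5];
target_class = [ones(300, 1); 2 * ones(300, 1)];

% Stratified 30% hold-out: test rows are excluded from training.
cv = cvpartition(target_class, 'HoldOut', 0.3);
train_idx = training(cv);
test_idx  = test(cv);

% Resubstitution (training-set) accuracy, as in the question.
train_pred = classify(data(train_idx, :), data(train_idx, :), ...
                      target_class(train_idx), 'diaglinear');
Ctrain = confusionmat(target_class(train_idx), train_pred);
train_accuracy = sum(diag(Ctrain)) / sum(Ctrain(:));

% Fit on the training rows only, then predict the held-out rows.
predicted = classify(data(test_idx, :), data(train_idx, :), ...
                     target_class(train_idx), 'diaglinear');

% Confusion matrix on unseen data: rows = true class, columns = predicted.
C = confusionmat(target_class(test_idx), predicted);
holdout_accuracy = sum(diag(C)) / sum(C(:));

fprintf('Training accuracy: %.3f, hold-out accuracy: %.3f\n', ...
        train_accuracy, holdout_accuracy);
```

The hold-out figure is typically somewhat lower than the training figure; a large gap between the two is a rough sign of over-fitting.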

As for the confusion matrix: C(3,4) = 79 simply means that 79 data points whose true class is 3 were classified as class 4. Overall, the matrix shows that your classifier works well for classes 1 and 2 but has problems with class 3. The remaining classes have so few samples that it is difficult to judge how well the classifier works for them.
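
The row-by-row reading above can be turned into per-class numbers. This sketch uses the matrix posted in the question; the class indices 1 to 9 follow whatever label order confusionmat used, which the question does not show, so mapping them back to names such as ipsweep or teardrop is left open.

```matlab
% Confusion matrix posted in the question (rows = true class, columns = predicted).
C = [537   0   0   0   0   0   0   1   0
       0 224   0   0   0   1   0   1   0
       0   0  91  79   0  17  24   4   0
       0   0   0   8   0   0   2   0   0
       0   0   0   0   3   0   0   0   0
       0   0   0   0   0   1   0   0   0
       0   0   0   0   0   0   2   0   0
       0   0   0   0   0   0   0   3   0
       0   0   0   0   0   1   0   0   1];

support   = sum(C, 2);             % number of samples whose true class is i
recall    = diag(C) ./ support;    % share of true class i predicted as class i
precision = diag(C) ./ sum(C, 1)'; % share of "predicted i" that truly are class i

fprintf('class  support  recall  precision\n');
fprintf('%5d  %7d  %6.3f  %9.3f\n', [(1:9)', support, recall, precision]');
```

The rows sum to 1000, matching the 1000 sampled rows. Class 3 has 215 samples, of which only 91 are classified correctly (recall of about 0.42), while classes 6 and 9 have one and two samples respectively, which is far too few to estimate their recall meaningfully.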
