Clustering and Bayes classifiers Matlab

Question

So I am at a crossroads on what to do next. I set out to learn and apply some machine learning algorithms on a complicated dataset, and I have now done this. My plan from the very beginning was to combine two possible classifiers in an attempt to make a multi-classifier system.

But here is where I am stuck. I chose a clustering algorithm, Fuzzy C-Means (after working through some sample K-means material), and Naive Bayes as the two candidates for the MCS (Multi-Classifier System).

I can use each one independently to classify the data, but I am struggling to combine the two in a meaningful way.

For instance, the fuzzy clustering catches almost all "Smurf" attacks except, usually, one. I am not sure why it doesn't catch this oddball; all I know is that it doesn't. One of the clusters will be dominated by smurf attacks, and usually I find just one smurf in the other clusters. This is where I run into the problem: if I train the Bayes classifier on all the different traffic types (smurf, normal, neptune, etc.) and apply it to the remaining clusters in an attempt to find that last smurf, it has a high false alarm rate.

I'm not sure how to proceed. I don't want to take the other attacks out of the training set, but I only want to train the Bayes classifier to spot "Smurf" attacks. At the moment it is trained to try to spot everything, and I suspect (but am not sure) that this lowers its accuracy.

So this is my question about the Naive Bayes classifier: how would you get it to look only for smurf and categorise everything else as "Other"?

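rows = 1000;     % number of rows to sample
columns = 6;     % not used below; indY (the feature column indices) is not defined in this snippet

% random sample of `rows` row numbers from fulldata
indX = randperm( size(fulldata,1) );
indX = indX(1:rows)';

data = fulldata(indX, indY);

% a second, independent sample (it may overlap with indX; not used in this snippet)
indX1 = randperm( size(fulldata,1) );
indX1 = indX1(1:rows)';


%% apply normalization method to every cell
%data = zscore(data);

training_data = data;
target_class = labels(indX,:);   % class label of each training row

% 'diaglinear' fits a normal density per class with a pooled diagonal covariance
% estimate (the MATLAB documentation describes this as a naive Bayes classifier).
% test_data is not defined in this snippet; it must use the same columns (indY).
predicted = classify(test_data, training_data, target_class, 'diaglinear');
% the first argument must be the true labels of the rows of test_data;
% target_class only fits if test_data is the training data itself
confusionmat(target_class, predicted)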

What I was thinking of doing was manually changing target_class so that all the normal traffic and all the attacks that aren't smurf become "other". Then, as I already know that FCM correctly groups all but one smurf attack, I only have to use the Naive Bayes classifier on the remaining clusters.

For example:

Cluster 1 = 500 smurf attacks. (Repeating this step might shift the "majority" of smurf attacks from the 1000 samples into a different cluster, so I have to check, or iterate through, the clusters to find the biggest one; once found, I can exclude it from the Naive Bayes classifier stage.)
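
Finding the smurf-dominated cluster can be automated instead of checked by eye. A minimal sketch, assuming `y` is the hard cluster index of each sampled row (as produced by `[~, y] = max(U)`) and `lbl` holds the matching class names as a cell array; the small `y` and `lbl` values are made up for illustration, and `smurfCount`, `smurfCluster` and `remaining` are hypothetical names. It counts the smurf rows in each cluster with a `for` loop and takes the cluster with the largest count.

```matlab
% Made-up example data (hypothetical): cluster index and class name for 8 sampled rows.
y   = [1 1 2 1 3 1 2 4];
lbl = {'smurf', 'smurf', 'normal', 'smurf', 'neptune', 'smurf', 'smurf', 'normal'};
nClusters = 4;

y = y(:);                                  % force column vectors so shapes always match
isSmurf = strcmp(lbl(:), 'smurf');         % logical mask: true where the label is smurf

smurfCount = zeros(1, nClusters);
for k = 1:nClusters
    smurfCount(k) = sum(isSmurf(y == k));  % smurf rows assigned to cluster k
end

[~, smurfCluster] = max(smurfCount);              % cluster holding the most smurf rows
remaining = setdiff(1:nClusters, smurfCluster);   % clusters left for the Bayes stage
```

With the real data, `y` would come from the `fcm` step and `lbl` would be `labels(indX,:)` (run through `cellstr` first if `labels` is a char matrix). Because the count is recomputed after each clustering run, it no longer matters which cluster number the smurf majority lands in.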

Then I test the classifier on each remaining cluster. I'm not sure how to write loops in MATLAB yet, so at the moment I have to pick the clusters manually during processing.

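clusters = 4;
CM = colormap(jet(clusters));   % one colour per cluster, for plotting
options(1) = 12.0;   % exponent for the membership matrix U (default 2; larger = fuzzier)
options(2) = 1000;   % maximum number of iterations
options(3) = 1e-10;  % minimum improvement in the objective function
options(4) = 0;      % 0 = do not print iteration info
[centers, U, objFun] = fcm(data, clusters, options); % cluster 1000 sample data rows
[~,y] = max(U);      % must come after fcm: index of the highest-membership cluster per row

training_data = newTrainingData(indX1,indY); % this is the numeric data
test_data = fulldata(indX(y==2),indY); % cluster 2 from the FCM phase, same columns (indY) as the clustered data
test_class = labels(indX(y==2),:); % thanks to amro: true labels of the test rows give an unbiased error rate in the confusion matrix
target_class = labels(indX1,:); % labels for training_data (same rows, indX1); assumed relabelled so only smurf keeps its name and everything else is "other"
% note: indX1 is drawn independently of indX, so some test rows may also be training rows

predicted = classify(test_data, training_data, target_class, 'diaglinear');
confusionmat(test_class, predicted)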

I then repeat the Bayes classification for each of the remaining clusters, looking for that one smurf attack.
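
Repeating the classifier for each remaining cluster can be written as a `for` loop instead of picking clusters by hand. A minimal sketch, assuming `y` and `indX` play the roles they have in the code above and `smurfCluster` is a hypothetical variable holding the cluster already judged to be "smurf"; the values are invented so the loop runs on its own, and the `classify`/`confusionmat` calls are left as comments because they need the real data and the Statistics Toolbox.

```matlab
% Invented example values (hypothetical): hard cluster index per sampled row,
% the matching row numbers in fulldata, and the cluster already judged "smurf".
y            = [1 1 2 1 3 1 2 4];
indX         = [15 3 42 8 23 4 16 99];
nClusters    = 4;
smurfCluster = 1;

remaining = setdiff(1:nClusters, smurfCluster);
testRows  = cell(1, numel(remaining));   % fulldata row numbers for each remaining cluster

for i = 1:numel(remaining)
    k = remaining(i);
    testRows{i} = indX(y == k);          % rows that FCM assigned to cluster k
    % With the real data, each cluster would be classified here, e.g.:
    %   test_data  = fulldata(testRows{i}, indY);
    %   test_class = labels(testRows{i}, :);
    %   predicted  = classify(test_data, training_data, target_class, 'diaglinear');
    %   confusionmat(test_class, predicted)
    fprintf('cluster %d: %d rows\n', k, numel(testRows{i}));
end
```

Keeping the per-cluster row numbers in a cell array means the results can be inspected or combined after the loop without rerunning the clustering.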

My problem is: what happens if it misclassifies an "other" attack as a smurf, or doesn't find the one remaining smurf?
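
Both failure modes can be read off the 2×2 confusion matrix of each cluster. A minimal sketch, assuming the matrix is ordered with "other" in row/column 1 and "smurf" in row/column 2 (rows = true class, columns = predicted class, as `confusionmat` returns; its `'Order'` option can enforce the ordering). The counts are invented for illustration: a cluster of 500 rows holding one smurf, where the classifier catches it but also flags 10 "other" rows.

```matlab
% Invented counts for one cluster of 500 rows (hypothetical, for illustration).
% Rows = true class, columns = predicted class; 1 = 'other', 2 = 'smurf'.
C = [489 10;
       0  1];

detectionRate  = C(2,2) / sum(C(2,:));   % smurfs found / all true smurfs
falseAlarmRate = C(1,2) / sum(C(1,:));   % "other" flagged as smurf / all true "other"
precision      = C(2,2) / sum(C(:,2));   % true smurfs among everything flagged as smurf

% Baseline: label every row "other" and never look for the smurf at all.
baselineAccuracy = sum(C(1,:)) / sum(C(:));

fprintf('detection %.3f, false alarm %.4f, precision %.3f, baseline accuracy %.3f\n', ...
        detectionRate, falseAlarmRate, precision, baselineAccuracy);
```

With these invented numbers the one smurf is found, yet 10 of the 11 alerts are false alarms, while calling everything "other" is already 99.8% accurate. With a single positive per cluster, the false alarm rate dominates the outcome.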

I feel somewhat lost about a better way of doing this. I am in the process of trying to pick a good ratio of smurf attacks to "other", as I don't want to over-fit, which was explained in a previous question.
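
A fixed ratio of smurf to "other" rows can be drawn by sampling each group separately with `randperm`. A minimal sketch, assuming the labels have already been reduced to 'smurf' and 'other' (question 2 below); `nSmurf` and `ratio` are hypothetical knobs, not values from the question, and the label list is invented.

```matlab
% Invented label list (hypothetical): 30 smurf rows and 70 other rows.
lbl = [repmat({'smurf'}, 1, 30), repmat({'other'}, 1, 70)];

nSmurf = 20;   % hypothetical: how many smurf rows to keep
ratio  = 2;    % hypothetical: "other" rows kept per smurf row

smurfIdx = find(strcmp(lbl, 'smurf'));
otherIdx = find(~strcmp(lbl, 'smurf'));

% randperm(n, k) returns k distinct random integers from 1..n
pickS  = smurfIdx(randperm(numel(smurfIdx), nSmurf));
nOther = min(ratio * nSmurf, numel(otherIdx));   % cannot keep more rows than exist
pickO  = otherIdx(randperm(numel(otherIdx), nOther));

trainIdx = [pickS, pickO];   % row numbers to use for training_data and target_class
fprintf('%d smurf + %d other rows\n', numel(pickS), numel(pickO));
```

Trying a few values of `ratio` and comparing false alarm rates on rows outside `trainIdx` is one way to choose the ratio without tuning it on the test clusters themselves.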

But this will take me some time, as I don't yet know how to change/replace the existing labels (neptune, back, ipsweep, warezclient attacks) with "other" in MATLAB, so I can't test this theory yet (I will get there).

So my questions are:

1) Is there a better method for finding that one elusive smurf attack?

2) How can I grep the target_class (labels) to replace everything that isn't smurf with "other"?

Answer

I will try to partly answer your questions.

1) Is there a better method for finding that one elusive smurf attack?

I suggest you don't try this. It is 1 in 500. Tuning the classifier to catch that one sample is almost certainly a case of over-fitting your data, and such a classifier will not generalize well to test data.

2) How can I grep the target_class (labels) to replace everything that isn't smurf with "other"?

For this, try the following MATLAB code.

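clear all;
close all;
load fisheriris                                   % provides species, a 150x1 cell array of names
IndexOfVirginica = strcmp(species, 'virginica');  % true where the label is 'virginica'
IndexOfNotVirginica = ~IndexOfVirginica;          % true for every other label
otherSpecies = species;
otherSpecies(IndexOfNotVirginica) = {'other'};    % replace all non-virginica labels
otherSpecies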
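
Applied to the question's own variables, the same idea is a single logical-index assignment, and counting the resulting classes afterwards is a quick check on the smurf/"other" balance. A minimal sketch, with a small invented `target_class` standing in for `labels(indX1,:)`; if `labels` is a char matrix rather than a cell array, convert it with `cellstr` first.

```matlab
% Invented stand-in (hypothetical) for labels(indX1,:): a column cell array of class names.
target_class = {'smurf'; 'normal'; 'neptune'; 'smurf'; 'back'; 'ipsweep'};

isSmurf = strcmp(target_class, 'smurf');
target_class(~isSmurf) = {'other'};       % every non-smurf label becomes 'other'

% Count how many rows carry each remaining label.
[names, ~, ic] = unique(target_class);    % names is sorted: {'other'; 'smurf'}
counts = accumarray(ic(:), 1);            % ic(:) guarantees a column of subscripts
disp([names, num2cell(counts)])
```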
