Naive Bayes: the within-class variance in each feature of TRAINING must be positive


Question

When trying to fit Naive Bayes:

    training_data = sample;   % training observations
    target_class = K8;        % class label for each observation

    % train model
    nb = NaiveBayes.fit(training_data, target_class);

    % prediction
    y = nb.predict(cluster3);

I get an error:

??? Error using ==> NaiveBayes.fit>gaussianFit at 535
The within-class variance in each feature of TRAINING
must be positive. The within-class variance in feature
2 5 6 in class normal. are not positive.

Error in ==> NaiveBayes.fit at 498
            obj = gaussianFit(obj, training, gindex);

Can anyone shed light on this and how to solve it? I have read a similar post here, but I am still not sure what to do. It seems as if it is trying to fit based on columns rather than rows, whereas I expected the class variance to be based on the probability of each row belonging to a specific class. If I delete those columns the fit works, but obviously that isn't what I want to do.

Recommended answer

Assuming there is no bug in your code (or in the NaiveBayes code from MathWorks), and assuming your training_data is an N-by-D matrix with N observations and D features, then columns 2, 5 and 6 are constant (for example, all zeros) within at least one class; the error message names the class "normal". This can happen when the training set is relatively small and the number of classes is high, so that a single class is represented by only a few observations. By default NaiveBayes models every feature as normally distributed within each class, so it cannot handle a feature whose variance is zero within a class. In other words, NaiveBayes has no way to estimate the parameters of a normal distribution for that feature in that class (note: the default for Distribution is normal).
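To check whether this is really what is happening, you can compute the variance of every feature inside every class yourself. The following is only a minimal sketch: X and labels are made-up toy data standing in for training_data and target_class, and it uses an exact comparison with zero, which may not match the tolerance NaiveBayes applies internally.

```matlab
% Toy data (hypothetical): 6 observations, 3 features, 2 classes.
X = [1 0 5;
     2 0 5;
     3 0 7;
     4 1 8;
     5 2 9;
     6 3 9];
labels = {'normal'; 'normal'; 'normal'; 'attack'; 'attack'; 'attack'};

% idx(i) is the index into classes of the class of row i.
[classes, ~, idx] = unique(labels);
nClasses = numel(classes);

% withinVar(k, j) = variance of feature j over the rows of class k.
withinVar = zeros(nClasses, size(X, 2));
for k = 1:nClasses
    % dim = 1 forces a per-column variance even if the class has one row.
    withinVar(k, :) = var(X(idx == k, :), 0, 1);
end

% Report the features a per-class normal fit cannot handle.
for k = 1:nClasses
    bad = find(withinVar(k, :) == 0);
    if ~isempty(bad)
        fprintf('class %s: zero variance in feature(s) %s\n', ...
                classes{k}, mat2str(bad));
    end
end
```

In this toy set the second feature is 0 for every "normal" row, so it is the one that gets reported for that class. Running the same loop on your own matrix should list the columns named in the error.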

Take a look at the nature of your features. If they do not seem to follow a normal distribution within each class, then normal is not the option you want. Maybe your data is closer to a multinomial model, mn (which treats each row as a vector of nonnegative counts):

    nb = NaiveBayes.fit(training_data, target_class, 'Distribution', 'mn');
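If only some of the features are the problem, another possibility is to give 'Distribution' a 1-by-D cell array so that each feature gets its own model. The sketch below keeps normal for most features and switches the offending one to 'mvmn' (a categorical model). Treating that feature as categorical is an assumption about the data, not something the question states, and X, labels and badFeatures are hypothetical toy values. The fit itself is guarded because it needs the Statistics Toolbox release that still ships the NaiveBayes class.

```matlab
% Toy data (hypothetical): feature 2 is constant inside class 'normal'.
X = [1 0 5;
     2 0 5;
     3 0 7;
     4 1 8;
     5 2 9;
     6 3 9];
labels = {'normal'; 'normal'; 'normal'; 'attack'; 'attack'; 'attack'};

D = size(X, 2);
badFeatures = 2;                      % features with zero within-class variance

% One distribution name per feature: normal everywhere by default ...
dist = repmat({'normal'}, 1, D);
% ... except the problematic features, modelled as categorical (assumption).
dist(badFeatures) = {'mvmn'};

% Only attempt the fit when the NaiveBayes class is available.
if exist('NaiveBayes', 'class') == 8
    nb = NaiveBayes.fit(X, labels, 'Distribution', dist);
end
```

Whether 'mvmn' is appropriate depends on whether those columns really take a small set of discrete values; if they are continuous and merely constant in a small class, collecting more observations for that class is the more honest fix.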
