How to specify the prior probability for scikit-learn's Naive Bayes
Problem description
I'm using the scikit-learn machine learning library (Python) for a machine learning project. One of the algorithms I'm using is the Gaussian Naive Bayes implementation. One of the attributes of the GaussianNB class is the following:
class_prior_ : array, shape (n_classes,)
I want to alter the class prior manually, since the data I use is very skewed and the recall of one of the classes is very important. Assigning a high prior probability to that class should increase its recall.
However, I can't figure out how to set the attribute correctly. I've already read the topics below, but their answers don't work for me.
How can the prior probabilities be set manually for the Naive Bayes clf in scikit-learn?
How do I know what priors I'm giving to scikit-learn? (Naive Bayes classifiers.)
This is my code:
gnb = GaussianNB()
gnb.class_prior_ = [0.1, 0.9]
gnb.fit(data.XTrain, yTrain)
yPredicted = gnb.predict(data.XTest)
I figured this was the correct syntax, and that I could find out which class belongs to which position in the array by playing with the values, but the results remain unchanged. No errors are raised either.
What is the correct way to set the attributes of the GaussianNB algorithm from the scikit-learn library?
Recommended answer
In the scikit-learn release used for this answer (the one where get_params() returns an empty dict below), GaussianNB() does not let you set the class prior. If you read the online documentation, you will see that class_prior_ is an attribute rather than a parameter. It only becomes meaningful once you have fitted the GaussianNB(): fit() computes it from the relative frequency of each label in your training sample, which is why a value assigned before fit() is silently overwritten and the results do not change.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
# simulate data with unbalanced weights
X, y = make_classification(n_samples=1000, weights=[0.1, 0.9])
# your GNB estimator
gnb = GaussianNB()
gnb.fit(X, y)
gnb.class_prior_
Out[168]: array([ 0.105, 0.895])
gnb.get_params()
Out[169]: {}
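To make the counting explicit, and to show why assigning class_prior_ before fit() has no effect, here is a minimal sketch. The random_state value, the manual prior values, and the variable name empirical are illustrative assumptions, not part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# Simulated imbalanced data; random_state is an arbitrary choice for repeatability.
X, y = make_classification(n_samples=1000, weights=[0.1, 0.9], random_state=0)

gnb = GaussianNB()
# Assign the attribute before fitting, as the question does.
gnb.class_prior_ = np.array([0.9, 0.1])
gnb.fit(X, y)

# Empirical label frequencies, in the same order as gnb.classes_ (labels 0 and 1).
empirical = np.bincount(y) / len(y)

# fit() recomputed class_prior_ from the labels and discarded the manual value.
print(gnb.classes_)
print(gnb.class_prior_)
print(np.allclose(gnb.class_prior_, empirical))
```

The position of each entry in class_prior_ follows gnb.classes_, which answers the question of which class sits at which index.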
You can see that the estimator already accounts for the class imbalance: the fitted priors match the class proportions in the training data. So you don't have to specify the priors manually to account for it.
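If you still want to override the prior, for example to trade precision for recall on a rare class as in the question, later scikit-learn releases (0.18 onward, as far as I know) accept a priors argument in the GaussianNB constructor. A minimal sketch, assuming such a release is installed; the prior values, random_state, and variable names are illustrative only, and recall is measured on the training data purely to keep the example short:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.naive_bayes import GaussianNB

# Simulated imbalanced data: class 0 is the rare class (about 10%).
X, y = make_classification(n_samples=1000, weights=[0.1, 0.9], random_state=0)

# Default behaviour: priors are estimated from the label frequencies.
default_gnb = GaussianNB().fit(X, y)

# Manual priors (assumes a release that supports the priors argument).
# Entries follow the order of classes_ and must sum to 1.
manual_gnb = GaussianNB(priors=[0.5, 0.5]).fit(X, y)

# Recall of the rare class under each setting.
recall_default = recall_score(y, default_gnb.predict(X), pos_label=0)
recall_manual = recall_score(y, manual_gnb.predict(X), pos_label=0)

print(manual_gnb.class_prior_)
print(recall_default, recall_manual)
```

Because the per-class means and variances do not depend on the prior, raising the prior of the rare class can only move samples toward that class, so its recall does not decrease; the cost is usually lower precision.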