How to specify the prior probability for scikit-learn's Naive Bayes

Question

I'm using the scikit-learn machine learning library (Python) for a machine learning project. One of the algorithms I'm using is the Gaussian Naive Bayes implementation. One of the attributes of GaussianNB() is the following:

class_prior_ : array, shape (n_classes,)

I want to alter the class prior manually since the data I use is very skewed and the recall of one of the classes is very important. By assigning a higher prior probability to that class, the recall should increase.

However, I can't figure out how to set the attribute correctly. I've already read the topics below, but their answers didn't work for me.

How can I manually set the prior probability for a Naive Bayes clf in scikit-learn?

How do I know what priors I'm giving to sci-kit learn? (Naive Bayes classifiers.)

Here is my code:

from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB()
gnb.class_prior_ = [0.1, 0.9]  # attempt to set the priors by hand
gnb.fit(data.XTrain, yTrain)
yPredicted = gnb.predict(data.XTest)

I figured this was the correct syntax, and that I could find out which class belongs to which position in the array by playing with the values, but the results remain unchanged. No errors were raised either.

What is the correct way of setting the attributes of the GaussianNB algorithm from the scikit-learn library?

Link to the scikit-learn documentation of GaussianNB

Answer

The GaussianNB() implemented in scikit-learn does not allow you to set the class priors beforehand. If you read the online documentation, you will see that class_prior_ is an attribute rather than a parameter. Once you fit the GaussianNB(), you can access the class_prior_ attribute. It is computed simply from how often each label occurs in your training sample (the empirical class frequencies).

from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# simulate data with unbalanced class weights
X, y = make_classification(n_samples=1000, weights=[0.1, 0.9])

# your GNB estimator
gnb = GaussianNB()
gnb.fit(X, y)

gnb.class_prior_
# Out: array([ 0.105,  0.895])

gnb.get_params()
# Out: {}
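
As a quick sanity check (a minimal sketch reusing the X, y, and gnb from the snippet above), the fitted class_prior_ is nothing more than the empirical label frequencies:

import numpy as np

# class_prior_ is just the relative frequency of each label in y
empirical_priors = np.bincount(y) / y.size
empirical_priors
# Out: array([ 0.105,  0.895])

np.allclose(gnb.class_prior_, empirical_priors)
# Out: True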

You see the estimator is smart enough to take the class imbalance into account, so you don't have to specify the priors manually.
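
One caveat worth verifying against your own install: the answer above describes older scikit-learn releases. Newer versions (roughly 0.18 onward; check the GaussianNB signature you actually have) add a priors constructor argument, which would be the supported way to fix the class priors rather than assigning to class_prior_ after construction. A minimal sketch under that assumption, reusing X and y from above:

from sklearn.naive_bayes import GaussianNB

# `priors` is assumed to exist in your scikit-learn version;
# if the constructor rejects the keyword, you are on an older release
gnb_fixed = GaussianNB(priors=[0.1, 0.9])
gnb_fixed.fit(X, y)
gnb_fixed.class_prior_
# reflects the priors passed in, not the label frequencies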
