How to specify the prior probability for scikit-learn's Naive Bayes


Problem description

I'm using the scikit-learn machine learning library (Python) for a machine learning project. One of the algorithms I'm using is the Gaussian Naive Bayes implementation. One of the attributes of the GaussianNB() estimator is the following:

class_prior_ : array, shape (n_classes,)

I want to set the class priors manually because my data is very skewed and the recall of one of the classes is very important. Assigning a high prior probability to that class should increase its recall.
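The intuition follows from the standard Gaussian Naive Bayes decision rule, written below with notation introduced here (not from the question): $P(y=c)$ is the prior of class $c$, and $\mu_{c,i}$, $\sigma^2_{c,i}$ are the per-class mean and variance of feature $x_i$. The prior enters as an additive log term.

```latex
\hat{y}(\mathbf{x}) = \arg\max_{c} \left[ \log P(y = c) + \sum_{i=1}^{n} \log \mathcal{N}\!\left(x_i \mid \mu_{c,i}, \sigma^2_{c,i}\right) \right]
```

With the fitted means and variances unchanged, raising $P(y=c)$ adds the same amount to class $c$'s score for every sample, so predictions can only move toward $c$: its recall stays the same or rises, typically at some cost in precision.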

However, I can't figure out how to set the attribute correctly. I've already read the topics below, but their answers don't work for me.

How to manually set the prior probability for a Naive Bayes clf in scikit-learn?

How do I know what prior I'm giving to sci-kit learn? (Naive Bayes classifiers.)

Here is my code:

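from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB()
gnb.class_prior_ = [0.1, 0.9]  # attempt to set the class priors manually
gnb.fit(data.XTrain, yTrain)   # data.XTrain / yTrain: my own training data
yPredicted = gnb.predict(data.XTest)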

I figured this was the correct syntax, and that by playing with the values I could find out which class corresponds to which position in the array, but the results remain unchanged. No errors are raised either.
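One way to see why nothing changes: fit() recomputes class_prior_ from the training labels, so a value assigned beforehand is silently replaced. This is a minimal sketch that assumes synthetic data from make_classification as a stand-in for the asker's data.XTrain and yTrain.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# Synthetic, imbalanced stand-in for the asker's training data
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

gnb = GaussianNB()
gnb.class_prior_ = np.array([0.1, 0.9])  # assigned before fit, as in the question
gnb.fit(X, y)

# fit() has replaced the manual value with the empirical class frequencies
print(gnb.class_prior_)
print(np.allclose(gnb.class_prior_, [0.1, 0.9]))  # False
```

No exception is raised because scikit-learn does not stop you from assigning attributes; the assignment simply has no lasting effect.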

What is the correct way to set the attributes of the GaussianNB algorithm from the scikit-learn library?

Link to the scikit-learn GaussianNB documentation

Recommended answer

In scikit-learn versions before 0.18, GaussianNB() does not let you set the class priors. If you read the online documentation, you'll see that class_prior_ is an attribute, not a constructor parameter: it is computed by fit(), so a value you assign before calling fit() is silently overwritten, which is why your results don't change and no error is raised. Once you have fitted GaussianNB(), you can read the class_prior_ attribute. It is the relative frequency of each label in your training sample, i.e. the number of samples in each class divided by the total number of samples.
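To check that claim, the sketch below recomputes the priors by hand from label counts and compares them with the fitted attribute. The data set and random_state are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# Imbalanced synthetic data, as in the answer's example (random_state added for repeatability)
X, y = make_classification(n_samples=1000, weights=[0.1, 0.9], random_state=0)

gnb = GaussianNB().fit(X, y)

# Recompute the priors by hand: count each label, divide by the number of samples
classes, counts = np.unique(y, return_counts=True)
manual_prior = counts / counts.sum()

print(gnb.classes_, gnb.class_prior_)
print(np.allclose(gnb.class_prior_, manual_prior))  # True
```

The entries of class_prior_ follow the order of gnb.classes_, which holds the sorted unique labels.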

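from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# simulate data with unbalanced class weights (roughly 10% / 90%)
X, y = make_classification(n_samples=1000, weights=[0.1, 0.9])
# your GNB estimator
gnb = GaussianNB()
gnb.fit(X, y)

print(gnb.class_prior_)
# the answer's run printed array([ 0.105,  0.895]); values vary slightly between runs

print(gnb.get_params())
# printed {} in the version used for this answer (no constructor parameters);
# newer versions print {'priors': None, 'var_smoothing': 1e-09}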

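You can see that the estimator already takes the class imbalance into account: the fitted priors match the label frequencies in the training data, so you don't have to specify priors manually just to reflect the imbalance. If you want priors that differ from those frequencies (for example, to favor a class whose recall matters most), scikit-learn 0.18 and later accept them through the priors constructor parameter, e.g. GaussianNB(priors=[0.1, 0.9]); the values must sum to 1 and are matched to the classes in the order of gnb.classes_ (the sorted class labels).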
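For scikit-learn 0.18 or newer, which accepts a priors constructor parameter, here is a minimal sketch of passing custom priors and comparing minority-class recall against the default. The synthetic data, the 0.3/0.7 priors and the train/test split are illustrative assumptions, not values from the question.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Imbalanced data: class 1 is the rare class whose recall matters (~10% of samples)
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Default: priors estimated from the training labels
default_nb = GaussianNB().fit(X_train, y_train)

# Custom priors: order follows classes_, values must sum to 1
custom_nb = GaussianNB(priors=[0.3, 0.7]).fit(X_train, y_train)

print("classes:", custom_nb.classes_)
print("fitted priors:", default_nb.class_prior_, custom_nb.class_prior_)

recall_default = recall_score(y_test, default_nb.predict(X_test), pos_label=1)
recall_custom = recall_score(y_test, custom_nb.predict(X_test), pos_label=1)
print(f"recall of class 1: default={recall_default:.3f}, custom={recall_custom:.3f}")
```

Because the per-class means and variances do not depend on the priors, both models share the same likelihoods; the larger prior on class 1 can only flip predictions from 0 to 1, so its recall cannot drop. Whether the accompanying loss in precision is acceptable depends on your data.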
