How to combine the outputs of multiple naive Bayes classifiers?


Problem description

I am new to this.

I have a set of weak classifiers constructed using the Naive Bayes Classifier (NBC) in the Sklearn toolkit.

My problem is how to combine the output of each NBC to make the final decision. I want my decision to be expressed as probabilities, not labels.

I wrote the following program in Python. I turn the iris dataset from sklearn into a 2-class problem. For demo/learning purposes, say I build 4 NBCs as follows.

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

import numpy as np

iris = datasets.load_iris()

gnb1 = GaussianNB()
gnb2 = GaussianNB()
gnb3 = GaussianNB()
gnb4 = GaussianNB()

# The actual dataset has 3 classes; I collapse it into 2 classes for this demo
target = np.where(iris.target, 2, 1)

# one classifier per feature column
gnb1.fit(iris.data[:, 0].reshape(150, 1), target)
gnb2.fit(iris.data[:, 1].reshape(150, 1), target)
gnb3.fit(iris.data[:, 2].reshape(150, 1), target)
gnb4.fit(iris.data[:, 3].reshape(150, 1), target)

# class probabilities of a single sample, one prediction per classifier
index = 0
y_prob1 = gnb1.predict_proba(iris.data[index, 0].reshape(1, 1))
y_prob2 = gnb2.predict_proba(iris.data[index, 1].reshape(1, 1))
y_prob3 = gnb3.predict_proba(iris.data[index, 2].reshape(1, 1))
y_prob4 = gnb4.predict_proba(iris.data[index, 3].reshape(1, 1))

#print(y_prob1, "\n", y_prob2, "\n", y_prob3, "\n", y_prob4)

# I just added it up over all classifiers for each class
pos = y_prob1[:, 1] + y_prob2[:, 1] + y_prob3[:, 1] + y_prob4[:, 1]
neg = y_prob1[:, 0] + y_prob2[:, 0] + y_prob3[:, 0] + y_prob4[:, 0]

print(pos)
print(neg)

As you will notice, I simply added up the probabilities from each NBC as the final score. I wonder if this is correct?

If I have done it wrong, can you please suggest some ideas so I can correct myself?

Answer

First of all - why do you do this? You should have one Naive Bayes here, not one per feature. It looks like you do not understand the idea of the classifier. What you did is actually what Naive Bayes does internally - it treats each feature independently; but since these are probabilities you should multiply them, or add their logarithms, so:

  1. You should just have one NB, gnb.fit(iris.data, target)
  2. If you insist on having many NBs, you should merge them through multiplication or through addition of logarithms (the two are the same from a mathematical perspective, but multiplication is less stable numerically), for example:

pos = y_prob1[:,1] * y_prob2[:,1] * y_prob3[:,1] * y_prob4[:,1]

pos = np.exp(np.log(y_prob1[:,1]) + np.log(y_prob2[:,1]) + np.log(y_prob3[:,1]) + np.log(y_prob4[:,1]))

You can also predict the logarithms directly through gnb.predict_log_proba instead of gnb.predict_proba.
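
A minimal sketch of this log-based combination, reusing gnb1 through gnb4, iris and index from the question's code above (the score is still unnormalized and, as explained next, still contains the class prior once per model):

x = iris.data[index]  # the same single sample as in the question

# sum the log-probabilities of the positive class instead of multiplying small numbers
log_pos = (gnb1.predict_log_proba(x[0].reshape(1, 1))[:, 1]
           + gnb2.predict_log_proba(x[1].reshape(1, 1))[:, 1]
           + gnb3.predict_log_proba(x[2].reshape(1, 1))[:, 1]
           + gnb4.predict_log_proba(x[3].reshape(1, 1))[:, 1])

pos = np.exp(log_pos)  # same value as multiplying the four probabilities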

However, this approach has one flaw - Naive Bayes also includes the class prior in each per-feature probability, so the product contains the prior four times instead of once and your distribution will be very skewed. You therefore have to normalize manually:

pos_prior = gnb1.class_prior_[1]  # all models have the same prior, so we can use the one from gnb1

pos = pos_prior * (y_prob1[:,1]/pos_prior) * (y_prob2[:,1]/pos_prior) * (y_prob3[:,1]/pos_prior) * (y_prob4[:,1]/pos_prior)

which simplifies to

pos = y_prob1[:,1] * y_prob2[:,1] * y_prob3[:,1] * y_prob4[:,1] / pos_prior**3

and, in the log domain, to

pos = ... - 3 * np.log(pos_prior)
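
Putting the prior correction and a final renormalization together, one possible sketch (the name n_models is introduced here only for illustration; everything else comes from the snippets above):

n_models = 4

pos_prior = gnb1.class_prior_[1]
neg_prior = gnb1.class_prior_[0]

# divide out the prior that each model multiplied in, keeping a single copy of it
pos = y_prob1[:, 1] * y_prob2[:, 1] * y_prob3[:, 1] * y_prob4[:, 1] / pos_prior ** (n_models - 1)
neg = y_prob1[:, 0] * y_prob2[:, 0] * y_prob3[:, 0] * y_prob4[:, 0] / neg_prior ** (n_models - 1)

# rescale so the two class scores sum to one
total = pos + neg
print(pos / total, neg / total)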

So once again - you should use option "1".
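
For completeness, option "1" is just a single model fitted on all four features at once, for example:

gnb = GaussianNB()
gnb.fit(iris.data, target)                                  # all four features in one model
print(gnb.predict_proba(iris.data[index].reshape(1, -1)))   # proper class probabilities for one sample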
