How to calculate feature_log_prob_ in the naive_bayes MultinomialNB

Published: 2021/5/31 18:43:27 · Tags: machine-learning, scikit-learn, naivebayes


Problem description

Here's my code:

# Load libraries
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
# Create text
text_data = np.array(['Tim is smart!',
                      'Joy is the best',
                      'Lisa is dumb',
                      'Fred is lazy',
                      'Lisa is lazy'])
# Create target vector
y = np.array([1,1,0,0,0])
# Create bag of words
count = CountVectorizer()
bag_of_words = count.fit_transform(text_data)

# Create feature matrix
X = bag_of_words.toarray()

mnb = MultinomialNB(alpha=1, fit_prior=True, class_prior=None)
mnb.fit(X, y)

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(list(count.get_feature_names_out()))
# output: ['best', 'dumb', 'fred', 'is', 'joy', 'lazy', 'lisa', 'smart', 'the', 'tim']

print(mnb.feature_log_prob_)
# output (one row per class in mnb.classes_, i.e. [0, 1]):
# [[-2.94443898 -2.2512918  -2.2512918  -1.55814462 -2.94443898 -1.84582669
#   -1.84582669 -2.94443898 -2.94443898 -2.94443898]
#  [-2.14006616 -2.83321334 -2.83321334 -1.73460106 -2.14006616 -2.83321334
#   -2.83321334 -2.14006616 -2.14006616 -2.14006616]]
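To double-check which entry the question refers to: the rows of `feature_log_prob_` follow the order of `mnb.classes_` (the sorted labels, here [0, 1]), and the columns follow the indices stored in the vectorizer's `vocabulary_`. The minimal sketch below rebuilds the same toy data and reads off the "best"/class 1 entry; it also checks that exponentiating each row gives a distribution over the vocabulary that sums to 1. Names such as `row_class1` and `best_col` are illustrative only.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

text_data = ['Tim is smart!', 'Joy is the best', 'Lisa is dumb',
             'Fred is lazy', 'Lisa is lazy']
y = np.array([1, 1, 0, 0, 0])

count = CountVectorizer()
X = count.fit_transform(text_data).toarray()
mnb = MultinomialNB(alpha=1).fit(X, y)

# Rows of feature_log_prob_ are ordered like mnb.classes_ (sorted labels).
print(mnb.classes_)                                  # [0 1]
row_class1 = list(mnb.classes_).index(1)

# Column index of "best" in the learned vocabulary.
best_col = count.vocabulary_['best']
print(mnb.feature_log_prob_[row_class1, best_col])   # about -2.14006616

# Each row is a log distribution over the vocabulary, so exp(row) sums to 1.
print(np.exp(mnb.feature_log_prob_).sum(axis=1))     # about [1. 1.]
```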

My question is:
Let's say for the word "best": the log probability for class 1 is -2.14006616.
What is the formula used to calculate this score?

I am using log(P(best | y = 1)) -> log(1/2), but I can't get -2.14006616.

Solution

From the documentation we can infer that `feature_log_prob_` holds the empirical log probability of features given a class, log P(x_i | y). Take the feature "best" as an example: its log probability for class 1 is -2.14006616 (as you pointed out). Converting it back to a probability gives np.exp(-2.14006616) ≈ 0.1176. To see how and why the probability of "best" in class 1 is 0.1176, step back to the documentation of Multinomial Naive Bayes, which computes these probabilities with the smoothed estimate

$$\hat{\theta}_{yi} = \frac{N_{yi} + \alpha}{N_y + \alpha n}$$

Here N_yi is the number of times feature i (here "best") appears in the training samples of class y (class 1 in this example), and N_y is the sum of N_yi over all features, i.e. the total count of all features in class y. The smoothing value alpha is added so that a feature never seen in a class does not get a probability of zero, and n is the total number of features, i.e. the size of the vocabulary. Computing these numbers for our example gives:

N_yi = 1   # "best" appears only once in class `1`
N_y = 7    # class `1` contains 7 feature occurrences in total (count of all words)
alpha = 1  # default value as per sklearn
n = 10     # size of vocabulary

required_probability = (N_yi + alpha) / (N_y + alpha * n)  # (1+1)/(7+1*10) = 2/17 ≈ 0.1176
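To see where N_yi = 1 and N_y = 7 come from, the counts can be recomputed from the raw sentences. This sketch assumes that lowercasing plus the regex `(?u)\b\w\w+\b` reproduces CountVectorizer's default tokenization, which holds for this toy data; the variable names are hypothetical. It also computes the share of class-1 documents that contain "best" (1/2), which appears to be the figure used in the question: that is a document frequency, not the per-token estimate MultinomialNB uses.

```python
import math
import re
from collections import Counter

text_data = ['Tim is smart!', 'Joy is the best', 'Lisa is dumb',
             'Fred is lazy', 'Lisa is lazy']
y = [1, 1, 0, 0, 0]

# Assumed to mirror CountVectorizer's defaults: lowercase, tokens of 2+ word chars.
token_pattern = re.compile(r"(?u)\b\w\w+\b")
tokenized = [token_pattern.findall(doc.lower()) for doc in text_data]

class1_docs = [toks for toks, label in zip(tokenized, y) if label == 1]
counts_class1 = Counter(tok for toks in class1_docs for tok in toks)

N_yi = counts_class1['best']                          # occurrences of "best" in class 1
N_y = sum(counts_class1.values())                     # all token occurrences in class 1
n = len({tok for toks in tokenized for tok in toks})  # vocabulary size
alpha = 1

prob = (N_yi + alpha) / (N_y + alpha * n)
print(N_yi, N_y, n)               # 1 7 10
print(round(math.log(prob), 8))   # -2.14006616

# For contrast: fraction of class-1 documents containing "best".
doc_share = sum('best' in toks for toks in class1_docs) / len(class1_docs)
print(doc_share)                  # 0.5
```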

You can do the math in a similar fashion for any given feature and class.
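Doing the math for every feature and class at once is a short NumPy exercise. The sketch below applies the documented formula to the count matrix X and compares the result with `mnb.feature_log_prob_`; it illustrates the formula and is not scikit-learn's own source code. It also compares the per-class counts with the fitted `feature_count_` attribute, which holds N_yi. Names such as `manual_log_prob` are illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

text_data = ['Tim is smart!', 'Joy is the best', 'Lisa is dumb',
             'Fred is lazy', 'Lisa is lazy']
y = np.array([1, 1, 0, 0, 0])
alpha = 1.0

X = CountVectorizer().fit_transform(text_data).toarray()
mnb = MultinomialNB(alpha=alpha).fit(X, y)

classes = np.unique(y)                      # same sorted order as mnb.classes_
# N_yi: per-class sum of each feature column, shape (n_classes, n_features)
N_yi = np.vstack([X[y == c].sum(axis=0) for c in classes])
# N_y: total feature count per class, shape (n_classes, 1)
N_y = N_yi.sum(axis=1, keepdims=True)
n = X.shape[1]                              # vocabulary size

manual_log_prob = np.log((N_yi + alpha) / (N_y + alpha * n))

print(np.allclose(manual_log_prob, mnb.feature_log_prob_))  # True
print(np.array_equal(N_yi, mnb.feature_count_))             # True
```

Both sides use the same `alpha`, so refitting with another smoothing value should keep the two matrices in agreement.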

Hope this helps!
