Controlling the threshold in Logistic Regression in Scikit Learn

Question

I am using the LogisticRegression() method in scikit-learn on a highly unbalanced data set. I have even set the class_weight parameter to auto.

I understand that in logistic regression it should be possible to find out the threshold value used for a particular pair of classes.

Is it possible to find out the threshold value in each of the One-vs-All classifiers that the LogisticRegression() method builds?

I did not find anything on the documentation page.

Does it apply 0.5 as the default threshold for all classes, regardless of the parameter values?

Recommended answer

Yes, scikit-learn uses a threshold of 0.5 for binary classification: predict() returns the positive class when its predicted probability is above 0.5. Building on some of the answers already posted, here are two options to check this:

One simple option is to extract the probability of each class with model.predict_proba(test_x) and the class predictions with model.predict(test_x) (the fitted model is named log in the code below). Then append the class predictions and their probabilities to your test dataframe as a check.
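The check described above can be sketched as follows. This is a minimal illustration, not the answerer's own code: the synthetic data from make_classification, the column names pred_y and prob_1, and the variable check are all assumptions introduced here, and the labels are assumed to be 0 and 1.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data (an assumption, for illustration only)
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(train_x, train_y)

# Append class predictions and positive-class probabilities to the test data
check = pd.DataFrame(test_x)
check["pred_y"] = model.predict(test_x)
check["prob_1"] = model.predict_proba(test_x)[:, 1]

# If predict() thresholds at 0.5, every row should agree with this comparison
agrees = (check["pred_y"] == (check["prob_1"] > 0.5).astype(int)).all()
print("predict() matches prob_1 > 0.5 on every row:", agrees)
```

Sorting check by prob_1 and looking at the rows nearest 0.5 is a quick way to see where the predicted label flips.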

As another option, you can plot precision and recall at various thresholds using the following code.

# Predict test_y values and probabilities based on the fitted logistic regression model.
# `log` is the fitted LogisticRegression; test_x and test_y are the held-out data.
import matplotlib.pyplot as plt
from sklearn.metrics import auc, precision_recall_curve

pred_y = log.predict(test_x)

# probs_y is a 2-D array: column 0 is the probability of label 0, column 1 the probability of label 1
probs_y = log.predict_proba(test_x)

# Use the probability of being 1 (second column of probs_y)
precision, recall, thresholds = precision_recall_curve(test_y, probs_y[:, 1])
pr_auc = auc(recall, precision)  # area under the precision-recall curve

plt.title("Precision-Recall vs Threshold Chart")
# precision and recall have one more element than thresholds, so drop the last value
plt.plot(thresholds, precision[:-1], "b--", label="Precision")
plt.plot(thresholds, recall[:-1], "r--", label="Recall")
plt.ylabel("Precision, Recall")
plt.xlabel("Threshold")
plt.legend(loc="lower left")
plt.ylim([0, 1])
plt.show()
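Once a threshold has been read off the chart, one way to apply it is to compare the positive-class probabilities against it directly instead of calling predict(). The sketch below is an illustration under assumptions, not part of the original answer: the helper predict_with_threshold, the synthetic data, and the value 0.3 are hypothetical, and the labels are assumed to be 0 and 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data (an assumption, for illustration only)
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, stratify=y, random_state=0)
log = LogisticRegression(max_iter=1000).fit(train_x, train_y)

# Probability of label 1 (second column of predict_proba)
probs_1 = log.predict_proba(test_x)[:, 1]


def predict_with_threshold(probs, threshold):
    """Hypothetical helper: label a sample 1 when its probability exceeds threshold."""
    return (np.asarray(probs) > threshold).astype(int)


# 0.5 reproduces the default behaviour; 0.3 is an arbitrary illustrative value
for threshold in (0.5, 0.3):
    pred = predict_with_threshold(probs_1, threshold)
    print(threshold,
          precision_score(test_y, pred, zero_division=0),
          recall_score(test_y, pred))
```

Lowering the threshold can only add predicted positives, so recall does not decrease while precision typically falls. In practice the threshold is usually chosen on a separate validation split rather than on the test data used for the final evaluation.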
