Controlling the threshold in Logistic Regression in Scikit Learn

Views: 1284 Published: 2020/5/4 3:16:08 Tags: machine-learning scikit-learn classification logistic-regression


Problem description

I am using the LogisticRegression() method in scikit-learn on a highly unbalanced data set. I have even set the class_weight parameter to auto.

I know that in Logistic Regression it should be possible to find out the threshold value used for a particular pair of classes.

Is it possible to find out the threshold value used in each of the One-vs-All classifiers that the LogisticRegression() method builds?

I did not find anything about this on the documentation page.

Does it apply 0.5 as the threshold for all classes by default, regardless of the parameter values?

Recommended answer

Yes, scikit-learn uses a threshold of P > 0.5 for binary classification. Building on some of the answers already posted, here are two options to check this:

One simple option is to extract the probability of each class from model.predict_proba(test_x), together with the class predictions from model.predict(test_x) (the fitted model is named log in the code below). Then append the class predictions and their probabilities to your test dataframe as a check.
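The first option can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic (generated with make_classification), the column names x0..x4, pred_y and prob_1 are made up for the example, and class_weight="balanced" stands in for the older "auto" setting mentioned in the question.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data (an assumption, for illustration only).
X, y = make_classification(
    n_samples=2000, n_features=5, weights=[0.9, 0.1], random_state=0
)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, stratify=y, random_state=0
)

# "balanced" is used here in place of the older "auto" value from the question.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(train_x, train_y)

# Put the test features, class predictions and class-1 probabilities side by side.
check = pd.DataFrame(test_x, columns=[f"x{i}" for i in range(test_x.shape[1])])
check["pred_y"] = model.predict(test_x)
check["prob_1"] = model.predict_proba(test_x)[:, 1]

# If the 0.5 threshold holds, every predicted label agrees with (prob_1 > 0.5).
matches_half = bool((check["pred_y"] == (check["prob_1"] > 0.5).astype(int)).all())
print(check.head())
print("predictions agree with P > 0.5:", matches_half)
```

Looking at a few rows whose prob_1 is close to 0.5 is a quick way to see where the predicted label flips.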

As another option, one can plot precision and recall against the threshold using the following code.

import matplotlib.pyplot as plt
from sklearn.metrics import auc, precision_recall_curve

# Predict test_y classes and probabilities with the fitted logistic regression model.
# `log` is assumed to be the fitted LogisticRegression; test_x and test_y are the test data.
pred_y = log.predict(test_x)

# probs_y is a 2-D array: column 0 is the probability of label 0, column 1 of label 1.
probs_y = log.predict_proba(test_x)

# Use the probability of being 1 (second column of probs_y).
precision, recall, thresholds = precision_recall_curve(test_y, probs_y[:, 1])
pr_auc = auc(recall, precision)

# precision and recall have one more element than thresholds, so drop the last one.
plt.title("Precision-Recall vs Threshold Chart")
plt.plot(thresholds, precision[:-1], "b--", label="Precision")
plt.plot(thresholds, recall[:-1], "r--", label="Recall")
plt.ylabel("Precision, Recall")
plt.xlabel("Threshold")
plt.legend(loc="lower left")
plt.ylim([0, 1])
plt.show()
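Once a threshold has been picked from such a chart, one way to apply it is to compare the class-1 probabilities against it directly instead of calling predict(). The sketch below assumes synthetic data, and predict_with_threshold is a hypothetical helper written for this example, not a scikit-learn function.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data (an assumption, for illustration only).
X, y = make_classification(
    n_samples=2000, n_features=5, weights=[0.9, 0.1], random_state=0
)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, stratify=y, random_state=0
)
log = LogisticRegression(max_iter=1000).fit(train_x, train_y)

# Probability of class 1 for each test row.
probs_1 = log.predict_proba(test_x)[:, 1]


def predict_with_threshold(probs, threshold):
    """Hypothetical helper: label a row 1 when its class-1 probability >= threshold."""
    return (probs >= threshold).astype(int)


# Lower thresholds flag more rows as 1, trading precision for recall.
for threshold in (0.3, 0.5, 0.7):
    pred = predict_with_threshold(probs_1, threshold)
    p = precision_score(test_y, pred, zero_division=0)
    r = recall_score(test_y, pred)
    print(f"threshold={threshold:.1f} precision={p:.3f} recall={r:.3f}")
```

On a highly unbalanced data set, moving the threshold below 0.5 is a common way to recover recall on the minority class; the right value depends on how costly false positives are in your application.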
