Controlling the threshold in Logistic Regression in scikit-learn
Question
I am using the LogisticRegression() method in scikit-learn on a highly unbalanced data set. I have even set the class_weight parameter to auto.
I know that in logistic regression it should be possible to find out the threshold value for a particular pair of classes.
Is it possible to find out the threshold value for each of the one-vs-all classifiers that the LogisticRegression() method builds?
I could not find anything about this on the documentation page.
By default, does it apply 0.5 as the threshold for all classes, regardless of the parameter values?
Recommended answer
Yes, scikit-learn uses a threshold of P > 0.5 for binary classification. Building on some of the answers already posted, here are two ways to check this:
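The rule behind this can also be confirmed directly. For a binary problem, predict() labels a sample as the second class when the decision function w·x + b is greater than 0, which is the same as sigmoid(w·x + b) > 0.5; class_weight only changes the fitted w and b, not this rule. With more than two classes, predict() picks the class with the largest score instead of comparing each class's probability with 0.5. The sketch below checks both cases on synthetic data from make_classification; the data and variable names are illustrative assumptions, not taken from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative imbalanced binary data (roughly 90% / 10%), not from the question
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

# class_weight="balanced" is the current name of the old "auto" option;
# it changes the fitted coefficients, not the decision rule
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

scores = clf.decision_function(X)        # w.x + b for each sample
proba_pos = clf.predict_proba(X)[:, 1]   # sigmoid(w.x + b)

# Binary case: predict() returns classes_[1] when the score is > 0,
# which is equivalent to P(class 1) > 0.5
binary_by_score = np.array_equal(clf.predict(X), clf.classes_[(scores > 0).astype(int)])
binary_by_proba = np.array_equal(clf.predict(X), clf.classes_[(proba_pos > 0.5).astype(int)])
print("binary, score > 0:", binary_by_score)
print("binary, P > 0.5:", binary_by_proba)

# Multi-class case: predict() takes the class with the largest score
# rather than comparing each class against 0.5
Xm, ym = make_classification(n_samples=1500, n_classes=3, n_informative=4, random_state=0)
clf_m = LogisticRegression(max_iter=1000).fit(Xm, ym)
multi_by_argmax = np.array_equal(
    clf_m.predict(Xm),
    clf_m.classes_[clf_m.decision_function(Xm).argmax(axis=1)],
)
print("multi-class, argmax of scores:", multi_by_argmax)
```

So in the multi-class case there is no single per-class cut-off to read off; the prediction depends on how the class scores compare with each other.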
One simple option is to extract the probability of each class with log.predict_proba(test_x), together with the class predictions from log.predict(test_x), as in the code below. Then append the class predictions and their probabilities to your test dataframe as a check.
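A minimal sketch of that dataframe check, assuming pandas is available. The synthetic data, the train/test split, and the column names true_y, pred_y and prob_1 are hypothetical stand-ins for the asker's own data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an imbalanced data set (hypothetical)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=1)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

log = LogisticRegression(class_weight="balanced", max_iter=1000).fit(train_x, train_y)

# Append class predictions and the probability of class 1 to the test data
test_df = pd.DataFrame(test_x, columns=[f"x{i}" for i in range(test_x.shape[1])])
test_df["true_y"] = test_y
test_df["pred_y"] = log.predict(test_x)
test_df["prob_1"] = log.predict_proba(test_x)[:, 1]

# Sorting by probability makes the switch from 0 to 1 at 0.5 easy to spot
print(test_df[["true_y", "pred_y", "prob_1"]].sort_values("prob_1", ascending=False).head(10))

# Smallest probability among predicted 1s and largest among predicted 0s
print("min prob_1 where pred_y == 1:", test_df.loc[test_df["pred_y"] == 1, "prob_1"].min())
print("max prob_1 where pred_y == 0:", test_df.loc[test_df["pred_y"] == 0, "prob_1"].max())
```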
As another option, you can plot precision and recall against the threshold using the following code.
```python
from sklearn import metrics
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt

# Assumes log is a fitted LogisticRegression and test_x / test_y hold the test set

### Predict test_y values and probabilities based on the fitted logistic regression model
pred_y = log.predict(test_x)
probs_y = log.predict_proba(test_x)
# probs_y is a 2-D array: column 0 is the probability of being labeled 0,
# column 1 the probability of being labeled 1

# Use the probability of being 1 (second column of probs_y)
precision, recall, thresholds = precision_recall_curve(test_y, probs_y[:, 1])
pr_auc = metrics.auc(recall, precision)

# precision and recall have one more entry than thresholds, so drop the last one
plt.title("Precision-Recall vs Threshold Chart")
plt.plot(thresholds, precision[:-1], "b--", label="Precision")
plt.plot(thresholds, recall[:-1], "r--", label="Recall")
plt.ylabel("Precision, Recall")
plt.xlabel("Threshold")
plt.legend(loc="lower left")
plt.ylim([0, 1])
plt.show()
```
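Since predict() always uses the fixed 0.5 cut-off, controlling the threshold in practice means thresholding predict_proba yourself. The sketch below picks the threshold that maximises F1 on the precision-recall curve and applies it. The F1 criterion, the synthetic data, and the helper name predict_with_threshold are assumptions for illustration, not part of the original answer; in real use, choose the threshold on a validation split rather than on the test set. Recent scikit-learn releases (1.5 and later) also offer TunedThresholdClassifierCV and FixedThresholdClassifier for this.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (hypothetical stand-in)
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=2)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=2, stratify=y
)
log = LogisticRegression(max_iter=1000).fit(train_x, train_y)

probs_1 = log.predict_proba(test_x)[:, 1]
precision, recall, thresholds = precision_recall_curve(test_y, probs_1)

# F1 at each threshold; the last (precision, recall) pair has no threshold
p, r = precision[:-1], recall[:-1]
f1 = np.divide(2 * p * r, p + r, out=np.zeros_like(p), where=(p + r) > 0)
best_threshold = thresholds[np.argmax(f1)]

def predict_with_threshold(model, x, threshold):
    """Hypothetical helper: label 1 when P(class 1) >= threshold,
    matching how precision_recall_curve counts positives."""
    return (model.predict_proba(x)[:, 1] >= threshold).astype(int)

custom_pred = predict_with_threshold(log, test_x, best_threshold)
print("chosen threshold:", best_threshold)
print("positives at 0.5:", int(log.predict(test_x).sum()))
print("positives at chosen threshold:", int(custom_pred.sum()))
```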