How to choose the optimal threshold for class probabilities?

Problem description

The output of my neural network is a table of predicted class probabilities for multi-label classification:

```python
print(probabilities)
```

|   |      1       |      3       | ... |     8354     |     8356     |     8357     |
|---|--------------|--------------|-----|--------------|--------------|--------------|
| 0 | 2.442745e-05 | 5.952136e-06 | ... | 4.254002e-06 | 1.894523e-05 | 1.033957e-05 |
| 1 | 7.685694e-05 | 3.252202e-06 | ... | 3.617730e-06 | 1.613792e-05 | 7.356643e-06 |
| 2 | 2.296657e-06 | 4.859554e-06 | ... | 9.934525e-06 | 9.244772e-06 | 1.377618e-05 |
| 3 | 5.163169e-04 | 1.044035e-04 | ... | 1.435158e-04 | 2.807420e-04 | 2.346930e-04 |
| 4 | 2.484626e-06 | 2.074290e-06 | ... | 9.958628e-06 | 6.002510e-06 | 8.434519e-06 |
| 5 | 1.297477e-03 | 2.211737e-04 | ... | 1.881772e-04 | 3.171079e-04 | 3.228884e-04 |

I converted it to class labels using a threshold of 0.2 in order to measure the accuracy of my predictions:

```python
# np.int was removed in NumPy 1.24; the builtin int is the replacement
predictions = (probabilities > 0.2).astype(int)
print(predictions)
```

|   | 1 | 3 | ... | 8354 | 8356 | 8357 |
|---|---|---|-----|------|------|------|
| 0 | 0 | 0 | ... |    0 |    0 |    0 |
| 1 | 0 | 0 | ... |    0 |    0 |    0 |
| 2 | 0 | 0 | ... |    0 |    0 |    0 |
| 3 | 0 | 0 | ... |    0 |    0 |    0 |
| 4 | 0 | 0 | ... |    0 |    0 |    0 |
| 5 | 0 | 0 | ... |    0 |    0 |    0 |

I also have a test set:

```python
print(Y_test)
```

|   | 1 | 3 | ... | 8354 | 8356 | 8357 |
|---|---|---|-----|------|------|------|
| 0 | 0 | 0 | ... |    0 |    0 |    0 |
| 1 | 0 | 0 | ... |    0 |    0 |    0 |
| 2 | 0 | 0 | ... |    0 |    0 |    0 |
| 3 | 0 | 0 | ... |    0 |    0 |    0 |
| 4 | 0 | 0 | ... |    0 |    0 |    0 |
| 5 | 0 | 0 | ... |    0 |    0 |    0 |

Question: How can I build an algorithm in Python that chooses the optimal threshold, i.e. the one that maximizes roc_auc_score(average = 'micro') or another metric?

Maybe it is possible to write a custom Python function that optimizes the threshold for a given accuracy metric.

Answer

I assume your ground-truth labels are Y_test and your predictions are predictions (the predicted probabilities, before any thresholding).

Optimizing roc_auc_score(average = 'micro') over a prediction threshold does not seem to make sense: AUC is computed from how the predictions are ranked, and therefore needs the predictions as float values in [0, 1], not as thresholded 0/1 labels.
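To illustrate the ranking argument, here is a minimal numpy-only sketch of micro-averaged ROC AUC computed as a Mann-Whitney rank statistic. Micro-averaging treats every cell of the label-indicator matrix as one binary prediction, so the matrix can simply be flattened. `rank_auc` is a hypothetical helper written for illustration, not sklearn's implementation; in practice sklearn.metrics.roc_auc_score is the tool to use. The sketch shows that AUC has no threshold parameter, is unchanged by any strictly increasing transform of the scores, and changes once the scores are collapsed into 0/1 labels.

```python
import numpy as np

def rank_auc(y_true, scores):
    """Hypothetical helper: micro-averaged ROC AUC via the Mann-Whitney U statistic."""
    y = np.asarray(y_true).ravel().astype(bool)        # flatten = micro-averaging
    s = np.asarray(scores, dtype=float).ravel()
    n_pos, n_neg = y.sum(), (~y).sum()
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    # Average 1-based ranks, so tied scores share the mean of their ranks.
    _, inverse, counts = np.unique(s, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)
    avg_rank = (ends - counts + 1 + ends) / 2.0
    ranks = avg_rank[inverse]
    # Fraction of (positive, negative) pairs ranked correctly (ties count 1/2).
    return (ranks[y].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

y = np.array([[1, 0, 1], [0, 1, 0]])
p = np.array([[0.9, 0.7, 0.6], [0.4, 0.3, 0.1]])
print(rank_auc(y, p))                       # 0.6666666666666666
print(rank_auc(y, p ** 3))                  # same value: only the ordering matters
print(rank_auc(y, (p > 0.65).astype(int)))  # 0.5 once the scores are thresholded
```

Because only the order of the scores enters the computation, there is no threshold for an optimizer to tune; thresholding can only discard ranking information.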

Therefore, I will discuss accuracy_score instead.

You could use scipy.optimize.fmin:

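```python
import numpy as np
import scipy.optimize
from sklearn.metrics import accuracy_score

def thr_to_accuracy(thr, Y_test, predictions):  # negated, because fmin minimizes
    return -accuracy_score(Y_test, (np.asarray(predictions) > thr[0]).astype(int))

best_thr = scipy.optimize.fmin(thr_to_accuracy, args=(Y_test, predictions), x0=0.5)
```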
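One caveat about the fmin approach: accuracy is a step function of the threshold (it only changes when thr crosses one of the predicted probabilities), and fmin's Nelder-Mead search relies on changes in the objective, so it can stop at or near x0 without exploring. With probabilities of roughly 1e-6 to 1e-3 as in the question, every threshold near x0=0.5 yields the same all-zero predictions. An exhaustive search over candidate thresholds avoids this. The sketch below is numpy-only, and `micro_f1`, `subset_accuracy` and `best_threshold` are hypothetical helper names, not library functions. `subset_accuracy` mirrors what sklearn's accuracy_score computes on multilabel indicator arrays (a row counts only if every label matches), which is very strict with thousands of labels; micro-F1 is offered as an assumed, often more informative, alternative.

```python
import numpy as np

def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over every cell of the label matrix."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def subset_accuracy(y_true, y_pred):
    """Share of rows whose entire label vector is predicted exactly."""
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

def best_threshold(y_true, probs, metric=micro_f1, max_candidates=1000):
    """Try candidate thresholds exhaustively; return (threshold, best score)."""
    y_true = np.asarray(y_true)
    probs = np.asarray(probs, dtype=float)
    # Predictions only change when thr crosses an observed probability, so the
    # observed values suffice as candidates ('>' splits just above each one).
    candidates = np.unique(probs)
    if candidates.size > max_candidates:
        # Large matrices: approximate with evenly spaced quantiles instead.
        candidates = np.quantile(probs, np.linspace(0.0, 1.0, max_candidates))
    best_thr, best_score = None, -np.inf
    for thr in candidates:
        score = metric(y_true, (probs > thr).astype(int))
        if score > best_score:  # strict '>' keeps the lowest best threshold
            best_thr, best_score = float(thr), float(score)
    return best_thr, best_score

y_true = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 1]])
probs = np.array([[0.30, 0.05, 0.01], [0.02, 0.25, 0.04], [0.40, 0.03, 0.20]])
print(best_threshold(y_true, probs))                          # (0.05, 1.0)
print(best_threshold(y_true, probs, metric=subset_accuracy))  # (0.05, 1.0)
print(micro_f1(y_true, (probs > 0.2).astype(int)))            # 0.857... (0.20 is missed)
```

To optimize one of sklearn's own metrics instead, a callable such as `lambda y, p: f1_score(y, p, average="micro")` (with f1_score from sklearn.metrics) can be passed as `metric`; with pandas inputs, pass `Y_test.values` and `probabilities.values`. Ideally the threshold is chosen on a validation split rather than on the test set it is later evaluated on.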
