How does sklearn select threshold steps in the precision-recall curve?

Question

I trained a basic feed-forward neural network (FFNN) on the example breast cancer dataset. For its predictions, the precision_recall_curve function returns data points for 417 different thresholds. My data contains 569 unique prediction values, so as far as I understand the precision-recall curve, I could apply 568 different threshold values and check the resulting precision and recall.

But how do I do that? Is there a way to set the number of thresholds that sklearn tests? Or is there at least an explanation of how sklearn selects those thresholds?

I mean, 417 should be enough, even for bigger datasets; I am just curious how they were selected.

# necessary packages
from sklearn.datasets import load_breast_cancer
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout

# load data
sk_data = load_breast_cancer(return_X_y=False)

# save data in a pandas DataFrame
data = sk_data['data']
target = sk_data['target']
target_names = sk_data['target_names']
feature_names = sk_data['feature_names']
data = pd.DataFrame(data=data, columns=feature_names)

# build ANN
model = Sequential()
model.add(Dense(64, kernel_initializer='random_uniform', input_dim=30, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(32, kernel_initializer='random_uniform', activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(1, activation='sigmoid'))

# train ANN
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

model.fit(data, target, epochs=50, batch_size=10, validation_split=0.2)

# evaluate on the training data; flatten the (n, 1) output to 1-D
pred = model.predict(data).ravel()

# calculate precision-recall curve
from sklearn.metrics import precision_recall_curve
precision, recall, thresholds = precision_recall_curve(target, pred)

# precision-recall curve and f1
import matplotlib.pyplot as plt

# plt.plot([0, 1], [0.5, 0.5], linestyle='--')
plt.plot(recall, precision, marker='.')
# show the plot
plt.show()

len(np.unique(pred))  # 569
len(thresholds)  # 417

Recommended answer

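Reading the source, precision_recall_curve does compute precision and recall for each unique predicted probability (here, each unique value in pred), but it then omits the output for every threshold that results in full recall, apart from the very first threshold to achieve full recall. In other words, the unique scores that lie below the lowest-scoring positive sample get no threshold of their own, which would account for the 569 - 417 = 152 missing values here.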
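
The truncation described in this answer can be illustrated with a small pure-Python sketch. This is not sklearn's actual code, and the function name pr_points and the toy labels and scores are made up for illustration. It assumes binary labels (0/1) with at least one positive sample, and it reproduces only the behaviour the answer describes; other scikit-learn versions may behave differently.

```python
def pr_points(y_true, scores):
    """Return (threshold, precision, recall) tuples, thresholds ascending.

    Hypothetical helper: one point per distinct score, but every threshold
    below the first one that reaches full recall is omitted.
    Assumes y_true holds 0/1 labels and contains at least one 1.
    """
    # Visit samples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(y_true)
    tp = fp = 0
    points = []
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once a run of tied scores is complete.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((scores[i], tp / (tp + fp), tp / total_pos))
            if tp == total_pos:
                # Full recall reached: lower thresholds are dropped.
                break
    points.reverse()
    return points


# Toy data: six unique scores, lowest-scoring positive at 0.3.
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

points = pr_points(y_true, scores)
thresholds = [t for t, _, _ in points]
print(len(set(scores)), len(thresholds))
print(thresholds)
```

In this toy example the two scores below 0.3 (0.1 and 0.2) produce no threshold, so six unique scores yield four thresholds. The same effect would explain why 569 unique predictions yield only 417 thresholds in the question.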
