How does sklearn select threshold steps in precision recall curve?

Views: 518 | Published: 2020/6/12 19:22:22 | Tags: python scikit-learn precision precision-recall


Problem description

I trained a basic feed-forward neural network (FFNN) on an example breast cancer dataset. For the results, the precision_recall_curve function returns data points for 417 different thresholds. My data contains 569 unique prediction values, so as far as I understand the precision-recall curve, I could apply 568 different threshold values and check the resulting precision and recall for each.

But how do I do that? Is there a way to set the number of thresholds that sklearn tests? Or is there at least an explanation of how sklearn selects those thresholds?
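One way to check precision and recall at thresholds of your own choosing is to compute them directly with NumPy rather than relying on the thresholds that precision_recall_curve returns. The sketch below is only an illustration: the helper name precision_recall_at and the toy labels and scores are made up for this example, and it assumes a sample counts as positive when its score is greater than or equal to the threshold.

```python
import numpy as np

def precision_recall_at(y_true, y_score, thresholds):
    """Hypothetical helper: precision and recall at each user-supplied threshold."""
    y_true = np.asarray(y_true).astype(bool).ravel()
    y_score = np.asarray(y_score, dtype=float).ravel()
    precisions, recalls = [], []
    for t in thresholds:
        predicted_pos = y_score >= t  # assumption: ">=" counts as positive
        tp = np.sum(predicted_pos & y_true)
        fp = np.sum(predicted_pos & ~y_true)
        fn = np.sum(~predicted_pos & y_true)
        # Convention chosen here: precision is 1.0 when nothing is predicted positive
        precisions.append(tp / (tp + fp) if (tp + fp) > 0 else 1.0)
        recalls.append(tp / (tp + fn) if (tp + fn) > 0 else 0.0)
    return np.array(precisions), np.array(recalls)

# Toy data, made up for illustration
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Option 1: an evenly spaced grid with as many steps as you like
grid = np.linspace(0.0, 1.0, 5)
p_grid, r_grid = precision_recall_at(y_true, y_score, grid)

# Option 2: every unique predicted value used as a threshold
unique_thresholds = np.unique(y_score)
p_all, r_all = precision_recall_at(y_true, y_score, unique_thresholds)
```

With the model from the question, the same helper could be called with target, pred and any grid, for example np.linspace(0, 1, 101) for 101 evenly spaced thresholds.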

To be clear, 417 thresholds should be enough, even for bigger datasets; I am just curious how they were selected.

# necessary packages
from sklearn.datasets import load_breast_cancer
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout

# load data
sk_data = load_breast_cancer(return_X_y=False)

# store the data in a pandas DataFrame
data = sk_data['data']
target = sk_data['target']
target_names = sk_data['target_names']
feature_names = sk_data['feature_names']
data = pd.DataFrame(data=data, columns=feature_names)

# build ANN
model = Sequential()
model.add(Dense(64, kernel_initializer='random_uniform', input_dim=30, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(32, kernel_initializer='random_uniform', activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(1, activation='sigmoid'))

# train ANN
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

model.fit(data, target, epochs=50, batch_size=10, validation_split=0.2)

# evaluate: predict on the same data the model was trained on
pred = model.predict(data)

# calculate precision-recall curve
from sklearn.metrics import precision_recall_curve
precision, recall, thresholds = precision_recall_curve(target, pred)

# plot the precision-recall curve
import matplotlib.pyplot as plt

# plt.plot([0, 1], [0.5, 0.5], linestyle='--')  # optional horizontal reference line at precision 0.5
plt.plot(recall, precision, marker='.')
plt.xlabel('Recall')
plt.ylabel('Precision')
# show the plot
plt.show()

len(np.unique(pred))  # 569
len(thresholds)  # 417
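As for how the thresholds are selected: as far as I can tell, precision_recall_curve does not use a fixed step size. It takes the distinct predicted scores as candidate thresholds. The scikit-learn releases current when this question was posted additionally stopped once full recall was reached, so every distinct score below the lowest-scored positive sample was dropped. That would explain getting fewer thresholds than unique predictions. Newer releases appear to keep all distinct scores, so the count may differ by version. The NumPy sketch below imitates that logic rather than reproducing the library's code; the function name pr_thresholds_sketch, the stop_at_full_recall flag and the toy data are assumptions made for illustration.

```python
import numpy as np

def pr_thresholds_sketch(y_true, y_score, stop_at_full_recall=True):
    """Illustrative sketch (not sklearn's code) of distinct-score thresholds."""
    y_true = np.asarray(y_true).ravel()
    y_score = np.asarray(y_score, dtype=float).ravel()

    # Sort samples by score, highest first
    order = np.argsort(y_score, kind="mergesort")[::-1]
    y_true, y_score = y_true[order], y_score[order]

    # Index of the last sample in each run of equal scores
    distinct_idx = np.where(np.diff(y_score))[0]
    threshold_idx = np.r_[distinct_idx, y_true.size - 1]

    # True/false positive counts when predicting positive for score >= threshold
    tps = np.cumsum(y_true)[threshold_idx]
    fps = 1 + threshold_idx - tps
    thresholds = y_score[threshold_idx]

    precision = tps / (tps + fps)
    recall = tps / tps[-1]  # assumes at least one positive sample

    if stop_at_full_recall:
        # Keep only thresholds up to the first one that reaches full recall
        last = np.searchsorted(tps, tps[-1])
        keep = slice(0, last + 1)
        precision, recall, thresholds = precision[keep], recall[keep], thresholds[keep]

    # Return everything ordered by increasing threshold
    return precision[::-1], recall[::-1], thresholds[::-1]

# Toy data, made up for illustration: 7 unique scores, lowest positive at 0.3
y_true = np.array([0, 0, 0, 1, 0, 1, 1])
y_score = np.array([0.05, 0.1, 0.2, 0.3, 0.5, 0.7, 0.9])

_, _, t_trunc = pr_thresholds_sketch(y_true, y_score)
_, _, t_all = pr_thresholds_sketch(y_true, y_score, stop_at_full_recall=False)
print(len(np.unique(y_score)), len(t_trunc), len(t_all))  # prints: 7 4 7
```

If this is what happened in the run above, the 152 missing thresholds (569 - 417) would correspond to distinct predictions that lie below the lowest-scored positive sample. Comparing pred[target == 1].min() with the smallest returned threshold is one way to check.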
