sklearn - Cross validation with multiple scores
Question
I would like to compute the recall, precision and f-measure of a cross-validation test for different classifiers. scikit-learn comes with cross_val_score, but unfortunately that function does not return multiple values.
I could compute these measures by calling cross_val_score three times, but that is inefficient because every call refits the model on every fold. Is there a better solution?
For now I have written this function:
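import numpy as np
from sklearn import metrics

def mean_scores(X, y, clf, skf):
    # skf is assumed to be a splitter such as sklearn.model_selection.StratifiedKFold
    cm = np.zeros(len(np.unique(y)) ** 2)
    for train, test in skf.split(X, y):
        clf.fit(X[train], y[train])
        y_pred = clf.predict(X[test])
        # Sum the flattened confusion matrix of each fold
        cm += metrics.confusion_matrix(y[test], y_pred).flatten()
    return compute_measures(*cm / skf.get_n_splits())

def compute_measures(tn, fp, fn, tp):
    """Computes effectiveness measures given a confusion matrix.

    For binary labels, confusion_matrix(...).flatten() yields tn, fp, fn, tp.
    """
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)
    # Harmonic mean of sensitivity and specificity (the usual F1 uses precision and recall)
    fmeasure = 2 * (specificity * sensitivity) / (specificity + sensitivity)
    return sensitivity, specificity, fmeasure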
It basically sums up the confusion matrix values across folds; once you have the false positives, false negatives and so on, you can easily compute recall, precision, etc. But I still don't like this solution :)
Recommended answer
Now in scikit-learn: cross_validate is a new function that can evaluate a model on multiple metrics. This feature is also available in GridSearchCV and RandomizedSearchCV (see the documentation). It was merged into master and has been available since v0.19.
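As a minimal sketch of the GridSearchCV side of this, the snippet below passes a list of metrics through the scoring parameter. The parameter grid and the choice of refit="f1" are assumptions made only for illustration; with several metrics, refit has to name the metric used to select the best model.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

iris = load_iris()
X = iris.data
y = (iris.target == 1).astype(int)  # binary target, as 'precision'/'recall'/'f1' expect

# Hypothetical parameter grid, chosen only for illustration.
param_grid = {"C": [0.1, 1, 10]}

search = GridSearchCV(
    SVC(kernel="linear", random_state=0),
    param_grid,
    scoring=["precision", "recall", "f1"],
    refit="f1",  # metric used to pick best_params_ and refit the final model
    cv=5,
)
search.fit(X, y)

# cv_results_ holds one mean_test_<metric> array per requested metric,
# with one entry per parameter combination.
for metric in ("precision", "recall", "f1"):
    print(metric, search.cv_results_["mean_test_" + metric])
print("best params by f1:", search.best_params_)
```

Every candidate is fitted once per fold and then scored on all three metrics, which is the saving the question asks about.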
From the scikit-learn docs:
The cross_validate function differs from cross_val_score in two ways: 1. It allows specifying multiple metrics for evaluation. 2. It returns a dict containing training scores, fit-times and score-times in addition to the test score.
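To make the second point concrete, here is a small sketch that lists the keys of the returned dict. The dataset and classifier are assumptions picked for illustration; return_train_score=True is set explicitly so the training scores are included.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

iris = load_iris()
X = iris.data
y = (iris.target == 1).astype(int)  # binary target for precision/recall

result = cross_validate(
    SVC(kernel="linear", C=1, random_state=0), X, y, cv=5,
    scoring=["precision", "recall"], return_train_score=True,
)

# One array per key, each with one entry per fold.
print(sorted(result.keys()))
# ['fit_time', 'score_time', 'test_precision', 'test_recall',
#  'train_precision', 'train_recall']
```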
A typical use case looks like this:
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate

iris = load_iris()
scoring = ['precision', 'recall', 'f1']
clf = SVC(kernel='linear', C=1, random_state=0)
scores = cross_validate(clf, iris.data, iris.target == 1, cv=5,
                        scoring=scoring, return_train_score=False)
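The per-fold scores come back under test_<metric> keys, so they can be summarised in one loop. The sketch below repeats the call so that it runs on its own; the summary dict and the integer cast of the target are illustrative choices, not part of the original answer.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

iris = load_iris()
clf = SVC(kernel="linear", C=1, random_state=0)
scores = cross_validate(clf, iris.data, (iris.target == 1).astype(int), cv=5,
                        scoring=["precision", "recall", "f1"])

# Mean and standard deviation over the 5 folds for each metric.
summary = {}
for name in ("precision", "recall", "f1"):
    fold_scores = scores["test_" + name]
    summary[name] = (np.mean(fold_scores), np.std(fold_scores))
    print(f"{name}: {summary[name][0]:.3f} +/- {summary[name][1]:.3f}")
```

All three metrics are computed from the same five fits, so there is no need to call cross_val_score once per metric.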
See also this example.