sklearn - Cross validation with multiple scores


Question

I would like to compute the recall, precision and F-measure of a cross-validation test for different classifiers. scikit-learn comes with cross_val_score, but unfortunately that function does not return multiple values.

I could compute these measures by calling cross_val_score three times, but that is not efficient. Is there a better solution?
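For reference, the three-call approach mentioned above might look roughly like the sketch below. The iris data, the binary target and the linear SVC are placeholders chosen for illustration, not part of the original question. Each call refits the classifier on every fold, which is where the inefficiency comes from.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Placeholder data and classifier (assumptions for illustration only).
iris = load_iris()
X, y = iris.data, iris.target == 1  # binary target: class 1 vs. the rest
clf = SVC(kernel='linear', C=1, random_state=0)

results = {}
for metric in ('recall', 'precision', 'f1'):
    # One full cross-validation run per metric: the model is refit
    # on every fold each time, so the work is done three times.
    results[metric] = cross_val_score(clf, X, y, cv=5, scoring=metric)

for metric, values in results.items():
    print(metric, values.mean())
```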

For now I have written this function:

import numpy as np
from sklearn import metrics

def mean_scores(X, y, clf, skf):
    """Average the confusion matrix over the folds, then derive the measures."""
    # skf is assumed to be a sklearn.model_selection splitter, e.g. StratifiedKFold.
    cm = np.zeros(len(np.unique(y)) ** 2)
    for train, test in skf.split(X, y):
        clf.fit(X[train], y[train])
        y_pred = clf.predict(X[test])
        # For a binary problem the flattened matrix is ordered tn, fp, fn, tp.
        cm += metrics.confusion_matrix(y[test], y_pred).ravel()

    return compute_measures(*cm / skf.get_n_splits(X, y))

def compute_measures(tn, fp, fn, tp):
    """Computes effectiveness measures given a binary confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fmeasure = 2 * (precision * recall) / (precision + recall)
    return recall, precision, fmeasure

It basically sums up the confusion matrix values, and once you have the false positives, false negatives, etc., you can easily compute recall, precision, and so on. But I still don't like this solution :)

Recommended answer

scikit-learn now has cross_validate, a new function that can evaluate a model on multiple metrics. This feature is also available in GridSearchCV and RandomizedSearchCV (doc). It was recently merged into master and will be available in v0.19.

From the scikit-learn documentation:

The cross_validate function differs from cross_val_score in two ways: 1. It allows specifying multiple metrics for evaluation. 2. It returns a dict containing training scores, fit-times and score-times in addition to the test score.

A typical use case is:

from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
iris = load_iris()
scoring = ['precision', 'recall', 'f1']
clf = SVC(kernel='linear', C=1, random_state=0)
scores = cross_validate(clf, iris.data, iris.target == 1, cv=5,
                        scoring=scoring, return_train_score=False)
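To make the returned dict concrete, here is a small sketch that repeats the call above and then reads the per-fold scores back out. With a list of metric names, the keys follow the pattern test_<metric>, alongside fit_time and score_time. The loop and the printing are just one way to summarise the result.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

iris = load_iris()
clf = SVC(kernel='linear', C=1, random_state=0)
scores = cross_validate(clf, iris.data, iris.target == 1, cv=5,
                        scoring=['precision', 'recall', 'f1'],
                        return_train_score=False)

# Every value is a NumPy array with one entry per fold.
for key in sorted(scores):
    print(key, scores[key].mean())
```

All three metrics come from a single pass over the folds, so the model is fit only once per fold.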

See also this example.
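Since the answer mentions that multi-metric scoring also works in GridSearchCV, here is a minimal sketch of that usage. The parameter grid and the choice of 'f1' for refit are assumptions made for illustration. When several metrics are passed, refit has to name the one used to pick the best parameters (or be set to False).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target == 1  # binary target: class 1 vs. the rest

search = GridSearchCV(
    SVC(kernel='linear', random_state=0),
    param_grid={'C': [0.1, 1, 10]},         # hypothetical grid
    scoring=['precision', 'recall', 'f1'],
    refit='f1',                             # metric used to select best_params_
    cv=5,
)
search.fit(X, y)

print(search.best_params_)
# cv_results_ holds one mean_test_<metric> array per metric,
# with one entry per parameter combination.
for metric in ('precision', 'recall', 'f1'):
    print(metric, search.cv_results_['mean_test_' + metric])
```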
