Why does CalibratedClassifierCV underperform a direct classifier?

Question

I noticed that sklearn's new CalibratedClassifierCV seems to underperform the direct base_estimator when the base_estimator is GradientBoostingClassifier (I haven't tested other classifiers). Interestingly, if make_classification's parameters are:

n_features = 10
n_informative = 3
n_classes = 2

then CalibratedClassifierCV seems to come out slightly ahead (evaluated by log loss).
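
For reference, that easier setting can be set up with something like the sketch below (n_samples and random_state here are arbitrary choices, not from my original test), feeding it into the same calibrated-vs-direct comparison loop as the full script further down:

from sklearn.datasets import make_classification

# The "easy" setting where calibration seemed to help slightly:
# few features, only 3 informative, binary target.
X, y = make_classification(n_samples=1000,
                           n_features=10,
                           n_informative=3,
                           n_classes=2,
                           random_state=0)
# ...then run the same calibrated-vs-direct comparison as in the script below.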

However, on the following classification dataset, CalibratedClassifierCV generally seems to underperform:

from sklearn.datasets import make_classification
from sklearn import ensemble
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedShuffleSplit
# Build a classification task using 30 informative features

X, y = make_classification(n_samples=1000,
                           n_features=100,
                           n_informative=30,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=9,
                           random_state=0,
                           shuffle=False)

skf = StratifiedShuffleSplit(n_splits=5)

for train, test in skf.split(X, y):

    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=3, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    probas = clf.predict_proba(X_test)
    clf_score = log_loss(y_test, probas) 

    print('calibrated score:', cv_score)
    print('direct clf score:', clf_score)
    print()

One run:

Maybe I'm missing something about how CalibratedClassifierCV works, or I'm not using it correctly, but I was under the impression that, if anything, passing a classifier to CalibratedClassifierCV would improve performance relative to the base_estimator alone.

Can anyone explain this observed underperformance?

Answer

The probability calibration itself requires cross-validation, so CalibratedClassifierCV trains one calibrated classifier per fold (using StratifiedKFold in this case) and takes the mean of each classifier's predicted probabilities when you call predict_proba(). This could explain the effect.
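
To make that concrete, here is a minimal, self-contained sketch (the dataset parameters are arbitrary) showing that a fitted CalibratedClassifierCV keeps one calibrated sub-classifier per fold in its calibrated_classifiers_ attribute, and that predict_proba() is just the mean of their individual probability estimates:

import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf_cv = CalibratedClassifierCV(GradientBoostingClassifier(n_estimators=50),
                                cv=3, method='isotonic')
clf_cv.fit(X_train, y_train)

# One calibrated sub-classifier is kept per fold (3 here)...
print(len(clf_cv.calibrated_classifiers_))

# ...and predict_proba() averages their individual probability estimates.
manual_mean = np.mean([c.predict_proba(X_test)
                       for c in clf_cv.calibrated_classifiers_], axis=0)
assert np.allclose(manual_mean, clf_cv.predict_proba(X_test))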

My hypothesis is that if the training set is small relative to the number of features and classes, the reduced training set for each sub-classifier hurts performance and the ensembling does not make up for it (or makes it worse). Also, GradientBoostingClassifier may already provide fairly good probability estimates from the start, since its loss function is optimized for probability estimation.
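
As a rough back-of-the-envelope illustration of the first point (assuming StratifiedShuffleSplit's default 10% test split), each cv=3 sub-classifier in the script above is trained on only about two thirds of an already small training set:

# Rough size check, not an exact reproduction of the splits above.
n_samples, n_classes, cv = 1000, 9, 3
n_train = int(n_samples * 0.9)              # ~900 training samples per outer split
n_per_subclf = n_train * (cv - 1) // cv     # each cv=3 sub-classifier trains on ~600
print(n_per_subclf, n_per_subclf // n_classes)   # ~600 samples, i.e. ~66 per class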

If that's correct, ensembling classifiers in the same way as CalibratedClassifierCV but without calibration should do worse than the single classifier. Also, the effect should disappear when using a larger number of folds for calibration.

To test this, I extended your script to increase the number of folds and to include the ensembled classifier without calibration, and I was able to confirm my predictions. A 10-fold calibrated classifier always performed better than the single classifier, while the uncalibrated ensemble was significantly worse. In my run, the 3-fold calibrated classifier also did not really perform worse than the single classifier, so this may also be an unstable effect. These are the detailed results on the same dataset:

Here is the code for my experiment:

import numpy as np
from sklearn.datasets import make_classification
from sklearn import ensemble
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedShuffleSplit, StratifiedKFold

X, y = make_classification(n_samples=1000,
                           n_features=100,
                           n_informative=30,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=9,
                           random_state=0,
                           shuffle=False)

skf = StratifiedShuffleSplit(n_splits=5)

for train, test in skf.split(X, y):

    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=3, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)
    print('calibrated score (3-fold):', cv_score)


    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=10, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)
    print('calibrated score (10-fold):', cv_score)

    # Train 3 classifiers on sub-folds of the training set and average
    # their predicted probabilities (uncalibrated ensemble)
    skf2 = StratifiedKFold(n_splits=3)
    probas_list = []
    for sub_train, sub_test in skf2.split(X_train, y_train):
        X_sub_train, y_sub_train = X_train[sub_train], y_train[sub_train]
        clf = ensemble.GradientBoostingClassifier(n_estimators=100)
        clf.fit(X_sub_train, y_sub_train)
        probas_list.append(clf.predict_proba(X_test))
    probas = np.mean(probas_list, axis=0)
    clf_ensemble_score = log_loss(y_test, probas)
    print('uncalibrated ensemble clf (3-fold) score:', clf_ensemble_score)

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    probas = clf.predict_proba(X_test)
    score = log_loss(y_test, probas)
    print('direct clf score:', score)
    print()
