Calculate sklearn.roc_auc_score for multi-class

Question

I would like to calculate AUC, precision, and accuracy for my classifier. I am doing supervised learning:

Here is my working code. It works fine for the binary case, but not for the multi-class case. Please assume that you have a dataframe with binary classes:

sample_features_dataframe = self._get_sample_features_dataframe()
labeled_sample_features_dataframe = retrieve_labeled_sample_dataframe(sample_features_dataframe)
labeled_sample_features_dataframe, binary_class_series, multi_class_series = self._prepare_dataframe_for_learning(labeled_sample_features_dataframe)

k = 10
k_folds = StratifiedKFold(binary_class_series, k)
for train_indexes, test_indexes in k_folds:
    train_set_dataframe = labeled_sample_features_dataframe.loc[train_indexes.tolist()]
    test_set_dataframe = labeled_sample_features_dataframe.loc[test_indexes.tolist()]

    train_class = binary_class_series[train_indexes]
    test_class = binary_class_series[test_indexes]
    selected_classifier = RandomForestClassifier(n_estimators=100)
    selected_classifier.fit(train_set_dataframe, train_class)
    predictions = selected_classifier.predict(test_set_dataframe)
    predictions_proba = selected_classifier.predict_proba(test_set_dataframe)

    roc += roc_auc_score(test_class, predictions_proba[:,1])
    accuracy += accuracy_score(test_class, predictions)
    recall += recall_score(test_class, predictions)
    precision += precision_score(test_class, predictions)

At the end I divide each accumulated result by K, of course, to get the average AUC, precision, etc. This code works fine. However, I cannot calculate the same for multi-class:

    train_class = multi_class_series[train_indexes]
    test_class = multi_class_series[test_indexes]

    selected_classifier = RandomForestClassifier(n_estimators=100)
    selected_classifier.fit(train_set_dataframe, train_class)

    predictions = selected_classifier.predict(test_set_dataframe)
    predictions_proba = selected_classifier.predict_proba(test_set_dataframe)

I found that for multi-class I have to pass "weighted" as the average parameter.

    roc += roc_auc_score(test_class, predictions_proba[:,1], average="weighted")

I got an error: raise ValueError("{0} format is not supported".format(y_type))

ValueError: multiclass format is not supported

Recommended answer

You can't use roc_auc as a single summary metric for multiclass models. If you want, you could calculate a per-class roc_auc, as follows:

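roc = {label: [] for label in multi_class_series.unique()}
for label in multi_class_series.unique():
    # One-vs-rest: treat `label` as the positive class and everything else as negative
    selected_classifier.fit(train_set_dataframe, train_class == label)
    predictions_proba = selected_classifier.predict_proba(test_set_dataframe)
    # Binarize the test labels the same way; roc[label] is a list, so append the score
    roc[label].append(roc_auc_score(test_class == label, predictions_proba[:,1]))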
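
For readers who want to try the per-class (one-vs-rest) idea end to end, here is a minimal, self-contained sketch. It is not the asker's pipeline: the synthetic data from make_classification, the 5 folds, and names such as per_class_auc and mean_auc are assumptions made for illustration. It also uses the StratifiedKFold(n_splits=...).split(X, y) form from sklearn.model_selection rather than the older constructor shown in the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Assumption: synthetic 3-class data stands in for the question's dataframe.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    n_classes=3, random_state=0,
)

labels = np.unique(y)
per_class_auc = {label: [] for label in labels}

k_folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in k_folds.split(X, y):
    for label in labels:
        # One-vs-rest: this class is positive, all other classes are negative.
        clf = RandomForestClassifier(n_estimators=50, random_state=0)
        clf.fit(X[train_idx], y[train_idx] == label)
        proba = clf.predict_proba(X[test_idx])
        # Column 1 is P(True) because clf.classes_ is [False, True].
        score = roc_auc_score(y[test_idx] == label, proba[:, 1])
        per_class_auc[label].append(score)

# Average each class's AUC over the folds.
mean_auc = {label: float(np.mean(s)) for label, s in per_class_auc.items()}
print(mean_auc)
```

Note that both the training labels and the test labels are binarized against the same class before scoring. Recent scikit-learn releases also accept the full predict_proba matrix directly, e.g. roc_auc_score(y_true, proba, multi_class="ovr", average="weighted"), which may make the manual loop unnecessary; check the documentation for the version you have installed.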

However, it's more usual to use sklearn.metrics.confusion_matrix to evaluate the performance of a multiclass model.
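
As a rough illustration of the confusion-matrix route, the sketch below fits a random forest on synthetic three-class data and prints the matrix along with a per-class report. The data, the 70/30 split, and the variable names are assumptions for the example, not part of the original question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Assumption: synthetic 3-class data in place of the real features.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0,
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, predictions)
print(cm)

# Per-class precision, recall and F1 in one table.
print(classification_report(y_test, predictions))
```

The diagonal of the matrix counts correct predictions per class, and the off-diagonal cells show which classes get confused with which, something a single averaged score cannot show.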
