Calculate sklearn.roc_auc_score for multi-class
Problem Description
I would like to calculate AUC, precision and accuracy for my classifier. I am doing supervised learning.
Here is my working code. It works fine for a binary class, but not for multi-class. Please assume that you have a dataframe with binary classes:
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold

sample_features_dataframe = self._get_sample_features_dataframe()
labeled_sample_features_dataframe = retrieve_labeled_sample_dataframe(sample_features_dataframe)
labeled_sample_features_dataframe, binary_class_series, multi_class_series = self._prepare_dataframe_for_learning(labeled_sample_features_dataframe)

k = 10
roc = accuracy = recall = precision = 0.0  # running sums over the k folds
k_folds = StratifiedKFold(n_splits=k)  # current API; older releases took (y, k)
for train_indexes, test_indexes in k_folds.split(labeled_sample_features_dataframe, binary_class_series):
    # split() yields positional indices, so select rows with .iloc
    train_set_dataframe = labeled_sample_features_dataframe.iloc[train_indexes]
    test_set_dataframe = labeled_sample_features_dataframe.iloc[test_indexes]
    train_class = binary_class_series.iloc[train_indexes]
    test_class = binary_class_series.iloc[test_indexes]
    selected_classifier = RandomForestClassifier(n_estimators=100)
    selected_classifier.fit(train_set_dataframe, train_class)
    predictions = selected_classifier.predict(test_set_dataframe)
    predictions_proba = selected_classifier.predict_proba(test_set_dataframe)
    # column 1 holds the probability of the positive class
    roc += roc_auc_score(test_class, predictions_proba[:, 1])
    accuracy += accuracy_score(test_class, predictions)
    recall += recall_score(test_class, predictions)
    precision += precision_score(test_class, predictions)
```
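For comparison, a minimal sketch of the same binary k-fold evaluation written with scikit-learn's cross_validate, which returns one score per fold for each requested metric so the averaging becomes a single mean(). The synthetic data from make_classification and the names X, y, metrics and mean_scores are assumptions standing in for the asker's dataframe and series.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Hypothetical binary data standing in for labeled_sample_features_dataframe
# and binary_class_series.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

k = 10
metrics = ["roc_auc", "accuracy", "precision", "recall"]
scores = cross_validate(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X,
    y,
    cv=StratifiedKFold(n_splits=k),
    scoring=metrics,
)

# One score per fold is stored under "test_<metric>"; the mean over the
# k folds replaces the manual "+=" followed by division by k.
mean_scores = {name: scores["test_" + name].mean() for name in metrics}
print(mean_scores)
```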
In the end, of course, I divide the results by K to get the average AUC, precision, etc. This code works fine. However, I cannot calculate the same metrics for multi-class:
```python
# inside the same k-fold loop, now using the multi-class labels
train_class = multi_class_series.iloc[train_indexes]
test_class = multi_class_series.iloc[test_indexes]
selected_classifier = RandomForestClassifier(n_estimators=100)
selected_classifier.fit(train_set_dataframe, train_class)
predictions = selected_classifier.predict(test_set_dataframe)
predictions_proba = selected_classifier.predict_proba(test_set_dataframe)
```
I found that for multi-class I have to add the parameter average="weighted":
```python
roc += roc_auc_score(test_class, predictions_proba[:,1], average="weighted")
```
I got an error:
```
raise ValueError("{0} format is not supported".format(y_type))
ValueError: multiclass format is not supported
```
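The y_type in this message is the target type scikit-learn infers for test_class with sklearn.utils.multiclass.type_of_target. As far as the error indicates, the scikit-learn release used here only accepted "binary" or "multilabel-indicator" targets in roc_auc_score, so a multiclass test_class is rejected regardless of average=. Passing only predictions_proba[:, 1] would in any case discard the other classes' probabilities. A small illustration with hypothetical toy labels:

```python
from sklearn.preprocessing import label_binarize
from sklearn.utils.multiclass import type_of_target

binary_target = [0, 1, 1, 0]
multiclass_target = [0, 1, 2, 1]
# One 0/1 column per class: the "one-vs-rest" view of the same labels.
indicator_target = label_binarize(multiclass_target, classes=[0, 1, 2])

for target in (binary_target, multiclass_target, indicator_target):
    print(type_of_target(target))
# binary
# multiclass
# multilabel-indicator
```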
Recommended Answer
With the scikit-learn version used here, roc_auc_score does not accept a multiclass target, so you can't use it directly as a single summary metric for a multiclass model. If you want, you can calculate roc_auc per class instead, treating each class as "this label vs. the rest":
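```python
from sklearn.metrics import roc_auc_score
roc = {label: [] for label in multi_class_series.unique()}  # create once, before the k-fold loop
for label in multi_class_series.unique():  # inside the loop: one "label vs. rest" model per class
    selected_classifier.fit(train_set_dataframe, train_class == label)
    predictions_proba = selected_classifier.predict_proba(test_set_dataframe)
    # column 1 is the probability of True ("is this label"); score it against the same binary target
    roc[label].append(roc_auc_score(test_class == label, predictions_proba[:, 1]))
```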
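The loop above refits the model once per label. A hedged alternative sketch: a single multiclass fit already gives one predict_proba column per entry of clf.classes_, so each column can be scored one-vs-rest against y_test == label. In scikit-learn 0.22 and later, roc_auc_score also accepts the full probability matrix with multi_class="ovr", and with average="weighted" it should match the prevalence-weighted mean of those per-class scores. The synthetic data from make_classification and the names X, y, clf, per_class_auc and builtin_auc are assumptions standing in for the asker's dataframe.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical 3-class data standing in for multi_class_series.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # one column per entry of clf.classes_

# One-vs-rest AUC per class: column i scores "is it clf.classes_[i]?".
per_class_auc = {
    label: roc_auc_score(y_test == label, proba[:, i])
    for i, label in enumerate(clf.classes_)
}

# Prevalence-weighted mean of the per-class scores ...
support = np.array([np.sum(y_test == label) for label in clf.classes_])
weighted_auc = np.average(list(per_class_auc.values()), weights=support)

# ... which scikit-learn >= 0.22 can compute directly from the full matrix.
builtin_auc = roc_auc_score(y_test, proba, multi_class="ovr", average="weighted")
print(per_class_auc, weighted_auc, builtin_auc)
```

Inside a k-fold loop, the same builtin call can replace the failing roc += line, since it uses all probability columns rather than only predictions_proba[:, 1].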
However, it is more usual to evaluate the performance of a multiclass model with sklearn.metrics.confusion_matrix.
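A minimal sketch of the confusion-matrix approach on hypothetical toy labels. Note that precision_score and recall_score also reject multiclass targets with their default average="binary", so the question's precision and recall lines need an explicit average such as "macro" or "weighted" as well.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Toy labels, for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
print(cm)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]

# Per-class recall read off the matrix: correct predictions / true count per class.
per_class_recall = cm.diagonal() / cm.sum(axis=1)

# Multiclass precision/recall need an explicit average.
macro_precision = precision_score(y_true, y_pred, average="macro")  # (1/2 + 2/3 + 1) / 3
macro_recall = recall_score(y_true, y_pred, average="macro")        # (1/2 + 1 + 1/2) / 3
print(per_class_recall, macro_precision, macro_recall)
```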