sklearn auc ValueError: Only one class present in y_true


Problem description

I searched Google and saw a couple of StackOverflow posts about this error, but none of them matches my case.

I use keras to train a simple neural network and make some predictions on the split-off test dataset. But when I use roc_auc_score to calculate the AUC, I get the following error:

ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.

I inspected the target label distribution, and it is highly imbalanced. Some labels (out of 29 in total) have only 1 instance, so it is likely they have no positive instance in the test labels. That is why sklearn's roc_auc_score function reports the only-one-class problem. That's reasonable.
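The situation described above is easy to reproduce in isolation: give roc_auc_score a y_true that contains a single class and it raises this exact ValueError, because a ROC curve needs at least one positive and one negative sample. A minimal sketch (the label and score values are made up for illustration):

```python
from sklearn.metrics import roc_auc_score

y_true = [1, 1, 1, 1]           # only one class present
y_score = [0.2, 0.6, 0.4, 0.9]  # arbitrary predicted scores

try:
    roc_auc_score(y_true, y_score)
except ValueError as e:
    print(e)  # the "Only one class present in y_true" error
```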

But I'm curious: when I use sklearn's cross_val_score function, it handles the AUC calculation without error.

my_metric = 'roc_auc'
scores = cross_validation.cross_val_score(myestimator, data,
                                          labels, cv=5, scoring=my_metric)

I wonder what happens inside cross_val_score. Is it because cross_val_score uses a stratified cross-validation data split?
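This hunch can be checked directly: when the estimator is a classifier and cv is an integer, scikit-learn uses StratifiedKFold, which preserves the class proportions in every fold. A small sketch with made-up imbalanced labels, comparing it against plain KFold:

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Imbalanced toy labels: 15 negatives followed by 5 positives.
y = np.array([0] * 15 + [1] * 5)
X = np.zeros((20, 1))  # the features don't matter for the split itself

for name, cv in [("KFold", KFold(n_splits=5)),
                 ("StratifiedKFold", StratifiedKFold(n_splits=5))]:
    # count how many distinct classes land in each test fold
    n_classes = [len(np.unique(y[test])) for _, test in cv.split(X, y)]
    print(name, n_classes)
```

Plain KFold leaves several test folds with a single class (which would trigger the ValueError above), while StratifiedKFold keeps both classes in every fold.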

UPDATE

I continued digging, but still can't find the difference. I see that cross_val_score calls check_scoring(estimator, scoring=None, allow_none=False) to return a scorer, and check_scoring calls get_scorer(scoring), which returns scorer=SCORERS[scoring].

And SCORERS['roc_auc'] is just roc_auc_scorer, which is defined as:

roc_auc_scorer = make_scorer(roc_auc_score, greater_is_better=True,
                             needs_threshold=True)

So it still ends up using the roc_auc_score function. I don't get why cross_val_score behaves differently from calling roc_auc_score directly.

Recommended answer

I think your hunch is correct. The AUC (area under the ROC curve) needs a sufficient number of samples of both classes in order to make sense.

By default, cross_val_score calculates the performance metric on each fold separately. Another option is to use cross_val_predict and compute the AUC over all folds combined.

You can do it like this:

from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict  # sklearn.cross_validation in old releases
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification


class ProbaEstimator(LogisticRegression):
    """
    This little hack needed, because `cross_val_predict`
    uses `estimator.predict(X)` internally.

    Replace `LogisticRegression` with whatever classifier you like.

    """
    def predict(self, X):
        # return the positive-class probability instead of the hard label
        return super(ProbaEstimator, self).predict_proba(X)[:, 1]


# some example data
X, y = make_classification()

# define your estimator
estimator = ProbaEstimator()

# get predictions
pred = cross_val_predict(estimator, X, y, cv=5)

# compute AUC score
roc_auc_score(y, pred)
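For what it's worth, in current scikit-learn versions (0.18 and later, where sklearn.cross_validation was replaced by sklearn.model_selection) the subclass hack is no longer needed: cross_val_predict accepts a method='predict_proba' argument. A minimal sketch of the same pooled-AUC computation:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

# some example data
X, y = make_classification(random_state=0)

# ask cross_val_predict for class probabilities directly;
# the result has one column per class
proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                          cv=5, method='predict_proba')

# pooled AUC over all folds, using the positive-class column
print(roc_auc_score(y, proba[:, 1]))
```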
