Scikit-learn TypeError: If no scoring is specified, the estimator passed should have a 'score' method

Question

I have created a custom model in Python using scikit-learn, and I want to use cross-validation.

The class definition of the model is as follows:

import copy

import numpy as np


class MultiLabelEnsemble:
    ''' MultiLabelEnsemble(predictorInstance, balance=False)
    Like OneVsRestClassifier: Wrapping class to train multiple models when
    several objectives are given as target values. Its predictor may be an ensemble.
    This class can be used to create a one-vs-rest classifier from multiple 0/1 labels
    to treat a multi-label problem or to create a one-vs-rest classifier from
    a categorical target variable.
    Arguments:
        predictorInstance -- A predictor instance is passed as argument (be careful, you must instantiate
            the predictor class before passing the argument, i.e. end with (),
            e.g. LogisticRegression()).
        balance -- True/False. If True, attempts to re-balance classes in training data
            by including a random sample (without replacement) s.t. the largest class has at most 2 times
            the number of elements of the smallest one.
    Example Usage: mymodel = MultiLabelEnsemble(GradientBoostingClassifier(), True)'''

    def __init__(self, predictorInstance, balance=False):
        self.predictors = [predictorInstance]
        self.n_label = 1
        self.n_target = 1
        self.n_estimators = 1  # for predictors that are ensembles of estimators
        self.balance = balance

    def __repr__(self):
        return "MultiLabelEnsemble"

    def __str__(self):
        return ("MultiLabelEnsemble : \n"
                + "\tn_label={}\n".format(self.n_label)
                + "\tn_target={}\n".format(self.n_target)
                + "\tn_estimators={}\n".format(self.n_estimators)
                + str(self.predictors[0]))

    def fit(self, Xtrain, Ytrain):
        if len(Ytrain.shape) == 1:
            Ytrain = Ytrain.reshape(-1, 1)  # Transform vector into column matrix
        self.n_target = Ytrain.shape[1]  # Num target values = num col of Y
        # Num labels = num classes (categories of categorical var if n_target=1, or n_target if labels are binary)
        self.n_label = len(set(Ytrain.ravel()))
        # Create the right number of copies of the predictor instance
        if len(self.predictors) != self.n_target:
            predictorInstance = self.predictors[0]
            self.predictors = [predictorInstance]
            for i in range(1, self.n_target):
                self.predictors.append(copy.copy(predictorInstance))
        # Fit all predictors
        for i in range(self.n_target):
            # Update the number of desired predictors
            if hasattr(self.predictors[i], 'n_estimators'):
                self.predictors[i].n_estimators = self.n_estimators
            # Subsample if desired
            if self.balance:
                pos = Ytrain[:, i] > 0
                neg = Ytrain[:, i] <= 0
                if pos.sum() < neg.sum():
                    chosen, not_chosen = pos, neg
                else:
                    chosen, not_chosen = neg, pos
                num = chosen.sum()
                idx = np.flatnonzero(not_chosen)  # indices of the majority-class samples
                np.random.shuffle(idx)
                chosen[idx[0:min(num, len(idx))]] = True
                # Train with chosen samples
                self.predictors[i].fit(Xtrain[chosen, :], Ytrain[chosen, i])
            else:
                self.predictors[i].fit(Xtrain, Ytrain[:, i])
        return self

    def predict_proba(self, Xtrain):
        if len(Xtrain.shape) == 1:  # IG modif Feb3 2015
            Xtrain = np.reshape(Xtrain, (-1, 1))
        prediction = self.predictors[0].predict_proba(Xtrain)
        if self.n_label == 2:  # Keep only 1 prediction, 1st column = (1 - 2nd column)
            prediction = prediction[:, 1]
        for i in range(1, self.n_target):  # More than 1 target, we assume that labels are binary
            new_prediction = self.predictors[i].predict_proba(Xtrain)[:, 1]
            prediction = np.column_stack((prediction, new_prediction))
        return prediction

When I use this class for cross-validation like this:

from sklearn.model_selection import KFold, cross_val_score  # formerly sklearn.cross_validation (removed in 0.20)

kf = KFold(n_splits=10)
score = cross_val_score(self.model, Xtrain, Ytrain, cv=kf, n_jobs=-1).mean()

I get the following error:

TypeError: If no scoring is specified, the estimator passed should have a 'score' method. The estimator MultiLabelEnsemble does not.

How do I create a score method?
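For reference, here is a minimal sketch that reproduces the error, assuming a recent scikit-learn where cross_val_score lives in sklearn.model_selection. NoScoreEstimator is a hypothetical stand-in for MultiLabelEnsemble: it has fit (and get_params, via BaseEstimator) but no score method, and no scoring argument is passed, so cross_val_score has no way to evaluate it.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import KFold, cross_val_score


class NoScoreEstimator(BaseEstimator):
    """Hypothetical toy estimator: it can be fitted but defines no score method."""

    def fit(self, X, y):
        self.mean_ = np.mean(y)  # trivial "model": remember the target mean
        return self


X = np.random.RandomState(0).rand(20, 3)
y = np.arange(20) % 2

error_message = None
try:
    # No scoring argument and no score method, so cross_val_score refuses to run.
    cross_val_score(NoScoreEstimator(), X, y, cv=KFold(n_splits=5))
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```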

Recommended answer

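The easiest way to make the error go away is to pass a scoring argument to cross_val_score, for example scoring="accuracy". The cross_val_score function itself doesn't know what kind of problem you are trying to solve, so it doesn't know what an appropriate metric is. It looks like you are trying to do multi-label classification, so maybe you want to use the Hamming loss? Note that "hamming" is not one of scikit-learn's built-in scoring strings; wrap the metric as make_scorer(hamming_loss, greater_is_better=False) and pass that as scoring instead.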
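A minimal sketch of both options, assuming a recent scikit-learn. It uses a DecisionTreeClassifier on synthetic data from make_multilabel_classification as a stand-in for the custom class, because the tree accepts a multi-label indicator matrix Y directly. Keep in mind that these scorers call the estimator's predict method, so MultiLabelEnsemble as posted, which defines only predict_proba, would also need a predict method before this works.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import hamming_loss, make_scorer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic multi-label data: Y is an (n_samples, n_classes) 0/1 indicator matrix.
X, Y = make_multilabel_classification(n_samples=200, n_classes=4, random_state=0)
kf = KFold(n_splits=10, shuffle=True, random_state=0)
clf = DecisionTreeClassifier(random_state=0)  # handles multi-label Y natively

# "accuracy" on multi-label targets is subset accuracy (every label must match).
acc = cross_val_score(clf, X, Y, cv=kf, scoring="accuracy")

# Hamming loss is lower-is-better, so the scorer reports it negated.
hamming_scorer = make_scorer(hamming_loss, greater_is_better=False)
neg_hamming = cross_val_score(clf, X, Y, cv=kf, scoring=hamming_scorer)

print("subset accuracy:", acc.mean())
print("Hamming loss:", -neg_hamming.mean())
```

The sign flip from greater_is_better=False keeps scikit-learn's convention that a larger score is always better, which is what model-selection tools such as GridSearchCV rely on.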

You can also implement a score method as explained in the "Roll your own estimator" docs; it has the signature def score(self, X, y_true). See http://scikit-learn.org/stable/developers/#different-objects
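A minimal sketch of that approach, under these assumptions: PerLabelClassifier is a hypothetical, simplified cousin of MultiLabelEnsemble (no class balancing), and its score returns 1 minus the Hamming loss so that higher means better. It also inherits sklearn.base.BaseEstimator, because cross_val_score clones the estimator for each fold, and cloning requires get_params/set_params, which BaseEstimator derives from the __init__ signature.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import hamming_loss
from sklearn.model_selection import KFold, cross_val_score


class PerLabelClassifier(BaseEstimator):
    """Hypothetical wrapper: fits one copy of base_estimator per column of Y."""

    def __init__(self, base_estimator=None):
        self.base_estimator = base_estimator

    def fit(self, X, Y):
        Y = np.asarray(Y)
        if Y.ndim == 1:
            Y = Y.reshape(-1, 1)  # treat a plain vector as a single label column
        base = self.base_estimator if self.base_estimator is not None else LogisticRegression()
        # clone() gives each label its own unfitted copy of the base estimator.
        self.estimators_ = [clone(base).fit(X, Y[:, j]) for j in range(Y.shape[1])]
        return self

    def predict(self, X):
        # Stack per-label 0/1 predictions into an (n_samples, n_labels) matrix.
        return np.column_stack([est.predict(X) for est in self.estimators_])

    def score(self, X, y_true):
        # cross_val_score calls this when no scoring argument is given.
        return 1.0 - hamming_loss(y_true, self.predict(X))


X, Y = make_multilabel_classification(n_samples=200, n_classes=3, random_state=0)
model = PerLabelClassifier(LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, Y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(scores.mean())
```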

By the way, you do know about the OneVsRestClassifier, right? It looks a bit like you are reimplementing it.
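A minimal sketch of the built-in alternative, assuming a recent scikit-learn and synthetic data. OneVsRestClassifier already provides score (subset accuracy for a multi-label Y) and per-label predict_proba, so cross_val_score works without a scoring argument. It does not reproduce the balance subsampling of MultiLabelEnsemble.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.multiclass import OneVsRestClassifier

X, Y = make_multilabel_classification(n_samples=200, n_classes=3, random_state=0)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))

# No scoring argument needed: ovr.score is subset accuracy on multi-label Y.
scores = cross_val_score(ovr, X, Y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(scores.mean())

# One marginal probability per label, similar in spirit to MultiLabelEnsemble.predict_proba.
proba = ovr.fit(X, Y).predict_proba(X)
print(proba.shape)  # (200, 3)
```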
