Scoring 'roc_auc' value is not working with GridSearchCV applying RandomForestClassifier


Problem Description

I keep getting this error when performing this with GridSearchCV with scoring set to 'roc_auc' ('f1', 'precision', and 'recall' work fine):

from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV

# Construct a pipeline
pipe = Pipeline([
    ('reduce_dim', PCA()),
    ('rf', RandomForestClassifier(min_samples_leaf=5, random_state=123))
])

N_FEATURES_OPTIONS = [2]  # for PCA [2, 4, 8]

# these params below are for RandomForestClassifier
N_ESTIMATORS = [10, 50]  # 10, 50, 100
MAX_DEPTH = [5, 6]  # 5, 6, 7, 8, 9
MIN_SAMPLE_LEAF = 5

param_grid = [
    {
        'reduce_dim': [PCA()],
        'reduce_dim__n_components': N_FEATURES_OPTIONS,
        'rf__n_estimators': N_ESTIMATORS,
        'rf__max_depth': MAX_DEPTH
    },
    {
        'reduce_dim': [SelectKBest(f_classif)],
        'reduce_dim__k': N_FEATURES_OPTIONS,
        'rf__n_estimators': N_ESTIMATORS,
        'rf__max_depth': MAX_DEPTH
    },
]

grid = GridSearchCV(pipe, param_grid=param_grid, cv=10, n_jobs=1, scoring='roc_auc')
grid.fit(X_train_s, y_train_s)

I get this error:

AttributeError                            Traceback (most recent call last)
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/metrics/scorer.py in __call__(self, clf, X, y, sample_weight)
    186             try:
--> 187                 y_pred = clf.decision_function(X)
    188 

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/utils/metaestimators.py in __get__(self, obj, type)
    108                 else:
--> 109                     getattr(delegate, self.attribute_name)
    110                     break

AttributeError: 'RandomForestClassifier' object has no attribute 'decision_function'

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
<ipython-input-16-86491f3b6aa7> in <module>()
----> 1 grid.fit(X_train_s,y_train_s)

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
    637                                   error_score=self.error_score)
    638           for parameters, (train, test) in product(candidate_params,
--> 639                                                    cv.split(X, y, groups)))
    640 
    641         # if one choose to see train score, "out" will contain train score info

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    777             # was dispatched. In particular this covers the edge
    778             # case of Parallel used with an exhausted iterator.
--> 779             while self.dispatch_one_batch(iterator):
    780                 self._iterating = True
    781             else:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
    623                 return False
    624             else:
--> 625                 self._dispatch(tasks)
    626                 return True
    627 

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
    586         dispatch_timestamp = time.time()
    587         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 588         job = self._backend.apply_async(batch, callback=cb)
    589         self._jobs.append(job)
    590 

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py in apply_async(self, func, callback)
    109     def apply_async(self, func, callback=None):
    110         """Schedule a func to be run"""
--> 111         result = ImmediateResult(func)
    112         if callback:
    113             callback(result)

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py in __init__(self, batch)
    330         # Don't delay the application, to avoid keeping the input
    331         # arguments in memory
--> 332         self.results = batch()
    333 
    334     def get(self):

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132 
    133     def __len__(self):

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132 
    133     def __len__(self):

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, error_score)
    486         fit_time = time.time() - start_time
    487         # _score will return dict if is_multimetric is True
--> 488         test_scores = _score(estimator, X_test, y_test, scorer, is_multimetric)
    489         score_time = time.time() - start_time - fit_time
    490         if return_train_score:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in _score(estimator, X_test, y_test, scorer, is_multimetric)
    521     """
    522     if is_multimetric:
--> 523         return _multimetric_score(estimator, X_test, y_test, scorer)
    524     else:
    525         if y_test is None:

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in _multimetric_score(estimator, X_test, y_test, scorers)
    551             score = scorer(estimator, X_test)
    552         else:
--> 553             score = scorer(estimator, X_test, y_test)
    554 
    555         if hasattr(score, 'item'):

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/metrics/scorer.py in __call__(self, clf, X, y, sample_weight)
    195 
    196                 if y_type == "binary":
--> 197                     y_pred = y_pred[:, 1]
    198                 elif isinstance(y_pred, list):
    199                     y_pred = np.vstack([p[:, -1] for p in y_pred]).T

IndexError: index 1 is out of bounds for axis 1 with size 1

I have looked up this error and found a somewhat similar problem with KerasClassifier, but I have no idea how to fix it:

Keras Wrappers for Scikit Learn - AUC scorer is not working

Can anyone explain to me what is wrong?

Recommended Answer

The error may be happening for one of these reasons:

  • If you have only one target class: it fails.
  • If you have 3 or more target classes: it fails.
  • Maybe you have 2 classes, but in some CV fold the test labels come from only one class.

When sklearn computes the AUC metric, it must have 2 classes, because the method for getting the AUC requires exactly two classes (to compute tpr and fpr over all thresholds). Examples that raise the error:

grid.fit(np.random.rand(100, 2), np.random.randint(1, size=100))  # one-class labels
grid.fit(np.random.rand(100, 2), np.random.randint(3, size=100))  # 3-class labels
# Both throw the same error when computing the AUC
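
To make the traceback concrete: RandomForestClassifier has no decision_function, so the 'roc_auc' scorer falls back to predict_proba and then takes column 1. Below is a minimal, self-contained sketch (my own illustration, not the question's pipeline) of why that indexing fails when the classifier has only seen one class:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.rand(100, 2)
y = np.zeros(100, dtype=int)  # only one class present

clf = RandomForestClassifier(n_estimators=10, random_state=123).fit(X, y)
proba = clf.predict_proba(X)
print(proba.shape)  # (100, 1): a single probability column

try:
    proba[:, 1]  # this is what the 'roc_auc' scorer attempts
except IndexError as exc:
    print(exc)  # index 1 is out of bounds for axis 1 with size 1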

An example that should not throw an error, but could, depending on how the CV folds split the labels:

grid.fit(np.random.rand(100, 2), np.random.randint(2, size=100))  # two-class labels
# This shouldn't throw an error
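
If you suspect the folds are the problem, a small check like the following (my own illustrative snippet, not part of the original answer) prints the class counts in every test split; neither count should be 0, and with StratifiedKFold the splits stay balanced:

import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.RandomState(0)
X = rng.rand(100, 2)
y = rng.randint(2, size=100)

cv = StratifiedKFold(n_splits=10)
for i, (_, test_idx) in enumerate(cv.split(X, y)):
    counts = np.bincount(y[test_idx], minlength=2)
    print(f"fold {i}: test-split class counts = {counts}")  # a 0 here would break AUC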

Solutions

  • If you have more than 2 classes: you have to compute the AUC manually (or maybe there is a library for it, but I don't know one), either one-vs-rest, where you compute the AUC with 2 classes (one class against all the others), or all-vs-all AUC (pairwise AUC, where you compute one class vs. all the others, one class at a time, and then take the average); see the sketch after this list.
  • If you have 2 classes: grid = GridSearchCV(pipe, param_grid=param_grid, cv=StratifiedKFold(), n_jobs=1, scoring='roc_auc')
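
For the first bullet (more than 2 classes), here is a minimal sketch of the manual one-vs-rest computation the answer describes; using label_binarize and a macro average are my own choices for illustration, not something prescribed by the answer:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

X, y = make_classification(n_samples=300, n_classes=3, n_informative=4,
                           random_state=123)
clf = RandomForestClassifier(n_estimators=50, random_state=123).fit(X, y)
proba = clf.predict_proba(X)  # shape (n_samples, 3), columns follow clf.classes_

classes = np.unique(y)
y_bin = label_binarize(y, classes=classes)  # one indicator column per class

per_class_auc = [roc_auc_score(y_bin[:, i], proba[:, i]) for i in range(len(classes))]
print("one-vs-rest AUC per class:", per_class_auc)
print("macro average:", float(np.mean(per_class_auc)))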
