GridSearch for multi-label classification in scikit-learn

Question

I am trying to run a grid search for the best hyper-parameters within each fold of a ten-fold cross-validation. This worked fine in my previous multi-class classification work, but it fails this time with multi-label data.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
clf = OneVsRestClassifier(LinearSVC())

C_range = 10.0 ** np.arange(-2, 9)
param_grid = dict(estimator__C=C_range)  # OneVsRestClassifier wraps LinearSVC as "estimator"

clf = GridSearchCV(clf, param_grid)
clf.fit(X_train, y_train)

I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-65-dcf9c1d2e19d> in <module>()
      6 
      7 clf = GridSearchCV(clf, param_grid)
----> 8 clf.fit(X_train, y_train)

/usr/local/lib/python2.7/site-packages/sklearn/grid_search.pyc in fit(self, X, y)
    595 
    596         """
--> 597         return self._fit(X, y, ParameterGrid(self.param_grid))
    598 
    599 

/usr/local/lib/python2.7/site-packages/sklearn/grid_search.pyc in _fit(self, X, y,   
parameter_iterable)
    357                                  % (len(y), n_samples))
    358             y = np.asarray(y)
--> 359         cv = check_cv(cv, X, y, classifier=is_classifier(estimator))
    360 
    361         if self.verbose > 0:

/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _check_cv(cv, X,  
y, classifier, warn_mask)
   1365             needs_indices = None
   1366         if classifier:
-> 1367             cv = StratifiedKFold(y, cv, indices=needs_indices)
   1368         else:
   1369             if not is_sparse:

/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.pyc in __init__(self, 
y, n_folds, indices, shuffle, random_state)
    427         for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
    428             for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 429                 label_test_folds = test_folds[y == label]
    430                 # the test split can be too big because we used
    431                 # KFold(max(c, self.n_folds), self.n_folds) instead of

ValueError: boolean index array should have 1 dimension

This might refer to the dimension or the format of the label indicator matrix.

print X_train.shape, y_train.shape

This prints:

(147, 1024) (147, 6)
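
The second shape is the key detail: `y_train` is a 2-D label indicator matrix. As a minimal sketch of why that matters (the array names below are hypothetical stand-ins, not the scikit-learn internals), a comparison such as `y == label` on a 2-D array yields a 2-D boolean mask, and such a mask cannot index a 1-D array of per-sample fold ids, which is the operation that fails in the traceback above. Depending on the NumPy version, the failure surfaces as a `ValueError` or an `IndexError`.

```python
import numpy as np

# Hypothetical stand-ins with the same shapes as in the question.
n_samples, n_labels = 147, 6
rng = np.random.RandomState(0)
y_multilabel = rng.randint(0, 2, size=(n_samples, n_labels))  # label indicator matrix
y_multiclass = rng.randint(0, 3, size=n_samples)              # one class id per sample
test_folds = np.zeros(n_samples, dtype=int)                   # one fold id per sample

# Multi-class targets: the mask is 1-D, so indexing works.
selected = test_folds[y_multiclass == 1]

# Multi-label targets: the mask is 2-D, so indexing the 1-D array fails.
mask = (y_multilabel == 1)
print(mask.shape)  # (147, 6)

try:
    test_folds[mask]
    mask_error = None
except (IndexError, ValueError) as exc:  # exception type depends on the NumPy version
    mask_error = exc

print(type(mask_error).__name__)
```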

It seems that GridSearchCV applies StratifiedKFold internally. The problem arises when the stratified K-fold strategy is given multi-label targets.

StratifiedKFold(y_train, 10)

gives

ValueError                                Traceback (most recent call last)
<ipython-input-87-884ffeeef781> in <module>()
----> 1 StratifiedKFold(y_train, 10)

/usr/local/lib/python2.7/site-packages/sklearn/cross_validation.pyc in __init__(self,   
y, n_folds, indices, shuffle, random_state)
    427         for test_fold_idx, per_label_splits in enumerate(zip(*per_label_cvs)):
    428             for label, (_, test_split) in zip(unique_labels, per_label_splits):
--> 429                 label_test_folds = test_folds[y == label]
    430                 # the test split can be too big because we used
    431                 # KFold(max(c, self.n_folds), self.n_folds) instead of

ValueError: boolean index array should have 1 dimension

Using the conventional K-fold strategy currently works fine. Is there any way to apply stratified K-fold to multi-label classification?
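
For reference, the plain K-fold setup mentioned above can be sketched as follows. This is a hedged example, not the code from the question: it assumes a recent scikit-learn where the cross-validation classes live in `sklearn.model_selection`, uses synthetic data from `make_multilabel_classification` in place of the real features, and narrows the `C` range to keep the run short. Passing an explicit `KFold` object through `cv=` means no stratification is attempted on the 2-D label matrix.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in for the question's data (assumption: 6 labels, dense features).
X, y = make_multilabel_classification(
    n_samples=220, n_features=20, n_classes=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# "estimator__C" reaches the C parameter of the LinearSVC wrapped by OneVsRestClassifier.
param_grid = {"estimator__C": 10.0 ** np.arange(-2, 3)}

# Plain (unstratified) ten-fold split, passed explicitly to the grid search.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(
    OneVsRestClassifier(LinearSVC(max_iter=5000)), param_grid, cv=cv
)
search.fit(X_train, y_train)

print(search.best_params_)
```

The default score here is the subset accuracy of `OneVsRestClassifier`; a different metric can be supplied through the `scoring` argument if per-label performance matters more.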

Recommended answer

Grid search performs stratified cross-validation for classification problems, but this is not implemented for multi-label tasks. In fact, multi-label stratification is an open problem in machine learning. I recently faced the same issue, and the only literature I could find was the method proposed in this article (whose authors state that they could not find any other attempts at solving this problem either).
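
One rough approximation, which is not the method from the article mentioned above, is to treat every distinct combination of labels as its own class (a "label powerset") and stratify on that. The sketch below assumes a recent scikit-learn with `sklearn.model_selection.StratifiedKFold`; the helper `label_powerset_ids` and the toy data are hypothetical and only serve the illustration. The approach degrades when many label combinations occur only a few times, which is likely with 6 labels and 147 samples.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def label_powerset_ids(y):
    """Map each distinct row of a label indicator matrix to one integer id."""
    _, ids = np.unique(np.asarray(y), axis=0, return_inverse=True)
    return ids.ravel()


# Toy data: 3 labels, 4 distinct label combinations, 30 rows per combination.
combos = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 0]])
y = np.repeat(combos, 30, axis=0)
X = np.random.RandomState(0).rand(len(y), 5)

# Stratify on the combination ids instead of the 2-D label matrix.
ids = label_powerset_ids(y)
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
folds = list(skf.split(X, ids))

# Count how often each label combination appears in every test fold.
for train_idx, test_idx in folds:
    print(np.bincount(ids[test_idx], minlength=len(combos)))
```

The resulting list of `(train, test)` index pairs can be passed as the `cv` argument of `GridSearchCV`, which accepts an iterable of precomputed splits.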
