Combining Recursive Feature Elimination and Grid Search in scikit-learn


Problem description


I am trying to combine recursive feature elimination and grid search in scikit-learn. As you can see from the code below (which works), I am able to get the best estimator from a grid search and then pass that estimator to RFECV. However, I would rather do the RFECV first, then the grid search. The problem is that when I pass the selector ​from RFECV to the grid search, it does not take it:


ValueError: Invalid parameter bootstrap for estimator RFECV


Is it possible to get the selector from RFECV and pass it directly to RandomizedSearchCV, or is this procedurally not the right thing to do?
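The ValueError comes from scikit-learn's nested-parameter convention: once the forest is wrapped in RFECV, its hyperparameters are no longer top-level names like "bootstrap" but are exposed under the "estimator__" prefix. A minimal check (standard scikit-learn API, independent of the question's data):

```python
# Sketch: why "bootstrap" is an invalid parameter for RFECV.
# RFECV wraps the forest, so the forest's tunable parameters are
# exposed under the "estimator__" prefix via get_params().
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

selector = RFECV(RandomForestClassifier())
params = selector.get_params()
print("bootstrap" in params)             # False: not a direct RFECV parameter
print("estimator__bootstrap" in params)  # True: reachable through the prefix
```

Any search over the wrapped selector therefore needs its param grid keys renamed with that prefix, which is exactly what the accepted answer below does.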

from scipy.stats import randint as sp_randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
# Note: sklearn.grid_search was removed in scikit-learn 0.20.
from sklearn.model_selection import RandomizedSearchCV

# Subclass that copies feature_importances_ into coef_ after fitting,
# so that (older) RFECV can rank features with a random forest.
class RandomForestClassifierCoef(RandomForestClassifier):
    def fit(self, *args, **kwargs):
        super().fit(*args, **kwargs)
        self.coef_ = self.feature_importances_
        return self

# Build a classification task using 5 informative features
X, y = make_classification(n_samples=1000, n_features=25, n_informative=5,
                           n_redundant=2, n_repeated=0, n_classes=8,
                           n_clusters_per_class=1, random_state=0)

grid = {"max_depth": [3, None],
        "min_samples_split": sp_randint(2, 11),  # must be >= 2 in current scikit-learn
        "min_samples_leaf": sp_randint(1, 11),
        "bootstrap": [True, False],
        "criterion": ["gini", "entropy"]}

estimator = RandomForestClassifierCoef()
clf = RandomizedSearchCV(estimator, param_distributions=grid, cv=7)
clf.fit(X, y)
estimator = clf.best_estimator_

selector = RFECV(estimator, step=1, cv=4)
selector.fit(X, y)
selector.cv_results_  # formerly grid_scores_, removed in scikit-learn 1.2

Answer


The best way to do this would be to nest the RFECV inside the random search, using the method from this SO answer. Some example code, based on the question code and the SO answer mentioned above:

from scipy.stats import randint as sp_randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
# Note: sklearn.grid_search was removed in scikit-learn 0.20.
from sklearn.model_selection import RandomizedSearchCV

# Build a classification task using 5 informative features
X, y = make_classification(n_samples=1000, n_features=25, n_informative=5,
                           n_redundant=2, n_repeated=0, n_classes=8,
                           n_clusters_per_class=1, random_state=0)

# Parameters of the wrapped forest are addressed through RFECV's
# "estimator__" prefix.
grid = {"estimator__max_depth": [3, None],
        "estimator__min_samples_split": sp_randint(2, 11),  # must be >= 2 in current scikit-learn
        "estimator__min_samples_leaf": sp_randint(1, 11),
        "estimator__bootstrap": [True, False],
        "estimator__criterion": ["gini", "entropy"]}

estimator = RandomForestClassifier()
selector = RFECV(estimator, step=1, cv=4)
clf = RandomizedSearchCV(selector, param_distributions=grid, cv=7)
clf.fit(X, y)
print(clf.cv_results_)  # formerly grid_scores_
print(clf.best_estimator_.n_features_)

