Invalid parameter clf for estimator Pipeline in sklearn

Question

Could anyone check what is wrong with the following code? Am I making a mistake in any of the steps used to build my model? I already added 'clf__' (with two underscores) to the parameter names.

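```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline

clf = RandomForestClassifier()
pca = PCA()
pca_clf = make_pipeline(pca, clf)

# shuffle=True added: recent scikit-learn raises a ValueError when
# random_state is set on a KFold with shuffle=False
kfold = KFold(n_splits=10, shuffle=True, random_state=22)

parameters = {'clf__n_estimators': [4, 6, 9],
              'clf__max_features': ['log2', 'sqrt', 'auto'],
              'clf__criterion': ['entropy', 'gini'],
              'clf__max_depth': [2, 3, 5, 10],
              'clf__min_samples_split': [2, 3, 5],
              'clf__min_samples_leaf': [1, 5, 8]}

# X_train and y_train are the training data (not shown in the question)
grid_RF = GridSearchCV(pca_clf, param_grid=parameters,
                       scoring='accuracy', cv=kfold)
grid_RF = grid_RF.fit(X_train, y_train)
clf = grid_RF.best_estimator_
clf.fit(X_train, y_train)
grid_RF.best_score_

cv_result = cross_val_score(clf, X_train, y_train, cv=kfold,
                            scoring="accuracy")
cv_result.mean()
```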

Recommended answer

You are using make_pipeline the wrong way. From the documentation:

This is a shorthand for the Pipeline constructor; it does not require, and does not permit, naming the estimators. Instead, their names will be set to the lowercase of their types automatically.

So when you pass a PCA object, its step name is set to 'pca' (lowercase), and when you pass a RandomForestClassifier object, that step is named 'randomforestclassifier', not 'clf' as you assumed.
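A quick way to confirm the automatically generated names is to inspect the pipeline's `steps` attribute. A minimal sketch, assuming both estimators are built with default arguments:

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# make_pipeline derives each step name from the lowercased class name
pca_clf = make_pipeline(PCA(), RandomForestClassifier())

step_names = [name for name, _ in pca_clf.steps]
print(step_names)  # ['pca', 'randomforestclassifier']
```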

So the parameter grid you built is invalid: its keys start with clf__, but the pipeline has no step named clf.
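The mismatch can also be reproduced without a grid search. `get_params()` lists every valid key, and `set_params()` rejects a `clf__` key with a `ValueError`, the same kind of "Invalid parameter" error that `GridSearchCV` surfaces when it applies each candidate. A small sketch, assuming a pipeline built the same way as in the question:

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

pca_clf = make_pipeline(PCA(), RandomForestClassifier())
valid_keys = set(pca_clf.get_params().keys())

print('clf__n_estimators' in valid_keys)                     # False
print('randomforestclassifier__n_estimators' in valid_keys)  # True

# Setting a parameter through a step name that does not exist fails
try:
    pca_clf.set_params(clf__n_estimators=4)
    raised = False
except ValueError as exc:
    raised = True
    print(type(exc).__name__, exc)
```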

Replace this line:

```python
pca_clf = make_pipeline(pca, clf)
```

with:

```python
from sklearn.pipeline import Pipeline

pca_clf = Pipeline([('pca', pca), ('clf', clf)])
```
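With explicit step names, the original `clf__` keys line up with the pipeline. The sketch below runs a reduced version of the search end to end. The synthetic data from `make_classification`, the smaller grid, and `n_splits=5` are assumptions made only so it runs quickly; `shuffle=True` is included because recent scikit-learn versions reject `random_state` on a non-shuffling `KFold`.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the question's X_train / y_train (assumption)
X_train, y_train = make_classification(n_samples=200, n_features=10,
                                       random_state=0)

# Explicit names: the forest step is now called 'clf'
pca_clf = Pipeline([('pca', PCA()),
                    ('clf', RandomForestClassifier(random_state=0))])

# Reduced grid for speed; keys keep the 'clf__' prefix from the question
parameters = {'clf__n_estimators': [4, 6, 9],
              'clf__max_depth': [2, 3]}

kfold = KFold(n_splits=5, shuffle=True, random_state=22)
grid_RF = GridSearchCV(pca_clf, param_grid=parameters,
                       scoring='accuracy', cv=kfold)
grid_RF.fit(X_train, y_train)

print(grid_RF.best_params_)
print(grid_RF.best_score_)
```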

Solution 2:

If you don't want to change the line pca_clf = make_pipeline(pca, clf), replace every occurrence of clf in your parameter names with 'randomforestclassifier', like this:

```python
parameters = {'randomforestclassifier__n_estimators': [4, 6, 9],
              # 'auto' was removed from max_features in newer scikit-learn
              # (for classifiers it meant 'sqrt'), so only these two remain
              'randomforestclassifier__max_features': ['log2', 'sqrt'],
              'randomforestclassifier__criterion': ['entropy', 'gini'],
              'randomforestclassifier__max_depth': [2, 3, 5, 10],
              'randomforestclassifier__min_samples_split': [2, 3, 5],
              'randomforestclassifier__min_samples_leaf': [1, 5, 8]}
```
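If you keep `make_pipeline`, the prefix can also be derived from the pipeline itself instead of being typed by hand. A minimal sketch; the dictionary `clf_grid` (a shortened copy of the question's grid) and the variable `rf_name` are illustrative names, not part of the original answer:

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

pca_clf = make_pipeline(PCA(), RandomForestClassifier())

# Name of the last step, i.e. 'randomforestclassifier'
rf_name = pca_clf.steps[-1][0]

# Hypothetical grid written with the 'clf__' prefix (shortened)
clf_grid = {'clf__n_estimators': [4, 6, 9],
            'clf__max_depth': [2, 3, 5, 10]}

# Swap the leading 'clf__' for the real step name
parameters = {rf_name + '__' + key[len('clf__'):]: values
              for key, values in clf_grid.items()}

# Every key should now be a valid pipeline parameter
assert set(parameters) <= set(pca_clf.get_params())
print(sorted(parameters))
```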

Side note: there is no need to do this in your code:

```python
clf = grid_RF.best_estimator_
clf.fit(X_train, y_train)
```

With the default refit=True, best_estimator_ has already been refitted on the whole training set using the best parameters found, so calling clf.fit() again is redundant.
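The sketch below checks this under stated assumptions (synthetic data from `make_classification` and a tiny grid stand in for the question's data and grid). After `fit`, the final forest inside `best_estimator_` already has fitted attributes, and `predict` works without another `fit` call.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.utils.validation import check_is_fitted

# Synthetic stand-in data (assumption)
X_train, y_train = make_classification(n_samples=150, n_features=8,
                                       random_state=0)
pca_clf = Pipeline([('pca', PCA()),
                    ('clf', RandomForestClassifier(random_state=0))])

# refit=True is the default, so the best pipeline is refitted on all of X_train
grid_RF = GridSearchCV(pca_clf, {'clf__n_estimators': [4, 6]},
                       scoring='accuracy', cv=3)
grid_RF.fit(X_train, y_train)

best = grid_RF.best_estimator_
# Raises NotFittedError if the forest had not been refitted
check_is_fitted(best.named_steps['clf'])
predictions = best.predict(X_train)  # no extra best.fit() needed
print(predictions[:10])
```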
