TypeError grid search

Problem description

I used to write loops to find the best parameters for my model, but that led to more coding errors, so I decided to use GridSearchCV.
I am trying to find the best PCA parameters for my model (the only parameter I want to grid search over).
In this model, after normalization, I want to combine the original features with the PCA-reduced features and then apply a linear SVM.
Then I save the whole model so I can run predictions on my input.

I get an error on the line where I fit the data so that I can use the best_estimator_ and best_params_ attributes.
The error says: TypeError: The score function should be a callable, all (<type 'str'>) was passed. I did not use any parameter in GridSearchCV that would need a string, so I am not sure why I get this error.

I also want to know: should the line print("shape after model",X.shape), which runs before I save my model, print both (150, 7) and (150, 5), one for each possible parameter value?

from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.preprocessing import StandardScaler
import joblib  # sklearn.externals.joblib is no longer available in recent scikit-learn releases
from numpy import array

iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)  # prints (150, 4)
print(y)

# create the models and pipeline them
combined_features = FeatureUnion([("pca", PCA()), ("univ_select", SelectKBest(k='all'))])
svm = SVC(kernel="linear")

pipeline = Pipeline([("scale", StandardScaler()),("features", combined_features), ("svm", svm)])

# Do grid search over n_components:
param_grid = dict(features__pca__n_components=[1,3])

grid_search = GridSearchCV(pipeline, param_grid=param_grid, cv=5, verbose=10)
grid_search.fit(X, y)
print("best parameters", grid_search.best_params_)

print("shape after model",X.shape) #should this print (150, 7) or (150, 5) based on best parameter?

#save the model
joblib.dump(grid_search.best_estimator_, 'model.pkl', compress = 1)

# new data to predict
Input = array([2.9, 4.0, 1.2, 0.2])

#use the saved model to predict the new data
modeltrain="model.pkl"
modeltrain_saved = joblib.load(modeltrain) 
model_predictions = modeltrain_saved.predict(Input.reshape(1, -1))
print(model_predictions)

I updated the code based on the answer.

Recommended answer

You are passing 'all' as an argument to SelectKBest. According to the documentation, if you want to pass 'all', you need to specify it as:

SelectKBest(k='all')

The reason is that k is a keyword argument, so it has to be specified with its keyword. The first positional argument to SelectKBest is the scoring function, so when you omit the keyword, 'all' is taken as the score function, hence the error.
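As a minimal sketch of that explanation, the snippet below reproduces the mistake on the iris data and then applies the fix. The variable names (`wrong`, `right`, `error`) are made up for illustration, and the exact exception message is likely to differ between scikit-learn versions, so it is printed rather than assumed.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest

X, y = load_iris(return_X_y=True)

# Passed positionally, 'all' binds to the first parameter, score_func,
# and k silently keeps its default value.
wrong = SelectKBest("all")
print(wrong.score_func, wrong.k)

try:
    wrong.fit(X, y)
    error = None
except TypeError as exc:
    # Newer releases may raise a subclass of TypeError with different wording.
    error = exc
    print(type(exc).__name__, exc)

# Passed by keyword, 'all' reaches k and every feature is kept.
right = SelectKBest(k="all").fit(X, y)
print(right.transform(X).shape)
```

Note that the constructor itself accepts the string without complaint; the error only shows up when fit is called, which is why the traceback points at the grid_search.fit(X, y) line.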

Update:

Now about the shape: the original X will not be changed, so it will print (150, 4). The data is transformed on the fly inside the pipeline. On my PC, best_params_ gives n_components=1, so the final shape that reaches the SVM is (150, 5): 1 column from PCA and 4 from SelectKBest.
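To check those shapes directly, one option is to run only the preprocessing steps (scaling plus the feature union, without the SVM) for each candidate n_components. This is a rough sketch; `preprocess` and `shapes` are illustrative names, not part of the original code.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

shapes = {}
for n in (1, 3):
    features = FeatureUnion([
        ("pca", PCA(n_components=n)),
        ("univ_select", SelectKBest(k="all")),
    ])
    # Same steps as the question's pipeline, minus the final SVM.
    preprocess = Pipeline([("scale", StandardScaler()), ("features", features)])
    shapes[n] = preprocess.fit_transform(X, y).shape
    print(n, X.shape, shapes[n])
```

X itself stays (150, 4) on every iteration. The transformed copy has n + 4 columns, so (150, 5) for n_components=1 and (150, 7) for n_components=3.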
