GridSearchCV on a working pipeline returns ValueError


Problem description

I am using GridSearchCV in order to find the best parameters for my pipeline.

My pipeline seems to work well as I can apply:

pipeline.fit(X_train, y_train)
preds = pipeline.predict(X_test)

And I get a decent result.

But GridSearchCV obviously doesn't like something, and I cannot figure it out.

My pipeline:

feats = FeatureUnion([('age', age),
                      ('education_num', education_num),
                      ('is_education_favo', is_education_favo),
                      ('is_marital_status_favo', is_marital_status_favo),
                      ('hours_per_week', hours_per_week),
                      ('capital_diff', capital_diff),
                      ('sex', sex),
                      ('race', race),
                      ('native_country', native_country)
                     ])

pipeline = Pipeline([
    ('adhocFC', AdHocFeaturesCreation()),
    ('imputers', KnnImputer(target='native-country', n_neighbors=5)),
    ('features', feats),
    ('clf', LogisticRegression()),
])

My GridSearch:

hyperparameters = {'imputers__n_neighbors' : [5,21,41], 'clf__C' : [1.0, 2.0]}

GSCV = GridSearchCV(pipeline, hyperparameters, cv=3, scoring = 'roc_auc' , refit = False) #change n_jobs = 2, refit = False

GSCV.fit(X_train, y_train)
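For reference, the call pattern above can be reproduced in a minimal, self-contained sketch. The custom steps from the question (AdHocFeaturesCreation, KnnImputer and the FeatureUnion members) are not shown in the post, so this sketch swaps in a StandardScaler step and synthetic data; the names X_demo, y_demo, demo_pipeline, demo_grid and demo_search are illustrative assumptions, not part of the original code.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data standing in for X_train / y_train.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# A stand-in pipeline: the question's custom transformers are replaced
# by a StandardScaler because their code is not available.
demo_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Grid keys follow the "<step name>__<parameter name>" convention,
# exactly like 'clf__C' and 'imputers__n_neighbors' in the question.
demo_grid = {"clf__C": [1.0, 2.0]}

demo_search = GridSearchCV(demo_pipeline, demo_grid, cv=3,
                           scoring="roc_auc", refit=False)
demo_search.fit(X_demo, y_demo)

# With refit=False no final model is refitted, but the scores per
# candidate and the best parameter set are still recorded.
print(demo_search.best_params_)
print(demo_search.cv_results_["mean_test_score"])
```

Because GridSearchCV fits a fresh copy of the pipeline on each training fold, every step has to work on a subset of the rows, which a single pipeline.fit(X_train, y_train) call never exercises.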

I receive 11 similar warnings:

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:11: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
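As an aside, this warning usually appears when a column is assigned on a DataFrame that was itself obtained by slicing another one. A minimal sketch of the pattern the warning text recommends is shown below; the frame and its column names are made up for illustration and are not taken from the question's transformers.

```python
import pandas as pd

# Hypothetical data, only to illustrate the warning's advice.
df = pd.DataFrame({"age": [25, 40, 61], "hours_per_week": [40, 50, 20]})

# Take an explicit copy of the filtered rows so that later assignments
# target an independent DataFrame rather than a slice of df.
older = df[df["age"] > 30].copy()

# Assign through .loc with a row indexer and a column label.
older.loc[:, "long_hours"] = older["hours_per_week"] > 45

print(older)
```

The original frame df is left unchanged; only the copy gains the new column.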

and this is the error message:

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:11: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:12: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/ipykernel/__main__.py:14: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in <module>()
      3 GSCV = GridSearchCV(pipeline, hyperparameters, cv=3, scoring = 'roc_auc' ,refit = False) #change n_jobs = 2, refit = False
      4
----> 5 GSCV.fit(X_train, y_train)

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups)
    943 train/test set.
    944 """
--> 945 return self._fit(X, y, groups, ParameterGrid(self.param_grid))
    946
    947

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/model_selection/_search.py in _fit(self, X, y, groups, parameter_iterable)
    562 return_times=True, return_parameters=True,
    563 error_score=self.error_score)
--> 564 for parameters in parameter_iterable
    565 for train, test in cv_iter)
    566

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
    756 # was dispatched. In particular this covers the edge
    757 # case of Parallel used with an exhausted iterator.
--> 758 while self.dispatch_one_batch(iterator):
    759 self._iterating = True
    760 else:

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
    606 return False
    607 else:
--> 608 self._dispatch(tasks)
    609 return True
    610

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
    569 dispatch_timestamp = time.time()
    570 cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 571 job = self._backend.apply_async(batch, callback=cb)
    572 self._jobs.append(job)
    573

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py in apply_async(self, func, callback)
    107 def apply_async(self, func, callback=None):
    108 """Schedule a func to be run"""
--> 109 result = ImmediateResult(func)
    110 if callback:
    111 callback(result)

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/_parallel_backends.py in __init__(self, batch)
    324 # Don't delay the application, to avoid keeping the input
    325 # arguments in memory
--> 326 self.results = batch()
    327
    328 def get(self):

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in __call__(self)
    129
    130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132
    133 def __len__(self):

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
    129
    130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132
    133 def __len__(self):

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, error_score)
    236 estimator.fit(X_train, **fit_params)
    237 else:
--> 238 estimator.fit(X_train, y_train, **fit_params)
    239
    240 except Exception as e:

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/pipeline.py in fit(self, X, y, **fit_params)
    266 This estimator
    267 """
--> 268 Xt, fit_params = self._fit(X, y, **fit_params)
    269 if self._final_estimator is not None:
    270 self._final_estimator.fit(Xt, y, **fit_params)

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params)
    232 pass
    233 elif hasattr(transform, "fit_transform"):
--> 234 Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
    235 else:
    236 Xt = transform.fit(Xt, y, **fit_params_steps[name]) \

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
    495 else:
    496 # fit method of arity 2 (supervised transformation)
--> 497 return self.fit(X, y, **fit_params).transform(X)
    498
    499

in fit(self, X, y)
     16 self.ohe.fit(X_full)
     17 #Create a Dataframe that does not contain any nulls, categ variables are OHE, with all each rows
---> 18 X_ohe_full = self.ohe.transform(X_full[~X[self.col].isnull()].drop(self.col, axis=1))
     19
     20 #Fit the classifier on lines where col is null

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2057 return self._getitem_multilevel(key)
   2058 else:
-> 2059 return self._getitem_column(key)
   2060
   2061 def _getitem_column(self, key):

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2064 # get column
   2065 if self.columns.is_unique:
-> 2066 return self._get_item_cache(key)
   2067
   2068 # duplicate columns & possible reduce dimensionality

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1384 res = cache.get(item)
   1385 if res is None:
-> 1386 values = self._data.get(item)
   1387 res = self._box_item_values(item, values)
   1388 cache[item] = res

/home/jo/anaconda2/envs/py35/lib/python3.5/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3550 loc = indexer.item()
   3551 else:
-> 3552 raise ValueError("cannot label index with a null key")
   3553
   3554 return self.iget(loc, fastpath=fastpath)

ValueError: cannot label index with a null key

Solution

Without additional information, I believe this happens because your X_train and y_train variables are pandas DataFrames, and the core scikit-learn library isn't fully compatible with them: for example, the .fit method of a classifier expects an array-like object.

By feeding in pandas DataFrames, you are inadvertently indexing them as if they were NumPy arrays, which is not reliable in pandas.

Try converting your training data to NumPy arrays:

X_train_arr = X_train.to_numpy()
y_train_arr = y_train.to_numpy()
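A minimal sketch of this suggestion, run end to end on synthetic data, might look as follows. The DataFrame contents, the column names a, b and c, and the StandardScaler step are assumptions made purely for illustration. Note also that the conversion only helps when no pipeline step selects columns by name: a step that does something like X[self.col], as the custom fit method in the traceback does, would need to keep receiving a DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training data held in pandas objects (illustrative only).
rng = np.random.RandomState(0)
X_train = pd.DataFrame(rng.normal(size=(120, 3)), columns=["a", "b", "c"])
y_train = pd.Series((X_train["a"] + X_train["b"] > 0).astype(int))

# Convert to plain NumPy arrays, as suggested in the answer.
X_train_arr = X_train.to_numpy()
y_train_arr = y_train.to_numpy()

# A stand-in pipeline whose steps only need array input.
pipeline = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
search = GridSearchCV(pipeline, {"clf__C": [1.0, 2.0]}, cv=3, scoring="roc_auc")
search.fit(X_train_arr, y_train_arr)

print(type(X_train_arr), search.best_params_)
```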
