sklearn pipeline transform ValueError that Expected Value is not equal to Trained Value
Question
Can you please help me with the following function, where I get the error ValueError: Column ordering must be equal for fit and for transform when using the remainder keyword
(The function is called on a pickled sklearn pipeline that I had saved in GCP Storage.)
Error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-192-c6a8bc0ab221> in <module>
----> 1 safety_project_lite(request)

<ipython-input-190-24c565131f14> in safety_project_lite(request)
     31
     32     df_resp = pd.DataFrame(data=request_data)
---> 33     response = loaded_model.predict(df_resp)
     34
     35     output = {"Safety Rating": response[0]}

~/.local/lib/python3.5/site-packages/sklearn/utils/metaestimators.py in <lambda>(*args, **kwargs)
    114
    115         # lambda, but not partial, allows help() to work with update_wrapper
--> 116         out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
    117         # update the docstring of the returned function
    118         update_wrapper(out, self.fn)

~/.local/lib/python3.5/site-packages/sklearn/pipeline.py in predict(self, X, **predict_params)
    417         Xt = X
    418         for _, name, transform in self._iter(with_final=False):
--> 419             Xt = transform.transform(Xt)
    420         return self.steps[-1][-1].predict(Xt, **predict_params)
    421

~/.local/lib/python3.5/site-packages/sklearn/compose/_column_transformer.py in transform(self, X)
    581             if (n_cols_transform >= n_cols_fit and
    582                     any(X.columns[:n_cols_fit] != self._df_columns)):
--> 583                 raise ValueError('Column ordering must be equal for fit '
    584                                  'and for transform when using the '
    585                                  'remainder keyword')

ValueError: Column ordering must be equal for fit and for transform when using the remainder keyword
Code:
import pickle
from io import BytesIO

import pandas as pd
from google.cloud import storage


def safety_project_lite_beta(request):
    # Download the pickled pipeline from GCP Storage into memory
    client = storage.Client(request.GCP_Project)
    bucket = client.get_bucket(request.GCP_Bucket)
    blob = bucket.blob(request.GCP_Path)
    model_file = BytesIO()
    blob.download_to_file(model_file)
    loaded_model = pickle.loads(model_file.getvalue())

    # Build a single-row dataframe from the request fields
    request_data = {'A': [request.A],
                    'B': [request.B],
                    'C': [request.C],
                    'D': [request.D],
                    'E': [request.E],
                    'F': [request.F]}
    df_resp = pd.DataFrame(data=request_data)

    response = loaded_model.predict(df_resp)
    output = {"Rating": response[0]}
    return output
Recommended answer
The model can only predict if the data you feed it has the same structure as the data it was trained on. Here that means the same columns in the same order.
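As a rough illustration of why the same column names in a different order still fail, here is a minimal sketch that mirrors the comparison visible in the traceback. The function name `leading_columns_match` is hypothetical and plain lists stand in for the dataframe columns; it is not sklearn's actual implementation.

```python
def leading_columns_match(fit_columns, transform_columns):
    """Return True if the first len(fit_columns) columns seen at
    transform time equal the fit-time columns, position by position."""
    n_cols_fit = len(fit_columns)
    return list(transform_columns[:n_cols_fit]) == list(fit_columns)


# Hypothetical fit-time order versus two transform-time orders.
fit_cols = ["A", "B", "C"]
same_order = leading_columns_match(fit_cols, ["A", "B", "C"])
shuffled = leading_columns_match(fit_cols, ["B", "A", "C"])
print(same_order, shuffled)
```

The shuffled case contains exactly the same names, yet the positional comparison does not hold, which is the situation the ValueError reports.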
To ensure that df_resp has the same columns as X_train, in the same order, pass a list of its columns when building the dataframe:
df_resp = pd.DataFrame(request_data, columns=X_train.columns)
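A small sketch of what the `columns` argument does, assuming a hypothetical training order held in `train_columns` (standing in for `X_train.columns`). Note that the traceback paths show Python 3.5, where plain dicts do not guarantee insertion order, so that may be one reason the columns ended up in a different order.

```python
import pandas as pd

# Hypothetical training column order; in practice this would be X_train.columns.
train_columns = ["C", "A", "B"]

# Request data keyed in a different order than the training data.
request_data = {"A": [1], "B": [2], "C": [3]}

# Passing columns= selects and orders the dict entries by the given list.
df_resp = pd.DataFrame(request_data, columns=train_columns)
print(list(df_resp.columns))
```

A name in `train_columns` that is absent from `request_data` would show up as a column of missing values rather than raising an error, so it may be worth validating the request first.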
If that variable is not available for some reason, you could pickle its column list (X_train.columns) and load it later:
loaded_cols = pickle.loads([...])
df_resp = pd.DataFrame(data=request_data, columns=loaded_cols)
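The round trip might look like the following minimal sketch. The names `train_columns` and `cols_bytes` are hypothetical, and an in-memory bytes object stands in for whatever file would be stored next to the model; the early check for missing columns is an addition for illustration, not part of the original answer.

```python
import pickle

# At training time: hypothetical column order, e.g. list(X_train.columns).
train_columns = ["C", "A", "B"]
cols_bytes = pickle.dumps(train_columns)  # would be saved alongside the model

# At prediction time: load the list back (here from bytes held in memory).
loaded_cols = pickle.loads(cols_bytes)

request_data = {"A": [1], "B": [2], "C": [3]}

# Fail early with a clear message if the request lacks a trained column.
missing = [col for col in loaded_cols if col not in request_data]
if missing:
    raise KeyError("request is missing columns: {}".format(missing))

print(loaded_cols)
```

With `loaded_cols` restored, the dataframe can then be built with `columns=loaded_cols` as in the answer.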
This keeps the workflow more dynamic: for example, if you add columns to the training data later, the prediction code picks up the new column list without further changes.