Python ValueError: ColumnTransformer, Column Ordering is Not Equal
Question
I put together the following function, which reads a CSV file, trains a model, and predicts on the request data.
I get the following ValueError: Column ordering must be equal for fit and for transform when using the remainder keyword
The training data and the data used for prediction have exactly the same number of columns (e.g., 15). I am not sure how the ordering of the columns could have changed.
~/.local/lib/python3.5/site-packages/sklearn/pipeline.py in predict(self, X, **predict_params)
417 Xt = X
418 for _, name, transform in self._iter(with_final=False):
--> 419 Xt = transform.transform(Xt)
420 return self.steps[-1][-1].predict(Xt, **predict_params)
421
~/.local/lib/python3.5/site-packages/sklearn/compose/_column_transformer.py in transform(self, X)
581 if (n_cols_transform >= n_cols_fit and
582 any(X.columns[:n_cols_fit] != self._df_columns)):
--> 583 raise ValueError('Column ordering must be equal for fit '
584 'and for transform when using the '
585 'remainder keyword')
ValueError: Column ordering must be equal for fit and for transform when using the remainder keyword
Function:
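from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Numeric columns: fill missing values with the median, then standardize
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])
# Categorical columns: fill missing values with a constant, then one-hot encode
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])
# Put the data transformation and the model in a single pipeline
rf = Pipeline(steps=[('preprocessor', preprocessor),
                     ('classifier', RandomForestClassifier(
                         n_estimators=500,
                         criterion="gini",
                         max_features="sqrt",
                         min_samples_leaf=4))])
rf.fit(X_train, y_train)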
request_data = {'A': [request.A],
                'B': [request.B],
                'C': [request.C],
                'D': [request.D],
                'E': [request.E],
                'F': [request.F],
                'G': [request.G],
                'H': [request.H],
                'I': [request.I],
                'J': [request.J],
                'K': [request.K],
                'L': [request.L],
                'M': [request.M],
                'N': [request.N],
                'O': [request.O]}
df_resp = pd.DataFrame(data=request_data)
response = rf.predict(df_resp)
output = {"Safety Rating": response[0]}
return output
Recommended answer
As I understand the error message, X_train.columns and df_resp.columns are not the same, but .predict() needs them to be. Having the same number of columns is not enough: the column names must also appear in the same order.
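One way to confirm this before changing anything is to compare the two column lists position by position. The helper below is only a sketch: `column_order_mismatches` is a hypothetical name, not part of pandas or scikit-learn, and the short column lists stand in for the real `X_train.columns` and `df_resp.columns`.

```python
def column_order_mismatches(fit_columns, transform_columns):
    """Return (position, fit_name, transform_name) wherever the names differ."""
    fit_columns = list(fit_columns)
    transform_columns = list(transform_columns)
    mismatches = []
    # zip stops at the shorter list, so only shared positions are compared
    for position, (fit_name, transform_name) in enumerate(
            zip(fit_columns, transform_columns)):
        if fit_name != transform_name:
            mismatches.append((position, fit_name, transform_name))
    return mismatches


# Placeholder column lists: same names, different order
fit_cols = ["A", "B", "C"]
request_cols = ["A", "C", "B"]
mismatches = column_order_mismatches(fit_cols, request_cols)
print(mismatches)
```

In the question's setting the call would be `column_order_mismatches(X_train.columns, df_resp.columns)`; an empty list means the shared positions agree.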
To force the two to match, you can pass the column list of X_train as the columns argument when creating the DataFrame:
df_resp = pd.DataFrame(data=request_data, columns=X_train.columns)
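A small self-contained sketch of this fix follows. The training frame and request payload are invented placeholders with three columns, not the asker's data. The traceback paths show Python 3.5, where a plain dict does not guarantee insertion order, so the column order of a DataFrame built from a dict should not be relied on without the explicit columns argument. The sketch also shows an alternative that reorders an already-built frame.

```python
import pandas as pd

# Placeholder training frame; the real X_train has 15 columns
X_train = pd.DataFrame({"B": [1.0, 2.0], "A": ["x", "y"], "C": [0.5, 0.7]})

# Placeholder request payload whose keys arrive in a different order
request_data = {"A": ["x"], "C": [0.9], "B": [3.0]}

# Option 1: fix the column order when the frame is created
df_resp = pd.DataFrame(data=request_data, columns=X_train.columns)

# Option 2: build the frame first, then select columns in training order
df_unordered = pd.DataFrame(data=request_data)
df_reordered = df_unordered[list(X_train.columns)]

print(list(df_resp.columns))
print(list(df_reordered.columns))
```

Both options yield the training column order. Option 2 raises a KeyError if the request is missing a training column, which can be a useful early check.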