Predicting new values with a model trained on one-hot encoded data

Question

This might look like a trivial problem, but I am stuck on getting predictions from a model. My problem is this:

I have a dataset of shape 1000 x 19 (excluding the target feature), but after one-hot encoding it becomes 1000 x 141. Since I trained the model on data of shape 1000 x 141, I need data of shape 1 x 141 (at least) for prediction. I also know that in Python I can make a prediction using

model.predict(data)

However, the data I get from an end user through a web portal has shape 1 x 19, so I am confused about how to make predictions based on that user data.

How can I convert data of shape 1 x 19 into 1 x 141? I have to keep the same column order as in the train/test data. Any help in this direction would be highly appreciated.

Recommended answer

I am assuming that you are using sklearn's OneHotEncoder to create the one-hot encoding. If so, the problem is easy to solve, because you fit the encoder on your training data:

from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(categories = "auto", handle_unknown = "ignore")
X_train_encoded = encoder.fit_transform(X_train)

In the code above, the encoder is fitted on your training data, so when you get the test data you can transform it into the same encoding with this fitted encoder:

test_data = encoder.transform(test_data)

Your test data will now also have shape 1 x 141. You can check the shape using

test_data.shape  # the encoder returns a SciPy sparse matrix, which exposes .shape directly
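For the web-portal case, one possible way to avoid handling the encoder and the model separately is to bundle them into a single scikit-learn `Pipeline`, so that the raw user row can be passed straight to `predict`. The sketch below is only an assumption about the setup: the data, the column names, the numeric `age` column and the choice of `LogisticRegression` are hypothetical and not taken from the question.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical mixed training data: two categorical columns and one numeric column.
X_train = pd.DataFrame({
    "color": ["red", "green", "blue", "green", "red", "blue"],
    "size": ["S", "M", "L", "M", "L", "S"],
    "age": [23, 35, 41, 29, 52, 37],
})
y_train = [0, 1, 1, 0, 1, 0]

categorical_cols = ["color", "size"]

# One-hot encode the categorical columns; pass the numeric column through unchanged.
preprocess = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    remainder="passthrough",
)

# The pipeline applies the fitted preprocessing before every call to predict.
model = Pipeline(steps=[
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# A raw, un-encoded row from the end user goes in directly.
user_row = pd.DataFrame([{"color": "green", "size": "L", "age": 30}])
prediction = model.predict(user_row)
print(prediction)
```

With this arrangement the fitted pipeline is the only object the web application needs; it could, for example, be saved with `joblib.dump` after training and loaded again when serving requests.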
