How can I align pandas get_dummies across training / validation / testing?

Views: 108 | Published: 2020/5/24 2:24:08 | Tags: python pandas one-hot-encoding

Question

I have 3 sets of data (training, validation and testing), and when I run:

    training_x = pd.get_dummies(training_x, columns=['a', 'b', 'c'])

it gives me a certain number of features. But when I run it on the validation data I get a different number, and the same happens for testing. Is there any way to normalize (wrong word, I know) across all data sets so that the number of features aligns?

Recommended answer

Dummies should be created before dividing the dataset into train, test, or validation sets.

Suppose I have train and test DataFrames as follows:

    import pandas as pd

    train = pd.DataFrame([1, 2, 3, 4, 5, 6], columns=['A'])
    test = pd.DataFrame([7, 8], columns=['A'])

    # creating dummies for the train data
    # (dtype=int prints 0/1; pandas 2.0+ defaults to True/False)
    pd.get_dummies(train, columns=['A'], dtype=int)

Output:

       A_1  A_2  A_3  A_4  A_5  A_6
    0    1    0    0    0    0    0
    1    0    1    0    0    0    0
    2    0    0    1    0    0    0
    3    0    0    0    1    0    0
    4    0    0    0    0    1    0
    5    0    0    0    0    0    1

    # creating dummies for the test data
    pd.get_dummies(test, columns=['A'], dtype=int)

Output:

       A_7  A_8
    0    1    0
    1    0    1

So the dummy columns for categories 7 and 8 appear only in the test set, and train and test end up with different features. Encoding the concatenated data avoids this:

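    final_df = pd.concat([train, test])

    # columns=['A'] is required here: without it, get_dummies leaves
    # numeric columns such as 'A' unencoded
    dummy_created = pd.get_dummies(final_df, columns=['A'])

    # now you can split it into train and test
    from sklearn.model_selection import train_test_split
    train_x, test_x = train_test_split(dummy_created, test_size=0.33)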

Now train and test will have the same set of features.
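Note that `train_test_split` draws a new random split, so rows from the original test set can land in the new training set. If the three sets from the question have to stay as they are, the following minimal sketch shows two possible ways to get matching columns. It is not part of the answer above. The toy frames and the column `a` are hypothetical stand-ins for the question's data, and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical toy splits; 'a' stands in for one of the question's columns.
training_x = pd.DataFrame({"a": ["x", "y", "z"]})
validation_x = pd.DataFrame({"a": ["x", "w"]})
testing_x = pd.DataFrame({"a": ["y", "v"]})

# Variant 1: label each split with a key, encode the combined frame once,
# then slice the original splits back out by key.
combined = pd.concat(
    [training_x, validation_x, testing_x],
    keys=["train", "valid", "test"],
)
encoded = pd.get_dummies(combined, columns=["a"], dtype=int)
train_enc = encoded.loc["train"]
valid_enc = encoded.loc["valid"]
test_enc = encoded.loc["test"]

# Variant 2: encode each split on its own, then align it to the training
# columns. Categories unseen in training are dropped; categories missing
# from the split are added as all-zero columns.
train_only = pd.get_dummies(training_x, columns=["a"], dtype=int)
valid_aligned = pd.get_dummies(validation_x, columns=["a"], dtype=int).reindex(
    columns=train_only.columns, fill_value=0
)
```

Variant 1 keeps a column for every category seen in any split. Variant 2 uses only the categories present in the training data, which may be preferable when the validation and test sets should not influence the feature set.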
