Easy way to apply transformation from `pandas.get_dummies` to new data?
Question
Suppose I have a data frame `data` with strings that I want converted to indicators. I use `pandas.get_dummies(data)` to convert this to a dataset that I can now use for building a model.
Now I have a single new observation that I want to run through my model. Obviously I can't use `pandas.get_dummies(new_data)`, because the new observation doesn't contain all of the classes and so won't produce the same indicator columns. Is there a good way to do this?
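To make the mismatch concrete, here is a minimal sketch with made-up data. The frames `train` and `new` are illustrative assumptions, not part of the original question:

```python
import pandas as pd

# Hypothetical training data containing four categories
train = pd.DataFrame({'cat': ['a', 'b', 'c', 'd'], 'val': [1, 2, 5, 10]})
# Hypothetical single new observation containing only one category
new = pd.DataFrame({'cat': ['a'], 'val': [1]})

train_cols = list(pd.get_dummies(train).columns)
new_cols = list(pd.get_dummies(new).columns)

print(train_cols)  # ['val', 'cat_a', 'cat_b', 'cat_c', 'cat_d']
print(new_cols)    # ['val', 'cat_a']
```

The new observation yields only two columns, so it cannot be passed directly to a model trained on the five-column matrix.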
Recommended answer
You can create the dummies from the single new observation, and then reindex that frame's columns using the columns from the original indicator matrix:
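import pandas as pd

# Original data and its indicator matrix (dtype=int keeps 0/1 output; pandas >= 2.0 defaults to bool)
df = pd.DataFrame({'cat': ['a', 'b', 'c', 'd'], 'val': [1, 2, 5, 10]})
dummies_frame = pd.get_dummies(df, dtype=int)

# Encode the single new observation, then align its columns with the original matrix
df1 = pd.get_dummies(pd.DataFrame({'cat': ['a'], 'val': [1]}), dtype=int)
df1.reindex(columns=dummies_frame.columns, fill_value=0)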
This returns:
   val  cat_a  cat_b  cat_c  cat_d
0    1      1      0      0      0
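If this has to be done repeatedly, the reindex step can be wrapped in a small function. The sketch below is one possible way to do that. The helper name `encode_like_training` and the sample observations are assumptions made for illustration, not part of the original answer. It also shows a caveat of this approach: a category that never appeared in the original data is dropped by `reindex`, so all of its indicators come out as 0.

```python
import pandas as pd

# Columns of the indicator matrix built from the original (training) data
train = pd.DataFrame({'cat': ['a', 'b', 'c', 'd'], 'val': [1, 2, 5, 10]})
train_columns = pd.get_dummies(train, dtype=int).columns


def encode_like_training(new_data, columns):
    """Hypothetical helper: one-hot encode new_data, then align it to `columns`."""
    encoded = pd.get_dummies(new_data, dtype=int)
    # Indicator columns absent from new_data are filled with 0;
    # columns not present in `columns` (unseen categories) are dropped.
    return encoded.reindex(columns=columns, fill_value=0)


# An observation with a known category
seen = encode_like_training(pd.DataFrame({'cat': ['b'], 'val': [3]}), train_columns)
# An observation with a category that was never in the training data
unseen = encode_like_training(pd.DataFrame({'cat': ['z'], 'val': [7]}), train_columns)

print(seen)
print(unseen)  # every cat_* indicator is 0 because 'z' was never seen
```

Whether silently zeroing out an unseen category is acceptable depends on the model. If it is not, comparing the new observation's categories against the training ones before encoding is one way to catch it.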