Dummy variables when not all categories are present

Views: 88 | Published: 2020/5/4 8:50:10 | Tags: python pandas machine-learning dummy-variable

Question

I have a set of dataframes where one of the columns contains a categorical variable. I'd like to convert it to several dummy variables, in which case I'd normally use get_dummies.

What happens is that get_dummies looks at the data available in each dataframe to find out how many categories there are, and thus creates the appropriate number of dummy variables. However, in the problem I'm working on right now, I actually know in advance what the possible categories are. But when looking at each dataframe individually, not all categories necessarily appear.
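
To illustrate the behaviour described here, a minimal sketch follows. The frames `df1` and `df2` are made-up examples, not data from the original question; each one happens to contain a different subset of the known categories.

```python
import pandas as pd

# Two hypothetical frames; each sees only part of the known categories a, b, c.
df1 = pd.DataFrame({"cat": ["a", "b", "a"]})
df2 = pd.DataFrame({"cat": ["c", "a"]})

# get_dummies infers the categories from the data it is given,
# so the resulting column sets differ between the two frames.
d1 = pd.get_dummies(df1, columns=["cat"])
d2 = pd.get_dummies(df2, columns=["cat"])

print(list(d1.columns))  # ['cat_a', 'cat_b']
print(list(d2.columns))  # ['cat_a', 'cat_c']
```

Because the column sets differ, the encoded frames cannot be stacked or fed to the same model without further alignment.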

My question is: is there a way to pass to get_dummies (or an equivalent function) the names of the categories, so that, for the categories that don't appear in a given dataframe, it'd just create a column of 0s?

Something that would make this:

categories = ['a', 'b', 'c']

   cat
1   a
2   b
3   a

become this:

   cat_a  cat_b  cat_c
1      1      0      0
2      0      1      0
3      1      0      0
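
The page does not include an answer, so the following is only a sketch of one possible approach, not the accepted solution. It assumes the full category list is known up front, as the question states. The idea is to convert the column to a pandas categorical with explicit `categories`, so that `get_dummies` emits a column for every declared category, including unobserved ones. A second variant reindexes the dummy columns instead. The names `df`, `raw`, `dummies`, and `alt` are illustrative, and `dtype=int` is passed so the output is 0/1 rather than booleans.

```python
import pandas as pd

categories = ["a", "b", "c"]
df = pd.DataFrame({"cat": ["a", "b", "a"]}, index=[1, 2, 3])
raw = df.copy()  # untouched copy for the second variant

# Variant 1: declare the full category set on the column itself.
# get_dummies then creates one column per declared category.
df["cat"] = pd.Categorical(df["cat"], categories=categories)
dummies = pd.get_dummies(df, columns=["cat"], dtype=int)
print(dummies)

# Variant 2: encode what is present, then reindex to the expected
# column list, filling the missing columns with 0.
expected = [f"cat_{c}" for c in categories]
alt = (
    pd.get_dummies(raw["cat"], prefix="cat", dtype=int)
    .reindex(columns=expected, fill_value=0)
)
print(alt)
```

In both variants `cat_c` comes out as a column of zeros. One caveat with the categorical variant: a value that is not in `categories` becomes missing and ends up as an all-zero row, so it may be worth checking for unexpected values first.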
