Create dummies from column with multiple values in pandas

Question

I am looking for a pythonic way to handle the following problem.

The pandas.get_dummies() method is great to create dummies from a categorical column of a dataframe. For example, if the column has values in ['A', 'B'], get_dummies() creates 2 dummy variables and assigns 0 or 1 accordingly.

Now, I need to handle this situation. A single column, let's call it 'label', has values like ['A', 'B', 'C', 'D', 'A*C', 'C*D']. get_dummies() creates 6 dummies, but I only want 4 of them, so that a row could have multiple 1s.
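To make the situation concrete, here is a minimal sketch. The DataFrame `df` is hypothetical sample data built from the label values listed above; it is not taken from a real dataset.

```python
import pandas as pd

# Hypothetical sample data using the label values described above
df = pd.DataFrame({'label': ['A', 'B', 'C', 'D', 'A*C', 'C*D']})

# pd.get_dummies() treats every distinct string as its own category,
# so 'A*C' and 'C*D' each get a separate column
dummies = pd.get_dummies(df['label'])

print(dummies.shape[1])  # number of dummy columns produced
```

Six columns come out because the combined labels are never split into their parts.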

Is there a way to handle this in a pythonic way? I could only think of some step-by-step algorithm to get it, but that would not include get_dummies(). Thanks

Edited, hopefully clearer now!

Recommended answer

I know it's been a while since this question was asked, but there is (at least now there is) a one-liner that is supported by the documentation:

In [4]: df
Out[4]:
      label
0  (a, c, e)
1     (a, d)
2       (b,)
3     (d, e)

In [5]: df['label'].str.join(sep='*').str.get_dummies(sep='*')
Out[5]:
   a  b  c  d  e
0  1  0  1  0  1
1  1  0  0  1  0
2  0  1  0  0  0
3  0  0  0  1  1
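
For anyone who wants to run this outside an interactive session, the following is a self-contained sketch of the same one-liner. The construction of `df` is an assumption that mirrors the output shown above, and `q` is hypothetical data based on the labels in the question. Since the question's labels are already '*'-delimited strings, the join step should not be needed there.

```python
import pandas as pd

# Assumed construction of the frame displayed above: each cell is a tuple of labels
df = pd.DataFrame({'label': [('a', 'c', 'e'), ('a', 'd'), ('b',), ('d', 'e')]})

# Join each tuple into one '*'-delimited string, then let
# Series.str.get_dummies split on '*' and build one indicator column per label
dummies = df['label'].str.join(sep='*').str.get_dummies(sep='*')
print(dummies)

# Hypothetical data from the question: the values are already delimited strings,
# so str.get_dummies can be applied directly
q = pd.DataFrame({'label': ['A', 'B', 'C', 'D', 'A*C', 'C*D']})
q_dummies = q['label'].str.get_dummies(sep='*')
print(q_dummies)
```

With the question's data this gives four columns (A, B, C, D), and the rows for 'A*C' and 'C*D' each contain two 1s.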
