Pandas get_dummies on multiple columns


Question

I have a dataset with multiple columns that I want to one-hot encode. However, I don't want a separate encoding for each column, because the columns all refer to the same set of items. What I want is a single set of dummy variables shared across all the columns. See the code below for a clearer explanation.

Suppose my dataframe looks like this:

In [103]: dum = pd.DataFrame({'ch1': ['A', 'C', 'A'], 'ch2': ['B', 'G', 'F'], 'ch3': ['C', 'D', 'E']})

In [104]: dum
Out[104]:
  ch1 ch2 ch3
0   A   B   C
1   C   G   D
2   A   F   E

If I run

pd.get_dummies(dum)

the output will be

   ch1_A  ch1_C  ch2_B  ch2_F  ch2_G  ch3_C  ch3_D  ch3_E
 0      1      0      1      0      0      1      0      0
 1      0      1      0      0      1      0      1      0
 2      1      0      0      1      0      0      0      1

However, what I would like to obtain is something like this:

 A B C D E F G
 1 1 1 0 0 0 0
 0 0 1 1 0 0 1
 1 0 0 0 1 1 0

Instead of having multiple columns represent the encoding, e.g. ch1_A and ch1_C, I want a single group of columns (A, B, and so on), each with value 1 when that value shows up in any of the columns ch1, ch2, ch3.

To clarify, in my original dataset a single row never contains the same value (A, B, C, ...) more than once; each value appears in at most one of the columns.

Recommended answer

Use stack and str.get_dummies, then aggregate over the original row index:

# collapse the stacked (row, column) index back to one row per original row
dum.stack().str.get_dummies().groupby(level=0).sum()
Out[938]: 
   A  B  C  D  E  F  G
0  1  1  1  0  0  0  0
1  0  0  1  1  0  0  1
2  1  0  0  0  1  1  0
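
To make the one-liner easier to follow, here is a minimal step-by-step sketch of the same idea. The intermediate names (stacked, indicators, encoded) are made up for illustration, and the final step uses groupby(level=0).max() rather than a sum, which is an assumption on my part: it keeps the result at 0/1 even if a value were to repeat within a row.

```python
import pandas as pd

dum = pd.DataFrame({
    "ch1": ["A", "C", "A"],
    "ch2": ["B", "G", "F"],
    "ch3": ["C", "D", "E"],
})

# Step 1: stack() reshapes the frame into one long Series whose index is
# (original row label, original column name).
stacked = dum.stack()

# Step 2: str.get_dummies() builds one indicator column per distinct value,
# ignoring which original column the value came from.
indicators = stacked.str.get_dummies()

# Step 3: group by the first index level (the original row label) and take
# the max, so each original row ends up with a single 0/1 row.
encoded = indicators.groupby(level=0).max()

print(encoded)
```

Since the question says a value appears at most once per row, sum and max give the same result here; max is simply the safer choice if that guarantee ever stops holding.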
