Running get_dummies on several DataFrame columns?
Question
How can one idiomatically run a function like get_dummies, which expects a single column and returns several, on multiple DataFrame columns?
Recommended answer
Since pandas version 0.15.0, pd.get_dummies can handle a DataFrame directly. Before that, it could only handle a single Series; see below for the workaround.
In [1]: df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['c', 'c', 'b'],
   ...:                    'C': [1, 2, 3]})
In [2]: df
Out[2]:
   A  B  C
0  a  c  1
1  b  c  2
2  a  b  3
In [3]: pd.get_dummies(df)
Out[3]:
   C  A_a  A_b  B_b  B_c
0  1    1    0    0    1
1  2    0    1    0    1
2  3    1    0    1    0
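As a variation on the call above, here is a minimal sketch that encodes only some of the columns. It assumes a pandas release recent enough to accept both the `columns` and `dtype` parameters of pd.get_dummies; the choice to encode only column A is purely illustrative. Newer pandas releases return boolean dummy columns by default, so `dtype=int` is passed here to get 0/1 values like those shown in the output above.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['c', 'c', 'b'],
                   'C': [1, 2, 3]})

# Encode only the columns listed in `columns`; B and C pass through unchanged.
# dtype=int requests 0/1 integers instead of booleans.
dummies = pd.get_dummies(df, columns=['A'], dtype=int)
print(dummies)
```

Columns that are not listed keep their original values, which is useful when a string column such as B should not be expanded.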
Workaround for pandas < 0.15.0
You can call get_dummies on each column separately and then concat the results:
In [111]: df
Out[111]:
   A  B
0  a  x
1  a  y
2  b  z
3  b  x
4  c  x
5  a  y
6  b  y
7  c  z
In [112]: pd.concat([pd.get_dummies(df[col]) for col in df], axis=1, keys=df.columns)
Out[112]:
   A        B
   a  b  c  x  y  z
0  1  0  0  1  0  0
1  1  0  0  0  1  0
2  0  1  0  0  0  1
3  0  1  0  1  0  0
4  0  0  1  1  0  0
5  1  0  0  0  1  0
6  0  1  0  0  1  0
7  0  0  1  0  0  1
If you don't want the MultiIndex columns, remove keys=.. from the concat call.
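One thing to watch when dropping the keys: the resulting labels (a, b, c, x, y, z) no longer say which column they came from, and they could collide if two columns shared a value. A possible sketch of the per-column approach with flat but still unambiguous labels follows. It assumes the `prefix` parameter of pd.get_dummies, and the frame is rebuilt here from the values shown in the example above.

```python
import pandas as pd

# Same data as the example frame above.
df = pd.DataFrame({'A': list('aabbcabc'), 'B': list('xyzxxyyz')})

# One get_dummies call per column; prefix= puts the source column name
# into each label (A_a, A_b, ..., B_z), so no keys= argument is needed.
parts = [pd.get_dummies(df[col], prefix=col) for col in df]
flat = pd.concat(parts, axis=1)
print(flat.columns.tolist())
```

This gives the same naming scheme that the DataFrame form of get_dummies uses, while still working one Series at a time.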