Pandas get_dummies to output dtype integer/bool instead of float
Question
I would like to know whether I can ask the get_dummies function in pandas to output the dummies dataframe with a dtype lighter than the default float64.
So, for a sample dataframe with categorical columns:
In []: import numpy as np
In []: import pandas as pd
In []: df = pd.DataFrame([('blue','wood'),('blue','metal'),('red','wood')],
                         columns=['C1','C2'])
In []: df
Out[]:
C1 C2
0 blue wood
1 blue metal
2 red wood
After getting the dummies, it looks like:
In []: df = pd.get_dummies(df)
In []: df
Out[]:
C1_blue C1_red C2_metal C2_wood
0 1 0 0 1
1 1 0 1 0
2 0 1 0 1
which is perfectly fine. However, by default the 1's and 0's are float64:
In []: df.dtypes
Out[]:
C1_blue float64
C1_red float64
C2_metal float64
C2_wood float64
dtype: object
I know I can change the dtype afterwards with astype:
In []: df = pd.get_dummies(df).astype(np.int8)
But I don't want to hold the dataframe with floats in memory, because I am dealing with a big dataframe (from a CSV of about 5 GB). I would like to get the dummies directly as integers.
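To put a rough number on the concern: a float64 cell takes 8 bytes and an int8 cell takes 1, so the same dummy matrix is about eight times larger as floats. A minimal sketch of that arithmetic, using an arbitrary illustrative shape (not the 5 GB dataset from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical shape, chosen only for illustration.
n_rows, n_cols = 100_000, 4

# The same all-zero indicator matrix stored as float64 and as int8.
as_float = pd.DataFrame(np.zeros((n_rows, n_cols), dtype=np.float64))
as_int8 = as_float.astype(np.int8)

# Bytes used by the column data only (index excluded).
float_bytes = as_float.memory_usage(index=False).sum()
int8_bytes = as_int8.memory_usage(index=False).sum()

print(float_bytes, int8_bytes, float_bytes / int8_bytes)
```

Note that the astype route briefly holds both copies in memory, which is exactly what the question is trying to avoid.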
Recommended answer
The float issue has been fixed. As of pandas 0.19, pd.get_dummies returns dummy-encoded columns as small integers (uint8) rather than floats.
See: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#get-dummies-now-returns-integer-dtypes
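If the default dtype is still not the one you want, it may be possible to request it directly. The sketch below assumes a pandas release whose get_dummies accepts a dtype keyword; to my knowledge that keyword arrived in 0.23, and it is not part of the answer above. It also seems that from pandas 2.0 the default is bool rather than uint8, so it is worth checking the documentation for your installed version.

```python
import numpy as np
import pandas as pd

# Same sample frame as in the question.
df = pd.DataFrame(
    [("blue", "wood"), ("blue", "metal"), ("red", "wood")],
    columns=["C1", "C2"],
)

# Assumption: the installed pandas supports the `dtype` keyword of get_dummies.
# The dummies are created as int8 directly, with no float64 intermediate.
dummies = pd.get_dummies(df, dtype=np.int8)

print(dummies.dtypes)
print(dummies)
```

Passing dtype=bool instead should give boolean columns, which also take one byte per value.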