Pandas get_dummies to output dtype integer/bool instead of float


Question

I would like to know whether I can ask the get_dummies function in pandas to output the dummies dataframe with a dtype lighter than the default float64.

So, for a sample dataframe with categorical columns:

In []: df = pd.DataFrame([('blue','wood'),('blue','metal'),('red','wood')],
                         columns=['C1','C2'])
In []: df
Out[]:
    C1      C2
0   blue    wood
1   blue    metal
2   red     wood

After getting the dummies, it looks like:

In []: df = pd.get_dummies(df)
In []: df
Out[]:
   C1_blue  C1_red  C2_metal  C2_wood
0        1       0         0        1
1        1       0         1        0
2        0       1         0        1

which is perfectly fine. However, by default the 1's and 0's are float64:

In []: df.dtypes
Out[]: 
C1_blue     float64
C1_red      float64
C2_metal    float64
C2_wood     float64
dtype: object

I know I can change the dtype afterwards with astype:

In []: df = pd.get_dummies(df).astype(np.int8)

But I don't want to hold the float dataframe in memory, because I am dealing with a big dataframe (from a CSV of about 5 GB). I would like to get the dummies directly as integers.

Recommended answer

The float issue is now solved. From pandas version 0.19, the pd.get_dummies function returns dummy-encoded columns as small integers.

See: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#get-dummies-now-returns-integer-dtypes
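
If you want to pin the dtype rather than rely on the version default, here is a minimal sketch. It assumes a pandas release recent enough for pd.get_dummies to accept the dtype keyword (as far as I know this was added in 0.23, and the default became bool in 2.0, so check your installed version). The sample frame is the one from the question; the name dummies is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    [("blue", "wood"), ("blue", "metal"), ("red", "wood")],
    columns=["C1", "C2"],
)

# Ask for the indicator dtype up front, so there is no separate astype step.
# Assumption: the installed pandas supports the `dtype` keyword.
dummies = pd.get_dummies(df, dtype=np.int8)

print(dummies.dtypes)                    # dtype of each dummy column
print(dummies.memory_usage(deep=True))   # bytes used per column
```

With the dtype passed explicitly, the result should be the same whether the installed version defaults to uint8 or to bool.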
