Pandas cast all object columns to category


Problem description


I want an elegant function that casts all object columns in a pandas DataFrame to categories.

df[x] = df[x].astype("category") performs the type cast, and df.select_dtypes(include=['object']) sub-selects all object columns. However, this drops the other columns, so a manual merge is required. Is there a solution that just works in place, or that does not require a manual cast?

Edit

I am looking for something similar to http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.convert_objects.html, but for conversion to categorical data.

Solution

Use apply and pd.Series.astype with dtype='category'.

Consider the pd.DataFrame df:

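import pandas as pd

# Sample frame: A and C are integer columns, B and D hold strings (object dtype)
df = pd.DataFrame(dict(
        A=[1, 2, 3, 4],
        B=list('abcd'),
        C=[2, 3, 4, 5],
        D=list('defg')
    ))
df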

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
A    4 non-null int64
B    4 non-null object
C    4 non-null int64
D    4 non-null object
dtypes: int64(2), object(2)
memory usage: 200.0+ bytes

Let's use select_dtypes to pick out the 'object' columns for conversion, then recombine them with the columns that a second select_dtypes call excludes.

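# Keep the non-object columns, convert the object columns to category,
# then use reindex to restore the original column order.
df = pd.concat([
        df.select_dtypes(exclude=['object']),
        df.select_dtypes(include=['object']).apply(pd.Series.astype, dtype='category')
        ], axis=1).reindex(columns=df.columns)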

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
A    4 non-null int64
B    4 non-null category
C    4 non-null int64
D    4 non-null category
dtypes: category(2), int64(2)
memory usage: 208.0 bytes
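
A shorter variant, offered here as a sketch and not as part of the original answer, skips the concat step. It relies on DataFrame.astype accepting a mapping from column name to dtype, which returns a new frame with the untouched columns and the column order carried over. The names object_cols and converted are illustrative, and the string columns are created with an explicit object dtype on the assumption that this is how they are stored in your data.

```python
import pandas as pd

# Same sample data as above; the string columns are given an explicit
# object dtype so the example does not depend on pandas' default string dtype.
df = pd.DataFrame(dict(
    A=[1, 2, 3, 4],
    B=pd.Series(list('abcd'), dtype='object'),
    C=[2, 3, 4, 5],
    D=pd.Series(list('defg'), dtype='object'),
))

# Names of all object-dtype columns.
object_cols = df.select_dtypes(include=['object']).columns

# astype accepts a {column: dtype} mapping and returns a new frame;
# columns that are not listed keep their dtype and position.
converted = df.astype({col: 'category' for col in object_cols})

print(converted.dtypes)
```

Assigning the result back to df (df = converted) gives the same effect as the concat-based version, without a manual merge or reindex.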
