Pandas cast all object columns to category
Problem description
I want an elegant function to cast all object columns in a pandas DataFrame to categories.
df[x] = df[x].astype("category")
performs the type cast for a single column, and
df.select_dtypes(include=['object'])
sub-selects all object columns. However, this drops the other columns, so a manual merge is required. Is there a solution that "just works in place" or does not require a manual cast?
Edit
I am looking for something similar to http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.convert_objects.html, but for conversion to categorical data.
Use apply and pd.Series.astype with dtype='category'.
Consider the pd.DataFrame df:
import pandas as pd

df = pd.DataFrame(dict(
    A=[1, 2, 3, 4],
    B=list('abcd'),
    C=[2, 3, 4, 5],
    D=list('defg')
))
df
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
A 4 non-null int64
B 4 non-null object
C 4 non-null int64
D 4 non-null object
dtypes: int64(2), object(2)
memory usage: 200.0+ bytes
Let's use select_dtypes to include all 'object' columns to convert, and recombine them with a select_dtypes that excludes them.
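df = pd.concat([
    # columns that are not object dtype stay untouched
    df.select_dtypes(exclude=['object']),
    # object columns are cast to category one by one
    df.select_dtypes(include=['object']).apply(pd.Series.astype, dtype='category')
    # reindex restores the original column order (reindex_axis was removed in pandas 1.0)
], axis=1).reindex(columns=df.columns)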
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
A 4 non-null int64
B 4 non-null category
C 4 non-null int64
D 4 non-null category
dtypes: category(2), int64(2)
memory usage: 208.0 bytes
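As a more compact alternative to the concat approach above, the same cast can probably be expressed with DataFrame.astype and a column-to-dtype mapping. This is only a sketch: the helper name object_columns_to_category is hypothetical (it is not a pandas function), and the example builds its string columns with an explicit object dtype, on the assumption that newer pandas versions may infer a dedicated string dtype that select_dtypes(include=["object"]) would not pick up.

```python
import pandas as pd


def object_columns_to_category(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with every object column cast to category.

    Hypothetical helper for illustration; not part of pandas.
    """
    object_cols = frame.select_dtypes(include=["object"]).columns
    # astype accepts a {column: dtype} mapping and leaves the other columns
    # alone, so column order is preserved and no manual merge is needed.
    return frame.astype({col: "category" for col in object_cols})


# Same data as in the answer; object dtype is forced explicitly (assumption
# made so the sketch behaves the same regardless of string dtype inference).
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": pd.Series(list("abcd"), dtype=object),
    "C": [2, 3, 4, 5],
    "D": pd.Series(list("defg"), dtype=object),
})

converted = object_columns_to_category(df)
print(converted.dtypes)
```

The helper returns a new frame and leaves df unchanged. If modifying the frame in place is preferred, assigning the converted columns back, for example df[object_cols] = df[object_cols].astype("category"), should have the same effect.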