pandas - concat with columns of same categories turns to object


Problem description

I want to concatenate two dataframes with category-type columns, by first adding the missing categories to each column.

import pandas as pd

df = pd.DataFrame({"a": pd.Categorical(["foo", "foo", "bar"]), "b": [1, 2, 1]})
df2 = pd.DataFrame({"a": pd.Categorical(["baz"]), "b": [1]})

df["a"] = df["a"].cat.add_categories("baz")
df2["a"] = df2["a"].cat.add_categories(["foo", "bar"])

In theory the categories for both "a" columns are the same:

In [33]: df.a.cat.categories
Out[33]: Index(['bar', 'foo', 'baz'], dtype='object')

In [34]: df2.a.cat.categories
Out[34]: Index(['baz', 'foo', 'bar'], dtype='object')

However, when concatenating the two dataframes, I get an object-type "a" column:

In [35]: pd.concat([df, df2]).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 0
Data columns (total 2 columns):
a    4 non-null object
b    4 non-null int64
dtypes: int64(1), object(1)
memory usage: 96.0+ bytes

In the documentation it says that when categories are the same, it should result in a category-type column. Does the order of the categories matter even though the category is unordered? I am using pandas-0.20.3.

Recommended answer

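Yes, in this case the order matters. With reorder_categories you can change the order of the categories, even though the categorical itself is unordered, so df2's categories can be aligned with df's before concatenating: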

df2["a"] = df2.a.cat.reorder_categories(df.a.cat.categories)

In [43]: pd.concat([df, df2]).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 0
Data columns (total 2 columns):
a    4 non-null category
b    4 non-null int64
dtypes: category(1), int64(1)
memory usage: 172.0 bytes
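
As an alternative to calling add_categories and then reorder_categories, both columns can be given one shared category list with set_categories before concatenating. The following is only a minimal sketch under that assumption: the variable names cats and result are made up for illustration, and the category order (df's categories first, then the new ones from df2) is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({"a": pd.Categorical(["foo", "foo", "bar"]), "b": [1, 2, 1]})
df2 = pd.DataFrame({"a": pd.Categorical(["baz"]), "b": [1]})

# Build one shared category list: df's categories first,
# then any categories that only appear in df2.
cats = list(df["a"].cat.categories)
cats += [c for c in df2["a"].cat.categories if c not in cats]

# Give both columns exactly the same categories in the same order.
df["a"] = df["a"].cat.set_categories(cats)
df2["a"] = df2["a"].cat.set_categories(cats)

# With identical categories on both sides, the column stays categorical.
result = pd.concat([df, df2], ignore_index=True)
print(result["a"].dtype)
```

Because both columns end up with the same categories in the same order, this should not depend on whether the pandas version in use treats differently ordered categories as equal.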
