Check if dataframe column is Categorical

Question

I can't seem to get a simple dtype check working with Pandas' improved Categoricals in v0.15+. Basically I just want something like is_categorical(column) -> True/False.

import pandas as pd
import numpy as np
import random

df = pd.DataFrame({
    'x': np.linspace(0, 50, 6),
    'y': np.linspace(0, 20, 6),
    'cat_column': random.sample('abcdef', 6)
})
df['cat_column'] = pd.Categorical(df['cat_column'])

We can see that the dtype for the categorical column is 'category':

df.cat_column.dtype
Out[20]: category

And normally we can do a dtype check by just comparing to the name of the dtype:

df.x.dtype == 'float64'
Out[21]: True

But this doesn't seem to work when trying to check if the x column is categorical:

df.x.dtype == 'category'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-94d2608815c4> in <module>()
----> 1 df.x.dtype == 'category'

TypeError: data type "category" not understood

Is there any way to do these types of checks in pandas v0.15+?

Recommended answer

Use the name property of the dtype to do the comparison instead. It should always work because it's just a string:

>>> import numpy as np
>>> arr = np.array([1, 2, 3, 4])
>>> arr.dtype.name
'int64'

>>> import pandas as pd
>>> cat = pd.Categorical(['a', 'b', 'c'])
>>> cat.dtype.name
'category'
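
The same holds for a column pulled out of a DataFrame as a Series. The following is a minimal sketch (the variable names numeric and categorical are illustrative, not from the original question) showing that .dtype.name is an ordinary string in both cases, so the comparison never has to interpret 'category' as a dtype:

```python
import pandas as pd

# One NumPy-backed Series and one categorical Series (illustrative data).
numeric = pd.Series([1.0, 2.0, 3.0])
categorical = pd.Series(['a', 'b', 'a'], dtype='category')

# .name is a plain str for both dtypes, so == is a simple string comparison.
print(numeric.dtype.name)
print(categorical.dtype.name == 'category')
print(numeric.dtype.name == 'category')
```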

So, to sum up, you can end up with a simple, straightforward function:

def is_categorical(array_like):
    return array_like.dtype.name == 'category'
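
As a quick check against the frame from the question, here is a hedged sketch that applies the function to every column. It repeats the definition of is_categorical so it runs standalone, and it uses fixed values for cat_column instead of random.sample so the result is reproducible. The second helper, is_categorical_by_type, is a hypothetical alternative that is not part of the original answer; it assumes a pandas version that exposes CategoricalDtype under pandas.api.types.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype


def is_categorical(array_like):
    # Name-based check from the answer: compares two plain strings.
    return array_like.dtype.name == 'category'


def is_categorical_by_type(array_like):
    # Hypothetical alternative: test the dtype object's class instead of its name.
    return isinstance(array_like.dtype, CategoricalDtype)


df = pd.DataFrame({
    'x': np.linspace(0, 50, 6),
    'y': np.linspace(0, 20, 6),
    'cat_column': list('abcdef'),  # fixed values, chosen for illustration
})
df['cat_column'] = pd.Categorical(df['cat_column'])

# Run both checks on every column of the frame.
by_name = {col: is_categorical(df[col]) for col in df.columns}
by_type = {col: is_categorical_by_type(df[col]) for col in df.columns}
print(by_name)
print(by_type)

# select_dtypes can also pick out the categorical columns directly.
print(list(df.select_dtypes(include=['category']).columns))
```

Both helpers should agree on this frame; the name-based version has the advantage of not needing any pandas import at the call site.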
