Check if dataframe column is Categorical
Problem description
I can't seem to get a simple dtype check working with the improved Categoricals in pandas v0.15+. Basically, I just want something like is_categorical(column) -> True/False.
import pandas as pd
import numpy as np
import random
df = pd.DataFrame({
    'x': np.linspace(0, 50, 6),
    'y': np.linspace(0, 20, 6),
    'cat_column': random.sample('abcdef', 6)
})
# Convert the string column to a categorical dtype
df['cat_column'] = pd.Categorical(df['cat_column'])
We can see that the dtype for the categorical column is 'category':
df.cat_column.dtype
Out[20]: category
And normally we can do a dtype check by just comparing against the name of the dtype:
df.x.dtype == 'float64'
Out[21]: True
But this doesn't seem to work when trying to check whether the x column is categorical:
df.x.dtype == 'category'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-22-94d2608815c4> in <module>()
----> 1 df.x.dtype == 'category'
TypeError: data type "category" not understood
Is there any way to do these types of checks in pandas v0.15+?
Recommended answer
Use the dtype's name property for the comparison instead. It should always work because it's just a string:
>>> import numpy as np
>>> arr = np.array([1, 2, 3, 4])
>>> arr.dtype.name
'int64'
>>> import pandas as pd
>>> cat = pd.Categorical(['a', 'b', 'c'])
>>> cat.dtype.name
'category'
So, to sum up, you can end up with a simple, straightforward function:
def is_categorical(array_like):
    return array_like.dtype.name == 'category'
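As a rough illustration, the function above can be applied to each column of a small DataFrame like the one in the question. The second helper, is_categorical_isinstance, is not part of the answer: it is a hypothetical alternative that assumes a pandas version exposing pd.CategoricalDtype at the top level, and it is shown only for comparison.

```python
import numpy as np
import pandas as pd


def is_categorical(array_like):
    # Compare the dtype's name, which is a plain string, as the answer suggests
    return array_like.dtype.name == 'category'


def is_categorical_isinstance(array_like):
    # Hypothetical alternative (assumes pd.CategoricalDtype is available
    # in the installed pandas version)
    return isinstance(array_like.dtype, pd.CategoricalDtype)


df = pd.DataFrame({
    'x': np.linspace(0, 50, 6),
    'cat_column': pd.Categorical(list('abcdef')),
})

# Check every column without raising on the non-categorical ones
checks = {col: is_categorical(df[col]) for col in df.columns}
print(checks)
```

Because the comparison is between two strings, it never asks NumPy to interpret 'category' as a dtype, which is what triggered the TypeError in the question.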