Convert text to int64 categorical in Pandas
Question
I have some artist names in data['artist'] that I would like to convert to a categorical column via:
x = data['artist'].astype('category').cat.codes
x.dtype
which returns:
dtype('int32')
I am getting negative numbers, which suggests some sort of overflow. So I'd like to use np.int64 instead, but I can't find documentation on how to accomplish this. Casting the codes afterwards:
x = data['artist'].astype('category').cat.codes.astype(np.int64)
x.dtype
gives
dtype('int64')
but this clearly just converts the existing int32 values to int64, so the negative value is still present:
x = data['artist'].astype('category').cat.codes.astype(np.int64)
x.min()
-1
Recommended answer
I think you have NaN in column artist, so the code for those rows is -1. This is the value pandas assigns to missing entries, not an overflow:
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist': [np.nan, 'y', 'z', 'x', 'y', 'z']})
x = data['artist'].astype('category').cat.codes
print(x)

0   -1
1    1
2    2
3    0
4    1
5    2
dtype: int8

# Show the rows whose artist is missing
print(data[data.artist.isnull()])

  artist
0    NaN
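If the goal is a non-negative int64 code for every row, one possible approach is to give the missing names a label of their own before converting. The following is only a minimal sketch that reuses the toy frame from the answer; the placeholder label 'unknown' is a hypothetical choice and not part of the original answer.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist': [np.nan, 'y', 'z', 'x', 'y', 'z']})

# Confirm that every -1 code lines up with a missing artist.
codes = data['artist'].astype('category').cat.codes
missing = data['artist'].isnull()
print((codes[missing] == -1).all())

# 'unknown' is a hypothetical placeholder label chosen for this sketch.
# After filling, NaN rows belong to an ordinary category, so no code is -1.
filled = data['artist'].fillna('unknown')
codes_filled = filled.astype('category').cat.codes.astype(np.int64)
print(codes_filled.dtype)
print(codes_filled.min())
```

Dropping the missing rows first with data.dropna(subset=['artist']) is an alternative when they should not be encoded at all. In either case the final astype(np.int64) only widens the storage type; it does not change which values the codes take.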