Convert text to int64 categorical in Pandas

Question

I have some artist names in data['artist'] that I would like to convert to a categorical column via:

x = data['artist'].astype('category').cat.codes
x.dtype 

This returns:

dtype('int32')

I am getting negative numbers, which suggests some sort of overflow. So I'd like to use np.int64 instead, but I can't find documentation on how to accomplish this.

x = data['artist'].astype('category').cat.codes.astype(np.int64)
x.dtype

gives

dtype('int64')

But it is clear that the int32 values are simply converted to int64, so the negative value is still present:

x = data['artist'].astype('category').cat.codes.astype(np.int64)
x.min()

-1

Recommended answer

I think you have NaN values in the artist column. pandas assigns missing values the category code -1, so the negative number is not an overflow:

import numpy as np
import pandas as pd

data = pd.DataFrame({'artist': [np.nan, 'y', 'z', 'x', 'y', 'z']})

x = data['artist'].astype('category').cat.codes
print(x)
0   -1
1    1
2    2
3    0
4    1
5    2
dtype: int8
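
The int8 dtype in the output above is also a hint about the int32 in the question. As far as I know, pandas picks the smallest signed integer dtype that can hold all category codes, so the width reflects how many distinct categories there are rather than any overflow. A minimal sketch, assuming made-up labels (the codes_dtype helper and the artist_N names are hypothetical, introduced only for illustration):

```python
import numpy as np
import pandas as pd


def codes_dtype(n_categories):
    """Return the dtype pandas picks for the codes of n_categories distinct labels."""
    # Hypothetical labels; any distinct strings would do.
    values = pd.Series([f"artist_{i}" for i in range(n_categories)])
    return values.astype("category").cat.codes.dtype


small = codes_dtype(3)       # a handful of categories fits in int8
medium = codes_dtype(200)    # more than int8 can index, so int16
large = codes_dtype(40000)   # more than int16 can index, so int32

print(small, medium, large)
```

If this holds for your pandas version, an int32 result just means the column has more distinct artists than int16 can index.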

To check for NaN, you can use:

print(data[data.artist.isnull()])
  artist
0    NaN
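
If the goal is a non-negative int64 code for every row, one option is to give the missing values their own label before encoding. This is only a sketch under the assumption that a placeholder category is acceptable for your data; the "unknown" label is hypothetical and not part of the original question:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"artist": [np.nan, "y", "z", "x", "y", "z"]})

# The rows coded -1 are exactly the rows where artist is missing.
codes = data["artist"].astype("category").cat.codes
missing_rows = (codes == -1).tolist()

# Assumption: replace NaN with a placeholder label so it becomes a real category.
filled = data["artist"].fillna("unknown")
codes64 = filled.astype("category").cat.codes.astype(np.int64)

print(codes64.tolist())
print(codes64.dtype)
```

Alternatively, drop the missing rows with dropna() before encoding, or keep the -1 and treat it as an explicit "missing" marker downstream.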
