Why does pandas convert unsigned int greater than 2**63-1 to objects?

Question

When I convert a numpy array to a pandas DataFrame, pandas changes a uint64 column to object dtype if it contains an integer greater than 2**63 - 1.
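For context, 2**63 - 1 is the largest value a signed 64-bit integer can hold, so the threshold in question is exactly the int64 boundary. A minimal sketch that checks this with NumPy; the helper name `fits` is made up for illustration and is not part of NumPy or pandas:

```python
import numpy as np

# Integer limits as reported by NumPy (converted to plain Python ints).
int64_max = int(np.iinfo(np.int64).max)
uint64_max = int(np.iinfo(np.uint64).max)


def fits(value, dtype):
    """Hypothetical helper: True if a Python int is representable in dtype."""
    info = np.iinfo(dtype)
    return int(info.min) <= value <= int(info.max)


print(int64_max == 2 ** 63 - 1)    # largest signed 64-bit value
print(fits(2 ** 63 - 1, np.int64))  # still inside int64
print(fits(2 ** 63, np.int64))      # one past the int64 limit
print(fits(2 ** 63, np.uint64))     # but well inside uint64
```

The value 2**63 is valid for uint64 but not for int64, which is consistent with the conversion only appearing once a value crosses that limit.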

import pandas as pd
import numpy as np

x = np.array([('foo', 2 ** 63)], dtype = np.dtype([('string', np.str_, 3), ('unsigned', np.uint64)]))
y = np.array([('foo', 2 ** 63 - 1)], dtype = np.dtype([('string', np.str_, 3), ('unsigned', np.uint64)]))

>>> pd.DataFrame(x).dtypes.unsigned
dtype('O')
>>> pd.DataFrame(y).dtypes.unsigned
dtype('uint64')

This is annoying because I can't write the DataFrame to an HDF file in the table format:

pd.DataFrame(x).to_hdf('x.hdf', 'key', format = 'table')

Output:

TypeError: Cannot serialize the column [unsigned] because its data contents are [integer] object dtype

Can someone explain the type conversion?

Recommended answer

It's an open bug, but you can force the column back to uint64 using DataFrame.astype():

x = np.array([('foo', 2 ** 63)], dtype = np.dtype([('string', np.str_, 3), ('unsigned', np.uint64)]))

a = pd.DataFrame(x)
a['unsigned'] = a['unsigned'].astype(np.uint64)
>>> a.dtypes
string      object
unsigned    uint64
dtype: object
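The same cast can be wrapped in a small helper that first checks that every value fits in uint64. This is only a sketch: `restore_uint64` is a hypothetical name, not a pandas function, and the object column is built directly from a Python int so the example does not depend on how a given pandas version converts the structured array.

```python
import numpy as np
import pandas as pd


def restore_uint64(df, columns):
    """Hypothetical helper: return a copy of df with `columns` cast to uint64."""
    out = df.copy()
    upper = int(np.iinfo(np.uint64).max)
    for col in columns:
        values = out[col]
        # Compare as Python ints so the range check itself cannot overflow.
        if not all(0 <= int(v) <= upper for v in values):
            raise ValueError(f"column {col!r} has values outside the uint64 range")
        out[col] = values.astype(np.uint64)
    return out


# An object-dtype column holding a value above the int64 limit.
df = pd.DataFrame({
    "string": ["foo"],
    "unsigned": pd.Series([2 ** 63], dtype=object),
})

fixed = restore_uint64(df, ["unsigned"])
print(fixed.dtypes)
```

Checking the range before casting is a design choice for safety: it makes a negative or oversized value fail with a clear message instead of relying on whatever the cast does with it.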

Other ways of converting the column to a numeric dtype either raised an error or had no effect:

>>> pd.to_numeric(a['unsigned'], errors='coerce')
OverflowError: Python int too large to convert to C long

>>> a.convert_objects(convert_numeric=True).dtypes
string      object
unsigned    object
dtype: object
