Pandas: Why is default column type for numeric float?

Question

I am using Pandas 0.18.1 with python 2.7.x. I have an empty dataframe that I read first. I see that the types of these columns are object, which is OK. When I assign one row of data, the type for numeric values changes to float64. I was expecting int or int64. Why does this happen?

Is there a way to set some global option to let Pandas know that numeric values should be treated as int by default unless the data contains a decimal point (.)? For example, given [0, 1.0, 2.], the first column would be int and the other two float64?

For example:

>>> df = pd.read_csv('foo.csv', engine='python', keep_default_na=False)
>>> print df.dtypes
bbox_id_seqno    object
type             object
layer            object
ll_x             object
ll_y             object
ur_x             object
ur_y             object
polygon_count    object
dtype: object
>>> df.loc[0] = ['a', 'b', 'c', 1, 2, 3, 4, 5]
>>> print df.dtypes
bbox_id_seqno     object
type              object
layer             object
ll_x             float64
ll_y             float64
ur_x             float64
ur_y             float64
polygon_count    float64
dtype: object

Recommended answer

It's not possible for Pandas to store NaN values in integer columns.

This makes float the obvious default choice for data storage, because as soon as a missing value arises Pandas would have to change the data type for the entire column. And missing values arise very often in practice.
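The upcast described above can be sketched as follows. This is a minimal illustration, assuming a reasonably recent pandas; the Series values and index labels are made up for the example and are not from the question's data.

```python
import numpy as np
import pandas as pd

# A Series of whole numbers starts out with an integer dtype.
s = pd.Series([1, 2, 3])
print(s.dtype)

# Reindexing onto a label that does not exist (3) introduces a missing value.
t = s.reindex([0, 1, 2, 3])

# The whole column is now float, because NaN cannot live in a NumPy integer array.
print(t.dtype)
print(t)
```

Only one value is missing, yet every element of the column changes type.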

As for why this is, it's a restriction inherited from Numpy. Basically, Pandas needs to set aside a particular bit pattern to represent NaN. This is straightforward for floating point numbers and it's defined in the IEEE 754 standard. It's more awkward and less efficient to do this for a fixed-width integer.
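To make the bit-pattern argument concrete, here is a small sketch using only the Python standard library. It assumes 64-bit IEEE 754 doubles (which is what CPython's float uses on common platforms); the variable names are illustrative.

```python
import struct

# Reinterpret the 64 bits of a float NaN as an unsigned integer.
nan_bits = struct.unpack("<Q", struct.pack("<d", float("nan")))[0]

exponent = (nan_bits >> 52) & 0x7FF      # the 11 exponent bits
fraction = nan_bits & ((1 << 52) - 1)    # the 52 fraction bits

# IEEE 754 reserves "exponent all ones, fraction non-zero" to mean NaN.
print(exponent == 0x7FF, fraction != 0)

# A signed 64-bit integer has no reserved pattern: the same bits are
# simply an ordinary number, so nothing is left over to mean "missing".
as_int = struct.unpack("<q", struct.pack("<Q", nan_bits))[0]
print(as_int)
```

Reserving one integer value as a sentinel would be possible in principle, but every arithmetic operation would then have to check for it, which is the awkwardness the answer refers to.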

Update

Exciting news in pandas 0.24. IntegerArray is an experimental feature but might render my original answer obsolete. So if you're reading this on or after 27 Feb 2019, check out the docs for that feature.
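As a rough sketch of what that feature looks like, assuming pandas 0.24 or later: the nullable integer dtype is requested with the string "Int64" (capital I), which is distinct from NumPy's "int64". The sample values are made up.

```python
import pandas as pd

# "Int64" selects the nullable extension dtype backed by IntegerArray.
s = pd.Series([1, 2, None], dtype="Int64")

print(s.dtype)          # the column stays an integer type
print(s.isna().sum())   # the missing entry is still tracked
```

In the question's scenario, this would mean casting the numeric columns explicitly (for example with astype("Int64")) rather than relying on a global option.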
