How to set dtypes by column in pandas DataFrame

Question

I want to load some data into a pandas DataFrame and assign a dtype to each column on import. I want to be able to do this for larger datasets with many different columns, but as a small example:

import numpy as np
import pandas as pd

myarray = np.random.randint(0, 5, size=(2, 2))
mydf = pd.DataFrame(myarray, columns=['a', 'b'], dtype=[float, int])  # raises TypeError
mydf.dtypes

results in:

TypeError: data type not understood

I tried a few other approaches, such as:

mydf = pd.DataFrame(myarray,columns=['a','b'], dtype={'a': int})

which gives:

TypeError: object of type 'type' has no len()

If I pass dtype=(float,int), it applies a float format to both columns.

In the end, I would just like to pass it a list of data types the same way I can pass it a list of column names.

Recommended answer

As of pandas version 0.24.2 (the current stable release at the time of writing), it is not possible to pass an explicit list of data types to the DataFrame constructor. As the docs state:

dtype : dtype, default None

    Data type to force. Only a single dtype is allowed. If None, infer
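
To illustrate the single-dtype restriction, here is a minimal sketch. The array contents are arbitrary and the variable names simply follow the question:

```python
import numpy as np
import pandas as pd

myarray = np.random.randint(0, 5, size=(2, 2))
# A single dtype is accepted and is applied to every column.
mydf = pd.DataFrame(myarray, columns=['a', 'b'], dtype=float)
print(mydf.dtypes)
```

Both columns come out as float64 here, which matches the behaviour described in the question when a tuple was passed.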

However, the DataFrame class does have a class method, from_records, that converts a NumPy structured array to a DataFrame, so you can do:

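>>> import numpy as np
>>> import pandas as pd
>>> myarray = np.random.randint(0, 5, size=(2, 2))
>>> # list(...) is needed on Python 3, where map returns an iterator;
>>> # np.float and np.int were removed from NumPy, so explicit dtypes are used
>>> record = np.array(list(map(tuple, myarray)), dtype=[('a', np.float64), ('b', np.int64)])
>>> mydf = pd.DataFrame.from_records(record)
>>> mydf.dtypes
a    float64
b      int64
dtype: object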
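
If a two-step approach is acceptable, another option (not part of the original answer) is to build the DataFrame first and then cast it with DataFrame.astype, which accepts a column-to-dtype mapping. The sketch below assumes the column names and dtypes are kept in two parallel lists; col_names and col_types are illustrative names, not anything from pandas:

```python
import numpy as np
import pandas as pd

col_names = ['a', 'b']               # hypothetical list of column names
col_types = [np.float64, np.int64]   # hypothetical list of matching dtypes

myarray = np.random.randint(0, 5, size=(2, 2))
mydf = pd.DataFrame(myarray, columns=col_names)
# astype takes a {column: dtype} mapping, so zip the two lists together.
mydf = mydf.astype(dict(zip(col_names, col_types)))
print(mydf.dtypes)
```

This gets close to passing a list of data types alongside a list of column names. For data read from a file, pandas.read_csv takes a similar mapping through its dtype parameter.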
