How to set dtypes by column in pandas DataFrame
Question
I want to bring some data into a pandas DataFrame and I want to assign dtypes for each column on import. I want to be able to do this for larger datasets with many different columns, but, as an example:
import numpy as np
import pandas as pd
myarray = np.random.randint(0,5,size=(2,2))
mydf = pd.DataFrame(myarray,columns=['a','b'], dtype=[float,int])
mydf.dtypes
This results in:
TypeError: data type not understood
I tried a few other methods such as:
mydf = pd.DataFrame(myarray,columns=['a','b'], dtype={'a': int})
TypeError: object of type 'type' has no len()
If I pass dtype=(float,int), it applies a float format to both columns.
In the end I would like to just be able to pass it a list of datatypes the same way I can pass it a list of column names.
Recommended answer
As of pandas version 0.24.2 (the current stable release at the time of writing), it is not possible to pass an explicit list of data types to the DataFrame constructor. As the docs state:
dtype : dtype, default None
Data type to force. Only a single dtype is allowed. If None, infer
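To illustrate the single-dtype behaviour described in that docstring, here is a minimal sketch. It assumes only that numpy and pandas are installed, and it reuses the variable names from the question.

```python
import numpy as np
import pandas as pd

myarray = np.random.randint(0, 5, size=(2, 2))

# A single dtype is accepted and is applied to every column.
mydf = pd.DataFrame(myarray, columns=['a', 'b'], dtype=float)
print(mydf.dtypes)
```

Both columns come out as float64 here, so the constructor argument alone cannot give each column its own type.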
However, the DataFrame class does have a class method, from_records, that converts a NumPy structured array to a DataFrame, so you can do:
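>>> import numpy as np
>>> import pandas as pd
>>> myarray = np.random.randint(0,5,size=(2,2))
>>> # list(...) is needed on Python 3, where map returns an iterator;
>>> # the np.float and np.int aliases were removed from NumPy, so use float and int
>>> record = np.array(list(map(tuple,myarray)),dtype=[('a',float),('b',int)])
>>> mydf = pd.DataFrame.from_records(record)
>>> mydf.dtypes
a    float64
b      int64
dtype: object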
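If the goal is simply to pass a list of dtypes alongside the list of column names, another option is to build the frame first and then cast it with DataFrame.astype, which accepts a mapping from column name to dtype. This is a sketch of that alternative and not part of the original answer. The names columns and dtypes are illustrative, and the cast happens after construction instead of during it.

```python
import numpy as np
import pandas as pd

myarray = np.random.randint(0, 5, size=(2, 2))
columns = ['a', 'b']
dtypes = [float, int]  # illustrative: one dtype per column, in column order

# Pair each column name with its dtype, then cast all columns in one call.
mydf = pd.DataFrame(myarray, columns=columns).astype(dict(zip(columns, dtypes)))
print(mydf.dtypes)
```

Column a ends up as a float column and column b as an integer column. The exact integer width (int64 or int32) can depend on the platform and NumPy version.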