Assign pandas dataframe column dtypes

Question

I want to set the dtypes of multiple columns in a pd.DataFrame. (I have a file that I had to parse manually into a list of lists, because it was not amenable to pd.read_csv.)

import pandas as pd
print(pd.DataFrame([['a', '1'], ['b', '2']],
                   dtype={'x': 'object', 'y': 'int'},
                   columns=['x', 'y']))

I get

ValueError: entry not a 2- or 3- tuple

The only way I can set them is by looping over each column and recasting it with astype.

dtypes = {'x':'object','y':'int'}
mydata = pd.DataFrame([['a','1'],['b','2']],
                      columns=['x','y'])
for c in mydata.columns:
    mydata[c] = mydata[c].astype(dtypes[c])
print(mydata['y'].dtype)   #=> int64

Is there a better way?

Recommended answer

Since pandas 0.17, you have to use the explicit conversion functions:

pd.to_datetime, pd.to_timedelta and pd.to_numeric

(As mentioned below, there is no more "magic": convert_objects was deprecated in 0.17.)
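As a quick illustration of the functions named above, here is a minimal sketch. The sample values are made up for this example, and `errors='coerce'` is an optional argument that is not part of the original answer; it turns entries that cannot be parsed into NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration.
raw = pd.Series(['1', '2', 'oops'])

# errors='coerce' replaces unparseable entries with NaN instead of raising.
nums = pd.to_numeric(raw, errors='coerce')
print(nums.dtype)  # float64, because NaN forces a float column

# to_timedelta parses duration strings into a timedelta64 column.
durations = pd.to_timedelta(pd.Series(['1 days', '2 days']))
print(durations.dtype)
```

Whether coercing to NaN is acceptable depends on the data; without `errors='coerce'` the same call raises on the bad value, which may be the safer default.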

df = pd.DataFrame({'x': {0: 'a', 1: 'b'}, 'y': {0: '1', 1: '2'}, 'z': {0: '2018-05-01', 1: '2018-05-02'}})

df.dtypes

x    object
y    object
z    object
dtype: object

df

   x  y           z
0  a  1  2018-05-01
1  b  2  2018-05-02

You can apply these to each column you want to convert:

df["y"] = pd.to_numeric(df["y"])
df["z"] = pd.to_datetime(df["z"])    
df

   x  y          z
0  a  1 2018-05-01
1  b  2 2018-05-02

df.dtypes

x            object
y             int64
z    datetime64[ns]
dtype: object

The dtypes output above confirms that the columns have been converted.
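For plain casts like the one in the question, newer pandas releases (0.19 and later, as far as I know) also let DataFrame.astype take a dict mapping column names to dtypes, which avoids the explicit loop. This is a sketch reusing the question's data, not part of the original answer:

```python
import pandas as pd

mydata = pd.DataFrame([['a', '1'], ['b', '2']], columns=['x', 'y'])

# Map each column name to its target dtype; columns left out of the
# dict keep their current dtype.
mydata = mydata.astype({'x': 'object', 'y': 'int64'})
print(mydata.dtypes)
```

Note that astype raises if a value cannot be cast, so for messy input the to_numeric / to_datetime functions are likely the more forgiving choice.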

OLD/DEPRECATED ANSWER for pandas 0.12 - 0.16: You can use convert_objects to infer better dtypes:

In [21]: df
Out[21]: 
   x  y
0  a  1
1  b  2

In [22]: df.dtypes
Out[22]: 
x    object
y    object
dtype: object

In [23]: df.convert_objects(convert_numeric=True)
Out[23]: 
   x  y
0  a  1
1  b  2

In [24]: df.convert_objects(convert_numeric=True).dtypes
Out[24]: 
x    object
y     int64
dtype: object

Magic! (Sad to see it deprecated.)
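On pandas versions where convert_objects is no longer available, something similar can be approximated with the explicit functions. The helper below, `convert_numeric_columns`, is a hypothetical name introduced for this sketch, and it is not a faithful reimplementation: a column is converted only if every value in it parses as a number.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def convert_numeric_columns(df):
    """Hypothetical helper: convert text columns to numeric where all values parse."""
    out = df.copy()
    for col in out.columns:
        # Only touch text-like columns, so datetimes etc. are left alone.
        if not (is_object_dtype(out[col]) or is_string_dtype(out[col])):
            continue
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            # At least one value is not numeric; keep the column unchanged.
            pass
    return out


df = pd.DataFrame({'x': ['a', 'b'], 'y': ['1', '2']})
converted = convert_numeric_columns(df)
print(converted.dtypes)
```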
