How to specify the `dtype` of the index when reading a CSV file into a `DataFrame`?


Question

In Python 3.4.3 and pandas 0.16, how can I specify the dtype of the index as str? The following code is what I have tried:

In [1]: from io import StringIO

In [2]: import pandas as pd

In [3]: import numpy as np

In [4]: fra = pd.read_csv(StringIO('date,close\n20140101,10.2\n20140102,10.5'), index_col=0, dtype={'date': np.str_, 'close': np.float})

In [5]: fra.index
Out[5]: Int64Index([20140101, 20140102], dtype='int64')

Recommended answer

It looks like the index_col=0 param takes precedence over the dtype param, so the dtype requested for the date column is ignored once that column becomes the index. If you drop the index_col param, the column is read as str and you can call set_index afterwards:

In [235]:

fra = pd.read_csv(StringIO('date,close\n20140101,10.2\n20140102,10.5'), dtype={'date': np.str_, 'close': np.float})
fra
Out[235]:
       date  close
0  20140101   10.2
1  20140102   10.5
In [236]:

fra = fra.set_index('date')
fra.index
Out[236]:
Index(['20140101', '20140102'], dtype='object')

Alternatively, drop the index_col param and call set_index directly on the df returned from read_csv, so it becomes a one-liner:

In [237]:

fra = pd.read_csv(StringIO('date,close\n20140101,10.2\n20140102,10.5'), dtype={'date': np.str_, 'close': np.float}).set_index('date')
fra.index
Out[237]:
Index(['20140101', '20140102'], dtype='object')
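For readers on a current environment, the same workaround can be written as a standalone script. This is a minimal sketch rather than part of the original answer: it assumes a recent pandas and swaps np.str_ and np.float for the built-in str and float, because the np.float alias was deprecated in NumPy 1.20 and removed in 1.24. The variable name csv_text is only illustrative.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the question (csv_text is an illustrative name).
csv_text = 'date,close\n20140101,10.2\n20140102,10.5'

# Read 'date' as text first, then promote it to the index with set_index.
# Built-in str/float are used here instead of np.str_/np.float (assumption:
# a NumPy version where the np.float alias no longer exists).
fra = pd.read_csv(
    StringIO(csv_text),
    dtype={'date': str, 'close': float},
).set_index('date')

print(fra.index.tolist())  # ['20140101', '20140102']
```

The exact dtype reported for the index (object or a dedicated string dtype) can differ between pandas versions, so the sketch prints the labels rather than the dtype.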

Update

This is a bug, and a fix is targeted for version 0.17.0.
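Whether a given installation is still affected can be checked empirically instead of relying on the version number. The sketch below is only an illustration: the helper name index_dtype_respected is hypothetical, and the code makes no claim about which pandas release changed the behaviour. It repeats the call from the question and checks whether the index labels came back as strings.

```python
from io import StringIO

import pandas as pd

CSV_TEXT = 'date,close\n20140101,10.2\n20140102,10.5'


def index_dtype_respected():
    """Return True if read_csv applied the str dtype to the index column.

    Hypothetical helper for illustration; it reruns the call from the
    question with index_col=0 and inspects the resulting index labels.
    """
    fra = pd.read_csv(StringIO(CSV_TEXT), index_col=0, dtype={'date': str})
    return all(isinstance(label, str) for label in fra.index)


if index_dtype_respected():
    print('dtype was applied to the index; index_col can be used directly')
else:
    print('index was parsed as numbers; call set_index after reading instead')
```

If the check fails, the set_index approach from the answer remains the safer route, since converting an already-parsed integer index back to text would not restore details such as leading zeros.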
