How to specify the `dtype` of the index when reading a CSV file into a `DataFrame`?
Question
In Python 3.4.3 and pandas 0.16, how can I specify the `dtype` of the index as `str`?
The following code is what I have tried:
In [1]: from io import StringIO
In [2]: import pandas as pd
In [3]: import numpy as np
In [4]: fra = pd.read_csv(StringIO('date,close\n20140101,10.2\n20140102,10.5'), index_col=0, dtype={'date': np.str_, 'close': np.float})
In [5]: fra.index
Out[5]: Int64Index([20140101, 20140102], dtype='int64')
Recommended answer
It looks like the `index_col=0` param is taking precedence over the `dtype` param, so the index is still parsed as integers. If you drop the `index_col` param, you can call `set_index` afterwards:
In [235]:
fra = pd.read_csv(StringIO('date,close\n20140101,10.2\n20140102,10.5'), dtype={'date': np.str_, 'close': np.float})
fra
Out[235]:
       date  close
0  20140101   10.2
1  20140102   10.5
In [236]:
fra = fra.set_index('date')
fra.index
Out[236]:
Index(['20140101', '20140102'], dtype='object')
An alternative is to drop the `index_col` param and call `set_index` directly on the DataFrame returned from `read_csv`, so it becomes a one-liner:
In [237]:
fra = pd.read_csv(StringIO('date,close\n20140101,10.2\n20140102,10.5'), dtype={'date': np.str_, 'close': np.float}).set_index('date')
fra.index
Out[237]:
Index(['20140101', '20140102'], dtype='object')
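For anyone who prefers a plain script over an IPython session, the same workaround might look roughly like the sketch below. It assumes the built-in `str` and `float` are acceptable in place of `np.str_` and `np.float` (the `np.float` alias no longer exists in recent NumPy releases), and the helper name `read_with_str_index` is made up for illustration.

```python
from io import StringIO

import pandas as pd

CSV_TEXT = "date,close\n20140101,10.2\n20140102,10.5"


def read_with_str_index(csv_text):
    """Hypothetical helper: parse 'date' as text, then make it the index."""
    # Leave index_col out so that the dtype mapping is applied to 'date'.
    frame = pd.read_csv(StringIO(csv_text), dtype={"date": str, "close": float})
    # Promote the already-string column to the index.
    return frame.set_index("date")


fra = read_with_str_index(CSV_TEXT)
print(list(fra.index))
```

Because the column is read as text before it becomes the index, values with leading zeros would also be kept as written rather than being parsed as integers first.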
Update
This is a bug, and a fix is targeted for version 0.17.0.
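Since it may not be obvious whether a given pandas installation applies `dtype` to the index column, one cautious option is to keep `index_col=0`, check the result at runtime, and cast the index only if needed. This is a minimal sketch under that assumption, not a statement about which release contains the fix; the helper name `read_csv_str_index` is hypothetical.

```python
from io import StringIO

import pandas as pd

CSV_TEXT = "date,close\n20140101,10.2\n20140102,10.5"


def read_csv_str_index(csv_text):
    """Hypothetical helper: return a frame whose index holds strings."""
    frame = pd.read_csv(
        StringIO(csv_text),
        index_col=0,
        dtype={"date": str, "close": float},
    )
    # If this pandas version ignored dtype for the index column,
    # fall back to casting the index after the fact.
    if not all(isinstance(value, str) for value in frame.index):
        frame.index = frame.index.astype(str)
    return frame


checked = read_csv_str_index(CSV_TEXT)
print(list(checked.index))
```

Note that the fallback cast happens after parsing, so it cannot restore leading zeros that were lost when the values were read as integers; dropping `index_col` and calling `set_index` avoids that.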