Specifying dtype float32 with pandas.read_csv on pandas 0.10.1
Problem description
I'm attempting to read a simple space-separated file with the pandas read_csv method. However, pandas doesn't seem to be obeying my dtype argument. Maybe I'm specifying it incorrectly?
I've distilled my somewhat complicated call to read_csv down to this simple test case. I'm actually using the converters argument in my 'real' scenario, but I removed it for simplicity.
Below is my ipython session:
>>> cat test.out
a b
0.76398 0.81394
0.32136 0.91063
>>> import pandas
>>> import numpy
>>> x = pandas.read_csv('test.out', dtype={'a': numpy.float32}, delim_whitespace=True)
>>> x
a b
0 0.76398 0.81394
1 0.32136 0.91063
>>> x.a.dtype
dtype('float64')
I've also tried this with a dtype of numpy.int32 or numpy.int64. These choices result in an exception:
AttributeError: 'NoneType' object has no attribute 'dtype'
I'm assuming the AttributeError occurs because pandas will not automatically try to convert/truncate the float values into integers?
I'm running on a 32-bit machine with a 32-bit version of Python.
>>> !uname -a
Linux ubuntu 3.0.0-13-generic #22-Ubuntu SMP Wed Nov 2 13:25:36 UTC 2011 i686 i686 i386 GNU/Linux
>>> import platform
>>> platform.architecture()
('32bit', 'ELF')
>>> pandas.__version__
'0.10.1'
Recommended answer
pandas 0.10.1 has only limited support for float32.
See http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#dtype-specification
In 0.11 you can do it like this:
# don't use dtype/converters explicitly for the columns you care about;
# they will be converted to float64 if possible, or to object if they cannot be
df = pd.read_csv('test.csv'.....)
#### this is optional and related to the issue you posted ####
# force anything that is not numeric to NaN
# columns is the list of columns that you are interested in
df[columns] = df[columns].convert_objects(convert_numeric=True)
# astype
df[columns] = df[columns].astype('float32')
See http://pandas.pydata.org/pandas-docs/dev/basics.html#object-conversion
It's not as efficient as doing it directly in read_csv (but that requires some low-level changes).
I have confirmed that with 0.11-dev, this DOES work (on 32-bit and 64-bit, results are the same):
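For readers on a recent pandas, here is a minimal sketch of the same "coerce to numeric, then cast to float32" idea. It assumes a modern pandas in which convert_objects is no longer available and pd.to_numeric is used instead. The sample data (including the non-numeric entry "oops") and the use of io.StringIO in place of a real file are made up for illustration and are not from the original post.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data; "oops" stands in for a non-numeric entry.
data = "a b\n0.76398 0.81394\noops 0.91063\n"

# Read without dtype so pandas infers the types itself.
df = pd.read_csv(io.StringIO(data), sep=r"\s+")

# The columns you are interested in.
columns = ["a", "b"]

# Force anything that is not numeric to NaN, one column at a time.
df[columns] = df[columns].apply(pd.to_numeric, errors="coerce")

# Cast the cleaned columns to float32.
df[columns] = df[columns].astype("float32")

print(df.dtypes)
```

As with the original approach, this converts after parsing rather than inside read_csv, so it makes an extra pass over the selected columns.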
In [5]: x = pd.read_csv(StringIO.StringIO(data), dtype={'a': np.float32}, delim_whitespace=True)
In [6]: x
Out[6]:
a b
0 0.76398 0.81394
1 0.32136 0.91063
In [7]: x.dtypes
Out[7]:
a float32
b float64
dtype: object
In [8]: pd.__version__
Out[8]: '0.11.0.dev-385ff82'
In [9]: quit()
vagrant@precise32:~/pandas$ uname -a
Linux precise32 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686 i686 i386 GNU/Linux
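The session above can be approximated as a self-contained script. This is a sketch under some assumptions: it targets Python 3 (io.StringIO rather than StringIO.StringIO), it assumes the data variable holds the contents of test.out from the question, and it uses sep=r"\s+" as a stand-in for delim_whitespace=True.

```python
import io

import numpy as np
import pandas as pd

# Assumed to match the contents of test.out shown in the question.
data = "a b\n0.76398 0.81394\n0.32136 0.91063\n"

# Request float32 for column "a" only; column "b" keeps the default float64.
x = pd.read_csv(io.StringIO(data), dtype={"a": np.float32}, sep=r"\s+")

print(x.dtypes)
```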