How to get pandas.read_csv not to perform any conversions?
Question
For example, the values in '/tmp/test.csv' (namely, 01, 02, 03) are meant to represent strings that happen to match /^\d+$/, as opposed to integers:
In [10]: print(open('/tmp/test.csv').read())
A,B,C
01,02,03
By default, pandas.read_csv converts these values to integers:
In [11]: import pandas
In [12]: pandas.read_csv('/tmp/test.csv')
Out[12]:
   A  B  C
0  1  2  3
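The same behaviour can be reproduced without a file on disk. This is a minimal sketch that assumes the CSV text is inlined through io.StringIO instead of being read from /tmp/test.csv; it shows that every column is inferred as int64, which is where the leading zeros disappear.

```python
import io

import pandas as pd

# Same contents as /tmp/test.csv, held in memory instead of on disk.
csv_text = "A,B,C\n01,02,03\n"

df = pd.read_csv(io.StringIO(csv_text))

# Default type inference turns "01", "02", "03" into the integers 1, 2, 3.
print(df.dtypes)
print(df.iloc[0].tolist())
```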
I want to tell pandas.read_csv to leave all these values alone, i.e., to perform no conversions whatsoever. Furthermore, I want this "please do nothing" directive to apply across the board, without my having to specify any column names or numbers.
I tried this, which achieved nothing:
In [13]: import csv
In [14]: pandas.read_csv('/tmp/test.csv', quoting=csv.QUOTE_ALL)
Out[14]:
   A  B  C
0  1  2  3
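As far as I can tell, quoting only governs how quote characters are recognized while the file is split into fields; it does not mark a field as text, so type inference still runs on the unquoted content. A small sketch (again assuming inline data via io.StringIO) where the fields are actually quoted in the file still yields integers:

```python
import csv
import io

import pandas as pd

# Every field, header included, is wrapped in double quotes in the data itself.
quoted_text = '"A","B","C"\n"01","02","03"\n'

df = pd.read_csv(io.StringIO(quoted_text), quoting=csv.QUOTE_ALL)

# The quotes are stripped during tokenizing and inference then sees plain digits.
print(df.iloc[0].tolist())
```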
The only thing that worked was to define a big ol' ConstantDict class and use an instance of it that always returns the identity function (lambda x: x) as the value for the converters parameter, thereby tricking pandas.read_csv into doing nothing:
In [15]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
class ConstantDict(dict):
    def __init__(self, value):
        self.__value = value
    def get(self, *args):
        # Ignore the requested key and always hand back the stored value.
        return self.__value
--
In [16]: pandas.read_csv('/tmp/test.csv', converters=ConstantDict(lambda x: x))
Out[16]:
    A   B   C
0  01  02  03
That's a lot of gymnastics to get such a simple "please do nothing" request across. (It would be even more gymnastics if I were to make ConstantDict bullet-proof.)
Isn't there a simpler way to achieve this?
Recommended answer
import pandas as pd
df = pd.read_csv('/tmp/test.csv', dtype=str)
From the docs:
dtype : Type name or dict of column -> type, default None
Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32} (Unsupported with engine='python'). Use str or object to preserve and not interpret dtype.
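One caveat if "no conversions whatsoever" is meant literally: dtype=str keeps the digit strings, but the default missing-value handling still turns empty fields and markers such as NA into NaN. A minimal sketch (inline sample data is an assumption) showing that keep_default_na=False, with no na_values given, leaves every field exactly as written:

```python
import io

import pandas as pd

# One row: digits with a leading zero, an empty field, and a literal "NA".
data = "A,B,C\n01,,NA\n"

# dtype=str alone: "01" survives, but "" and "NA" become NaN.
df_default_na = pd.read_csv(io.StringIO(data), dtype=str)

# Turning off the default NA strings as well keeps every field verbatim.
df_raw = pd.read_csv(io.StringIO(data), dtype=str, keep_default_na=False)

print(df_default_na)
print(df_raw)
```

Passing na_filter=False instead of keep_default_na=False should have the same effect on this data, and it also skips the NA detection pass entirely.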