How to get pandas.read_csv not to perform any conversions?


Question

For example, the values in '/tmp/test.csv' (namely, 01, 02, 03) are meant to represent strings that happen to match /^\d+$/, as opposed to integers:

In [10]: print(open('/tmp/test.csv').read())
A,B,C
01,02,03

By default, pandas.read_csv converts these values to integers:

In [11]: import pandas

In [12]: pandas.read_csv('/tmp/test.csv')
Out[12]: 
   A  B  C
0  1  2  3

I want to tell pandas.read_csv to leave all these values alone. I.e., perform no conversions whatsoever. Furthermore, I want this "please do nothing" directive to be applied across-the-board, without my having to specify any column names or numbers.

I tried this, which achieved nothing:

In [13]: import csv

In [14]: pandas.read_csv('/tmp/test.csv', quoting=csv.QUOTE_ALL)
Out[14]: 
   A  B  C
0  1  2  3

The only thing that worked was to define a big ol' ConstantDict class, and use an instance of it that always returns the identity function (lambda x: x) as the value for the converters parameter, and thereby trick pandas.read_csv into doing nothing:

In [15]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:class ConstantDict(dict):
:    def __init__(self, value):
:        self.__value = value
:    def get(self, *args):
:        return self.__value
:--
In [16]: pandas.read_csv('/tmp/test.csv', converters=ConstantDict(lambda x: x))
Out[16]: 
    A   B   C
0  01  02  03

That's a lot of gymnastics to get such a simple "please do nothing" request across. (It would be even more gymnastics if I were to make ConstantDict bullet-proof.)

Isn't there a simpler way to achieve this?

Recommended answer

import pandas as pd
df = pd.read_csv('temp.csv', dtype=str)

From the documentation:

dtype : Type name or dict of column -> type, default None
Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32} (Unsupported with engine='python'). Use str or object to preserve and not interpret dtype.
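As a minimal sketch of the dtype=str approach applied to the data from the question: the example below reads the CSV from an in-memory string via io.StringIO instead of /tmp/test.csv, and adds a second row with an empty field (an assumption, not part of the original file) to show one conversion that dtype=str alone does not turn off. Empty fields are still treated as missing values unless keep_default_na=False is also passed.

```python
import io

import pandas as pd

# Same contents as /tmp/test.csv in the question, plus one extra row
# with an empty field (added here purely for illustration).
csv_text = "A,B,C\n01,02,03\n04,,06\n"

# dtype=str keeps every column as text, so leading zeros survive.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(df.loc[0].tolist())

# Empty fields are still parsed as missing values (NaN) by default.
print(df["B"].isna().tolist())

# keep_default_na=False leaves empty fields as empty strings instead.
raw = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)
print(raw["B"].tolist())
```

If the goal is truly "no conversions whatsoever", combining dtype=str with keep_default_na=False is probably the closest match, since it applies to all columns without naming any of them.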
