How can I disable quoting in the Python 2.4 CSV reader?

Problem description

I am writing a Python utility that needs to parse a large, regularly updated CSV file I don't control. The utility must run on a server with only Python 2.4 available. The CSV file does not quote field values at all, but the Python 2.4 version of the csv library does not seem to give me any way to turn off quoting; it just allows me to set the quote character (dialect.quotechar = '"' or whatever). If I try setting the quote character to None or the empty string, I get an error.

I can sort of work around this by setting dialect.quotechar to some "rare" character, but this is brittle, as there is no ASCII character I can absolutely guarantee will not show up in field values (except the delimiter, but if I set dialect.quotechar = dialect.delimiter, things go predictably haywire).

In Python 2.5 and later, if I set dialect.quoting to csv.QUOTE_NONE, the CSV reader respects that and does not interpret any character as a quote character. Is there any way to duplicate this behavior in Python 2.4?

UPDATE: Thanks Triptych and Mark Roddy for helping to narrow the problem down. Here's a simplest-case demonstration:

>>> import csv
>>> import StringIO
>>> data = """
... 1,2,3,4,"5
... 1,2,3,4,5
... """
>>> reader = csv.reader(StringIO.StringIO(data))
>>> for i in reader: print i
... 
[]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
_csv.Error: newline inside string

The problem only occurs when there's a single double-quote character in the final column of a row. Unfortunately, this situation exists in my dataset. I've accepted Tanj's solution: manually assign a nonprinting character ("\x07" or BEL) as the quotechar. This is hacky, but it works, and I haven't yet seen another solution that does. Here's a demo of the solution in action:

>>> import csv
>>> import StringIO
>>> class MyDialect(csv.Dialect):
...     quotechar = '\x07'
...     delimiter = ','
...     lineterminator = '\n'
...     doublequote = False
...     skipinitialspace = False
...     quoting = csv.QUOTE_NONE
...     escapechar = '\\'
... 
>>> dialect = MyDialect()
>>> data = """
... 1,2,3,4,"5
... 1,2,3,4,5
... """
>>> reader = csv.reader(StringIO.StringIO(data), dialect=dialect)
>>> for i in reader: print i
... 
[]
['1', '2', '3', '4', '"5']
['1', '2', '3', '4', '5']
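
The BEL workaround is only safe as long as "\x07" never occurs in the input. Below is a minimal sketch, written for Python 3 rather than 2.4 so it can be run today, that wraps the input in a hypothetical checked_lines generator so a stray BEL raises an error instead of silently corrupting a row. BelDialect just mirrors MyDialect from the session above; both names are illustrative, not part of the csv module.

```python
import csv
import io

SENTINEL = '\x07'  # BEL: assumed never to appear in the real data


class BelDialect(csv.Dialect):
    # Same settings as MyDialect in the session above.
    quotechar = SENTINEL
    delimiter = ','
    lineterminator = '\n'
    doublequote = False
    skipinitialspace = False
    quoting = csv.QUOTE_NONE
    escapechar = '\\'


def checked_lines(lines, sentinel=SENTINEL):
    """Yield lines unchanged, but fail loudly if the sentinel quotechar shows up."""
    for lineno, line in enumerate(lines, 1):
        if sentinel in line:
            raise ValueError('quotechar %r found on input line %d' % (sentinel, lineno))
        yield line


data = '\n1,2,3,4,"5\n1,2,3,4,5\n'
rows = list(csv.reader(checked_lines(io.StringIO(data)), dialect=BelDialect))
# rows == [[], ['1', '2', '3', '4', '"5'], ['1', '2', '3', '4', '5']]
```

The check costs one substring test per line, which turns "extremely rare" into "detected if it ever happens".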

In Python 2.5+ setting quoting to csv.QUOTE_NONE would be sufficient, and the value of quotechar would then be irrelevant. (I'm actually getting my initial dialect via a csv.Sniffer and then overriding the quotechar value, not by subclassing csv.Dialect, but I don't want that to be a distraction from the real issue; the above two sessions demonstrate that Sniffer isn't the problem.)
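
For comparison, here is the 2.5+ behavior described above in a modern interpreter: a minimal sketch, assuming Python 3 (io.StringIO in place of 2.4's StringIO module), that reads the same sample with quoting=csv.QUOTE_NONE and no custom quotechar.

```python
import csv
import io

data = '\n1,2,3,4,"5\n1,2,3,4,5\n'

# With quoting=csv.QUOTE_NONE the reader treats '"' as an ordinary
# character, so quotechar never needs to be overridden.
reader = csv.reader(io.StringIO(data), quoting=csv.QUOTE_NONE)
rows = list(reader)
# rows == [[], ['1', '2', '3', '4', '"5'], ['1', '2', '3', '4', '5']]
```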

Solution

I don't know whether Python would allow it, but could you use a non-printable ASCII character such as BEL or BS (backspace)? I would expect these to be extremely rare.
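
Because the file never quotes fields, another option is to bypass the csv module's quote handling entirely and split each line on the delimiter. This is a minimal sketch, assuming no field ever contains the delimiter (which has to hold for an unquoted file) and that no escape characters are used; read_unquoted is a hypothetical helper name. It sticks to syntax that both Python 2.4 and Python 3 accept.

```python
try:
    from StringIO import StringIO  # Python 2.x
except ImportError:
    from io import StringIO  # Python 3.x


def read_unquoted(lines, delimiter=','):
    """Split each line on the delimiter; quote characters get no special treatment.

    Mimics csv.reader by yielding [] for a blank line.
    """
    for line in lines:
        line = line.rstrip('\r\n')  # drop the line ending, LF or CRLF
        if not line:
            yield []
        else:
            yield line.split(delimiter)


data = '\n1,2,3,4,"5\n1,2,3,4,5\n'
rows = list(read_unquoted(StringIO(data)))
# rows == [[], ['1', '2', '3', '4', '"5'], ['1', '2', '3', '4', '5']]
```

For a real file, pass the open file object instead of the StringIO wrapper. The trade-off is that no quoting or escaping is ever honored, which is exactly what an unquoted file needs.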
