Unexpected behavior of universal newline mode with StringIO and csv modules

Question

Consider the following (Python 3.2 under Windows):

>>> import io
>>> import csv
>>> output = io.StringIO()         # default parameter newline=None
>>> csvdata = [1, 'a', 'Whoa!\nNewlines!']
>>> writer = csv.writer(output, quoting=csv.QUOTE_NONNUMERIC)
>>> writer.writerow(csvdata)
25
>>> output.getvalue()
'1,"a","Whoa!\nNewlines!"\r\n'

Why is there a single \n - shouldn't it have been converted to \r\n since universal newlines mode is enabled?

With this enabled, on input, the lines endings \n, \r, or \r\n are translated to \n before being returned to the caller. Conversely, on output, \n is translated to the system default line separator, os.linesep.

Solution

The "single" \n occurs as a data character inside the third field. Consequently that field is quoted so that a csv reader will treat it as part of the data. It is NOT a "line terminator" (should be called a row separator) or part thereof. To get a better appreciation of the quoting, remove the quoting=csv.QUOTE_NONNUMERIC.
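To see the quoting more concretely, here is a minimal sketch that writes the same row with the csv module's default quoting (csv.QUOTE_MINIMAL) and reads it back. The variable names are only illustrative; the point is that the embedded \n comes back inside the third field of a single row.

```python
import csv
import io

row = [1, 'a', 'Whoa!\nNewlines!']

# Default quoting (csv.QUOTE_MINIMAL): only the field that needs it is quoted.
minimal = io.StringIO()
csv.writer(minimal).writerow(row)
minimal_text = minimal.getvalue()
print(repr(minimal_text))   # '1,a,"Whoa!\nNewlines!"\r\n'

# Read it back. newline='' hands the reader the line endings untranslated,
# and the quoted \n stays inside the third field of the one and only row.
rows = list(csv.reader(io.StringIO(minimal_text, newline='')))
print(rows)                 # [['1', 'a', 'Whoa!\nNewlines!']]
```

Only the third field is quoted here, which makes it easier to see that the quotes exist to protect the \n as data.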

The \r\n is produced because csv terminates rows with the dialect.lineterminator whose default is \r\n. In other words, the "universal newlines" setting is ignored.
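If a different row terminator is wanted, it can be passed to the writer directly. The following is a small sketch, not the only way to do it; it assumes you want '\n' between rows, and it shows that the \n inside the data is left alone either way.

```python
import csv
import io

# The default "excel" dialect is where the \r\n comes from.
print(repr(csv.excel.lineterminator))   # '\r\n'

out = io.StringIO()
writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC, lineterminator='\n')
writer.writerow([1, 'a', 'Whoa!\nNewlines!'])
unix_text = out.getvalue()

# The row now ends in '\n'; the quoted '\n' in the third field is unchanged.
print(repr(unix_text))                  # '1,"a","Whoa!\nNewlines!"\n'
```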

Update

The 2.7 and 3.2 docs for io.StringIO are virtually identical as far as the newline arg is concerned:

"The newline argument works like that of TextIOWrapper. The default is to do no newline translation."

We'll examine the first sentence below. The second sentence is true for output, depending on your interpretation of "default" and "newline translation".

TextIOWrapper docs:

newline can be None, '', '\n', '\r', or '\r\n'. It controls the handling of line endings. If it is None, universal newlines is enabled. With this enabled, on input, the lines endings '\n', '\r', or '\r\n' are translated to '\n' before being returned to the caller. Conversely, on output, '\n' is translated to the system default line separator, os.linesep. If newline is any other of its legal values, that newline becomes the newline when the file is read and it is returned untranslated. On output, '\n' is converted to the newline.

Python 3.2 on Windows:

>>> from io import StringIO as S
>>> import os
>>> print(repr(os.linesep))
'\r\n'
>>> ss = [S()] + [S(newline=nl) for nl in (None, '', '\n', '\r', '\r\n')]
>>> for x, s in enumerate(ss):
...     m = s.write('foo\nbar\rzot\r\n')
...     v = s.getvalue()
...     print(x, m, len(v), repr(v))
...
0 13 13 'foo\nbar\rzot\r\n'
1 13 12 'foo\nbar\nzot\n'
2 13 13 'foo\nbar\rzot\r\n'
3 13 13 'foo\nbar\rzot\r\n'
4 13 13 'foo\rbar\rzot\r\r'
5 13 15 'foo\r\nbar\rzot\r\r\n'
>>>

Line 0 shows that the "default" that you get with no newline arg involves no translation of \n (or any other character). It is certainly NOT converting '\n' to os.linesep.

Line 1 shows that what you get with newline=None (should be the same as line 0, shouldn't it??) is in effect INPUT universal newlines translation -- bizarre!

Line 2: newline='' makes no change, like line 0. It is certainly NOT converting '\n' to ''.

Lines 3, 4, and 5: as the docs say, '\n' is converted to the value of the newline arg.
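The same experiment can be written as a script that does not depend on os.linesep. This is a sketch: written_value is a helper name made up for the example, and the values in the comments are copied from the session above on the assumption that current CPython still behaves the same way.

```python
import io

SAMPLE = 'foo\nbar\rzot\r\n'

def written_value(**kwargs):
    """Write SAMPLE to a fresh StringIO built with kwargs; return getvalue()."""
    buf = io.StringIO(**kwargs)
    buf.write(SAMPLE)
    return buf.getvalue()

results = {
    'default': written_value(),                # no newline argument at all
    'None':    written_value(newline=None),    # \r and \r\n collapse to \n
    "''":      written_value(newline=''),      # nothing is translated
    r"'\n'":   written_value(newline='\n'),    # \n -> \n, i.e. unchanged
    r"'\r'":   written_value(newline='\r'),    # every \n becomes \r
    r"'\r\n'": written_value(newline='\r\n'),  # every \n becomes \r\n
}

for label, value in results.items():
    print(label, len(value), repr(value))
```

None of the six results contains os.linesep unless it was asked for explicitly, which is the discrepancy with the TextIOWrapper wording.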

The equivalent Python 2.X code produces equivalent results with Python 2.7.2.

Update 2

For consistency with built-in open(), the default should be os.linesep, as documented. To get the no-translation-on-output behaviour, use newline=''. Note: the open() docs are much clearer. I'll submit a bug report tomorrow.
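For a real file opened with open(), newline='' is what stops the text layer from touching the \r\n that csv writes. A minimal sketch follows; write_csv_raw and the file name demo.csv are invented for the illustration.

```python
import csv
import os
import tempfile

def write_csv_raw(path, rows):
    """Write rows as CSV with newline='' and return the exact bytes on disk."""
    # newline='' switches off newline translation on output, so the csv
    # module's own '\r\n' row terminator reaches the file unchanged.
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
    with open(path, 'rb') as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    raw = write_csv_raw(os.path.join(tmp, 'demo.csv'),
                        [[1, 'a', 'Whoa!\nNewlines!']])

print(raw)   # b'1,a,"Whoa!\nNewlines!"\r\n'
```

Without newline='', a text-mode file on Windows would translate every \n it is given into \r\n, including the one inside the quoted field and the one in the row terminator.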
