Prevent Python print()'s automatic newline conversion to CRLF on Windows

Problem description

I'd like to pipe text with unix-like EOL (LF) from Python via Windows CMD (console). However, Python seems to automatically convert single newlines into Windows-style end-of-line (EOL) characters (i.e. \r\n, <CR><LF>, 0D 0A, 13 10):

#!python3
#coding=utf-8
import sys
print(sys.version)
print("one\ntwo")
# run as py t.py > t.txt

results in

3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)]
one
two

or in hexadecimal ... 6F 6E 65 0D 0A 74 77 6F 0D 0A
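To reproduce the hexadecimal view above, read the redirected file in binary mode. Text-mode reading would translate '\r\n' back to '\n' and hide the problem. Below is a small sketch; hex_bytes is a hypothetical helper name, and t.txt is the file from the comment in the script above.

```python
from pathlib import Path


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs, e.g. '0D 0A'."""
    return " ".join(f"{b:02X}" for b in data)


# The bytes print() produced in the example above:
print(hex_bytes(b"one\r\ntwo\r\n"))  # 6F 6E 65 0D 0A 74 77 6F 0D 0A

# read_bytes() opens the file in binary mode, so nothing is translated.
if Path("t.txt").exists():
    print(hex_bytes(Path("t.txt").read_bytes()))
```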

The second EOL comes from print()'s default end='\n', and it is converted as well.

There is no newline argument or property for print like there is for open(), so how can this be controlled?

Solution

See this answer: https://stackoverflow.com/a/34997357/1619432


print() usually writes to sys.stdout. The following are excerpts from the documentation for non-interactive mode:

  • stdout is used for the output of print()

  • sys.stdout: File object used by the interpreter for standard ... output

  • These streams are regular text files like those returned by the open() function.

  • character encoding on Windows is ANSI

  • standard streams are ... block-buffered like regular text files.

  • Note
    To write or read binary data from/to the standard streams, use the underlying binary buffer object. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').

Let's try this direct approach first:

import sys
print("one\ntwo")
sys.stdout.write('three\nfour')
sys.stdout.buffer.write(b'five\nsix')

results in

five\n
sixone\r\n
two\r\n
three\r\n
four

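The buffer write seems to work as desired, although it "messes" with the output order. The text from print() and sys.stdout.write() is still sitting in the text layer's own buffer when the bytes are written directly to the binary buffer, so that text only comes out afterwards.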

Flushing before writing to the buffer directly helps:

import sys
print("one\ntwo")
sys.stdout.write('three\nfour')
sys.stdout.flush()
sys.stdout.buffer.write(b'five\nsix')

results in

one\r\n
two\r\n
three\r\n
fourfive\n
six
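The flush-then-write-bytes pattern can be wrapped in a print()-like helper. This is a hypothetical sketch: print_lf and its stream parameter are my own names, not a standard API. It assumes the target stream is a TextIOWrapper exposing .buffer, .encoding and .errors, as sys.stdout normally is.

```python
import sys


def print_lf(*args, sep=" ", end="\n", stream=None):
    """Hypothetical print() look-alike that always writes LF line endings.

    The arguments are formatted the way print() formats them. The result is
    then encoded and written straight to the binary buffer, which bypasses
    newline translation.
    """
    if stream is None:
        stream = sys.stdout
    text = sep.join(str(arg) for arg in args) + end
    stream.flush()  # push output still pending in the text layer out first
    stream.buffer.write(text.encode(stream.encoding, stream.errors))
    stream.buffer.flush()  # make the bytes visible right away
```

With it, print_lf("one\ntwo") emits only LF bytes even when stdout is redirected to a file on Windows. Ordinary print() calls keep their CRLF endings.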

But still it's not "fixing" print(). Back to the file objects / streams / text files (short info on IO objects in Python Data model):

https://docs.python.org/3/glossary.html#term-text-file

A file object able to read and write str objects. Often, a text file actually accesses a byte-oriented datastream and handles the text encoding automatically. Examples of text files are files opened in text mode ('r' or 'w'), sys.stdin, sys.stdout, and instances of io.StringIO.

So (how) can the sys.stdout file be reconfigured or reopened to control the newline behaviour? And what exactly is it?

>>> import sys
>>> type(sys.stdout)
<class '_io.TextIOWrapper'>

Docs: class io.TextIOWrapper(buffer, encoding=None, errors=None, newline=None, line_buffering=False, write_through=False):

newline controls how line endings are handled. It can be None, '', '\n', '\r', and '\r\n'.
It works as follows:
When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller.
If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated.
If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep.
If newline is '' or '\n', no translation takes place.
If newline is any of the other legal values, any '\n' characters written are translated to the given string.
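You can check the write-side rules quoted above on any platform by putting a TextIOWrapper on top of an in-memory io.BytesIO instead of the console. In this sketch, bytes_written is a hypothetical helper. With newline=None the result depends on the platform (b'one\r\ntwo\r\n' on Windows); the other settings behave the same everywhere.

```python
import io
import os


def bytes_written(text, newline):
    """Write text through a TextIOWrapper with the given newline setting and
    return the bytes that reach the underlying binary stream."""
    raw = io.BytesIO()
    out = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    out.write(text)
    out.flush()
    return raw.getvalue()


print("os.linesep =", repr(os.linesep))
for newline in (None, "", "\n", "\r", "\r\n"):
    print(repr(newline), "->", bytes_written("one\ntwo\n", newline))
```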

Let's see:

>>> sys.stdout.newline = "\n"
>>>

OK, and what about

import sys
sys.stdout.newline = '\n'
print("one\ntwo")

Does not work:

one\r\n
two\r\n

because the property does not exist:

>>> sys.stdout.newline
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: '_io.TextIOWrapper' object has no attribute 'newline'

Which I should have checked earlier.

>>> vars(sys.stdout)
{'mode': 'w'}

So really, there's no newline attribute for us to redefine.
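The same thing can be reproduced on an in-memory stream. TextIOWrapper instances have a __dict__ (that is where the {'mode': 'w'} above lives), so assigning newline silently creates an unrelated attribute and leaves the translation untouched. A sketch:

```python
import io

raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="utf-8", newline="\r\n")

out.newline = "\n"  # no error: this just adds an entry to out.__dict__
print(vars(out))    # shows the new, unused attribute

out.write("one\n")
out.flush()
print(raw.getvalue())  # b'one\r\n': the translation still happens
```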

Any useful methods?

>>> dir(sys.stdout)
['_CHUNK_SIZE', '__class__', '__del__', '__delattr__', '__dict__', 
'__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', 
'__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', 
'__init__', '__init_subclass__', '__iter__', '__le__', '__lt__',
'__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', 
'__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 
'_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', 
'_finalizing', 'buffer', 'close', 'closed', 'detach', 'encoding', 
'errors', 'fileno', 'flush', 'isatty', 'line_buffering', 'mode', 
'name', 'newlines', 'read', 'readable', 'readline', 'readlines',
'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 
'writelines']

Not really.

But we can at least replace the default text interface to the buffer with one that specifies the required newline character(s):

import sys, io
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, newline='\n')
print("one\ntwo")

which finally results in

one\n
two\n

To restore, just reassign sys.stdout to the original object, which you saved in a variable beforehand. Alternatively (apparently not recommended), use sys.__stdout__, which the interpreter keeps pointing at the original stream.
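If the LF wrapper is only needed temporarily, the replace-and-restore steps can be wrapped in a context manager. This is a minimal sketch under my own assumptions, not eryksun's solution. lf_stdout is a hypothetical name, and sys.stdout is assumed to be a TextIOWrapper with a .buffer, as type(sys.stdout) showed above. The detach() call matters: without it, the temporary wrapper would close the binary buffer that the original sys.stdout still uses when it is garbage-collected.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def lf_stdout():
    """Temporarily replace sys.stdout with a wrapper that writes LF endings.

    Hypothetical helper; assumes sys.stdout is a TextIOWrapper with .buffer.
    """
    original = sys.stdout
    original.flush()  # earlier output must not end up after ours
    wrapper = io.TextIOWrapper(
        original.buffer,
        encoding=original.encoding,
        errors=original.errors,
        newline="\n",  # write '\n' untranslated
        line_buffering=original.line_buffering,
    )
    sys.stdout = wrapper
    try:
        yield wrapper
    finally:
        sys.stdout = original
        # detach() flushes the wrapper and unhooks it from the shared buffer,
        # so garbage-collecting it later cannot close the real stdout.
        wrapper.detach()
```

Usage: inside `with lf_stdout():`, print("one\ntwo") writes plain LF endings. print() calls after the block produce CRLF again.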

Warning: see eryksun's comment below; this requires some care. Use his solution instead (link below):


It might also be possible to reopen the file. See "Wrap an open stream with io.TextIOWrapper" for inspiration, and this answer https://stackoverflow.com/a/34997357/1619432 for the implementation.
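Below is a minimal sketch of the reopen idea. It assumes stdout is backed by a real file descriptor, which is true for the py t.py > t.txt case. It is my own illustration, not necessarily what the linked answer does, and open_lf is a hypothetical name. closefd=False keeps the descriptor open when the new file object is closed. The original stream is flushed first so that earlier output stays in order.

```python
import sys


def open_lf(stream=None):
    """Open a second text file object on the stream's file descriptor that
    writes LF line endings untranslated. Hypothetical helper, not a standard
    API."""
    if stream is None:
        stream = sys.stdout
    stream.flush()  # earlier output must reach the descriptor first
    return open(stream.fileno(), "w", encoding=stream.encoding,
                newline="\n", closefd=False)
```

For example, create out = open_lf(), then call print("one\ntwo", file=out) and out.flush(). Both file objects buffer independently, so flush one before writing through the other.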


If you want to take a closer look, check out the Python (CPython) sources: https://github.com/python/cpython/blob/master/Modules/_io/textio.c
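Python 3.7 added TextIOWrapper.reconfigure(), so this is not available in the 3.6.5 used above. It can change newline on an existing stream in place and flushes the stream before doing so. The sketch below uses an in-memory stand-in for a CRLF-translating sys.stdout. With a real console or redirect, the equivalent would be a single sys.stdout.reconfigure(newline="\n") near the top of the script.

```python
import io

# In-memory stand-in for sys.stdout on Windows: '\n' is translated to '\r\n'.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="utf-8", newline="\r\n")

print("one", file=out)
# Python 3.7+: switch off translation in place. reconfigure() flushes the
# stream first, so "one" keeps the CRLF it was written with.
out.reconfigure(newline="\n")
print("two", file=out)
out.flush()

print(raw.getvalue())  # b'one\r\ntwo\n'
```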


There's also os.linesep, let's see if it's really "\r\n" for Windows:

>>> import os
>>> os.linesep
'\r\n'
>>> ",".join([f'0x{ord(c):X}' for c in os.linesep])
'0xD,0xA'

Could this be redefined?

#!python3
#coding=utf-8
import sys, os
saved = os.linesep
os.linesep = '\n'
print(os.linesep)
print("one\ntwo")
os.linesep = saved

The variable can be reassigned, but that does not change how sys.stdout translates newlines:

\r\n
\r\n
one\r\n
two\r\n
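The reason is that CPython's io module is implemented in C (see textio.c linked above). A TextIOWrapper takes its default write-side line ending from a built-in platform setting when it is created; it does not read os.linesep. Below is a small check, assuming CPython's C io. It shows that even a freshly created wrapper ignores a patched os.linesep; newline_written is a hypothetical helper name.

```python
import io
import os


def newline_written(linesep_override):
    """Temporarily patch os.linesep, write '\\n' through a new TextIOWrapper
    (newline=None, the translating default) and return the bytes."""
    saved = os.linesep
    os.linesep = linesep_override
    try:
        raw = io.BytesIO()
        out = io.TextIOWrapper(raw, encoding="utf-8")  # newline defaults to None
        out.write("\n")
        out.flush()
        return raw.getvalue()
    finally:
        os.linesep = saved  # always restore the real value


other = "\n" if os.linesep == "\r\n" else "\r\n"
# Prints the platform's own separator, not the patched one.
print(newline_written(other), "vs os.linesep =", repr(os.linesep))
```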

