Print to UTF-8 encoded file, with platform-dependent newlines?

Question

In Python, what is the best way to write to a UTF-8 encoded file with platform-dependent newlines? The solution would ideally work quite transparently in a program that does a lot of printing in Python 2. (Information about Python 3 is welcome too!)

In fact, the standard way of writing to a UTF-8 file seems to be codecs.open('name.txt', 'w', encoding='utf-8'). However, the documentation indicates that

(…) no automatic conversion of '\n' is done on reading and writing.

because the file is actually opened in binary mode. So how can I write to a UTF-8 file with proper platform-dependent newlines?

Note: The 't' mode seems to actually do the job (codecs.open('name.txt', 'wt')) with Python 2.6 on Windows XP, but is this documented and guaranteed to work?

Answer

Assuming Python 2.7.1 (those are the docs you quoted): the 'wt' mode is not documented (the ONLY mode documented is 'r') and does not work. The codecs module appends 'b' to the mode, which causes it to fail:

>>> import codecs
>>> f = codecs.open('bar.txt', 'wt', encoding='utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\python27\lib\codecs.py", line 881, in open
    file = __builtin__.open(filename, mode, buffering)
ValueError: Invalid mode ('wtb')

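Avoid the codecs module and do it yourself: encode to UTF-8, then write the resulting bytes through a file opened in text mode. This is safe because UTF-8 encodes '\n' as the single byte 0x0A and never uses that byte inside a multi-byte sequence, so the text-mode newline translation cannot corrupt other characters: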

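# Python 2: encode to UTF-8 bytes, then write them through a text-mode file,
f = open('bar.txt', 'w')  # so that each '\n' is translated to os.linesep
f.write(unicode_object.encode('utf8'))
f.close()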
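A further option, not part of the answer above but worth noting, is io.open(), which exists since Python 2.6 and is the same function as the built-in open() in Python 3. It opens the file in text mode with an encoding, so it both encodes to UTF-8 and translates '\n' to the platform line terminator. The following is a minimal sketch under that assumption; the helper name write_lines_utf8 is hypothetical:

```python
import io
import os


def write_lines_utf8(path, lines):
    """Hypothetical helper: write unicode lines to `path` as UTF-8.

    io.open() in text mode translates each '\n' written into os.linesep
    ('\r\n' on Windows, '\n' on POSIX) and encodes the text as UTF-8.
    """
    with io.open(path, 'w', encoding='utf-8') as f:
        for line in lines:
            f.write(line + u'\n')


write_lines_utf8('bar.txt', [u'\u0a0aline1', u'\xffline2'])

# Read the raw bytes back to see exactly what landed on disk.
with open('bar.txt', 'rb') as f:
    data = f.read()

expected = u'\u0a0aline1\n\xffline2\n'.replace(u'\n', os.linesep).encode('utf-8')
print(data == expected)  # True on any platform
```

Because io.open() returns a text-mode file object, code that calls f.write() with unicode strings keeps working unchanged on both Python 2.6+ and Python 3.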

Update about Python 3.x:

It appears that codecs.open() has the same deficiency (it won't write the platform-specific line terminator). However, the built-in open(), which has an encoding argument, is happy to do it:

[Python 3.2 on Windows 7 Pro]
>>> import codecs
>>> f = codecs.open('bar.txt', 'w', encoding='utf8')
>>> f.write('line1\nline2\n')
>>> f.close()
>>> open('bar.txt', 'rb').read()
b'line1\nline2\n'
>>> f = open('bar.txt', 'w', encoding='utf8')
>>> f.write('line1\nline2\n')
12
>>> f.close()
>>> open('bar.txt', 'rb').read()
b'line1\r\nline2\r\n'
>>>
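In Python 3 the translation is controlled by the newline parameter of the built-in open(). The sketch below (file names are illustrative) shows the default behaviour together with two explicit overrides, and uses print(..., file=f), which suits a program that does a lot of printing: print() emits '\n', and the text-mode file translates it.

```python
import os

text = 'line1\nline2\n'

# newline=None (the default): every '\n' written becomes os.linesep.
with open('default.txt', 'w', encoding='utf-8') as f:
    print('line1', file=f)  # print() writes '\n', translated on output
    print('line2', file=f)

# newline='': no translation; '\n' is written as-is on every platform.
with open('raw.txt', 'w', encoding='utf-8', newline='') as f:
    f.write(text)

# newline='\r\n': force Windows-style line endings on every platform.
with open('crlf.txt', 'w', encoding='utf-8', newline='\r\n') as f:
    f.write(text)


def raw_bytes(path):
    """Return the file's bytes exactly as stored on disk."""
    with open(path, 'rb') as f:
        return f.read()


print(raw_bytes('default.txt') == ('line1' + os.linesep + 'line2' + os.linesep).encode('utf-8'))  # True
print(raw_bytes('raw.txt'))   # b'line1\nline2\n'
print(raw_bytes('crlf.txt'))  # b'line1\r\nline2\r\n'
```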

Update about Python 2.6

The docs say the same as the 2.7 docs. The difference is that in 2.6 the "bludgeon into binary mode" hack of appending "b" to the mode argument fails silently: "wtb" isn't detected as an invalid mode, so the file is opened in text mode and appears to work as you wanted, not as documented:

>>> import codecs
>>> f = codecs.open('fubar.txt', 'wt', encoding='utf8')
>>> f.write(u'\u0a0aline1\n\xffline2\n')
>>> f.close()
>>> open('fubar.txt', 'rb').read()
'\xe0\xa8\x8aline1\r\n\xc3\xbfline2\r\n' # "works"
>>> f.mode
'wtb' # oops
>>>
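The version dependence comes down to mode validation. The sketch below is a hypothetical re-implementation of the mode handling the answer describes (append 'b' when an encoding is given and 'b' is absent), not the codecs module's actual source. It shows that 'wt' becomes 'wtb', which the built-in open() rejects with ValueError on Python 2.7 (per the traceback above) and on Python 3, so the accidental 2.6 behaviour should not be relied on.

```python
def codecs_style_mode(mode, encoding='utf8'):
    """Hypothetical re-implementation of the mode handling described above:
    append 'b' when an encoding is given and 'b' is not already present."""
    if encoding is not None and 'b' not in mode:
        mode = mode + 'b'
    return mode


mode = codecs_style_mode('wt')
print(mode)  # wtb

rejected = False
try:
    # Modern Python validates the mode string and refuses to combine
    # text ('t') and binary ('b'); Python 2.6 did not check this.
    open('fubar.txt', mode).close()
except ValueError as exc:
    rejected = True
    print('rejected: %s' % exc)
```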
