Difference between parsing a text file in r and rb mode
Problem description
What makes parsing a text file in 'r' mode more convenient than parsing it in 'rb' mode, especially when the text file in question may contain non-ASCII characters?
This depends a little bit on what version of Python you're using. In Python 2, Chris Drappier's answer applies.
In Python 3, it's a different (and more consistent) story: in text mode ('r'), Python will decode the file according to the text encoding you give it (or, if you don't give one, a platform-dependent default), and read() will give you a str. In binary ('rb') mode, Python does not assume that the file contains things that can reasonably be parsed as characters, and read() gives you a bytes object.
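As a rough illustration of the point above, here is a minimal Python 3 sketch. The file name sample.txt, the temporary directory, and the sample string are made up for the example; the encoding is passed explicitly so the result does not depend on the platform default.

```python
import os
import tempfile

# Hypothetical sample file: UTF-8 text with one non-ASCII character.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("café")

# Text mode: bytes are decoded with the given encoding, read() returns str.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Binary mode: no decoding happens, read() returns bytes.
with open(path, "rb") as f:
    raw = f.read()

print(type(text).__name__, len(text))  # str, counted in characters
print(type(raw).__name__, len(raw))    # bytes, counted in bytes
print(raw.decode("utf-8") == text)     # decoding by hand gives the same text
```

The 'é' is one character in the str but two bytes in the bytes object, which is why text mode tends to be more convenient for non-ASCII text: in 'rb' mode you would have to call decode() yourself before working with characters.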
Also, in Python 3, universal newlines (the translation between '\n' and platform-specific newline conventions, so you don't have to care about them) are available for text-mode files on any platform, not just Windows.
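The newline handling can be sketched the same way. This is only an illustration under assumed inputs: the file name newlines.txt and its contents are invented, and the file is written in binary mode so that the mixed line endings land on disk exactly as given.

```python
import os
import tempfile

# Hypothetical file containing Windows (\r\n), old Mac (\r) and Unix (\n) line endings.
path = os.path.join(tempfile.mkdtemp(), "newlines.txt")
with open(path, "wb") as f:
    f.write(b"one\r\ntwo\rthree\n")

# Text mode with the default newline=None: every line ending is read as '\n'.
with open(path, "r", encoding="ascii") as f:
    translated = f.read()

# Text mode with newline='': decoding still happens, but line endings are left alone.
with open(path, "r", encoding="ascii", newline="") as f:
    untranslated = f.read()

# Binary mode: the bytes come back exactly as stored.
with open(path, "rb") as f:
    raw = f.read()

print(repr(translated))
print(repr(untranslated))
print(repr(raw))
```

With the default settings the three different line endings all come back as '\n', so code that splits on '\n' behaves the same regardless of where the file was created.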