Reading lines beyond SUB in Python
Question
Newbie question. In Python 2.7.2, I have a problem reading text files that seem to accidentally contain some control characters. Specifically, the loop
for line in f
stops without any warning or error as soon as it reaches a line containing the SUB character (ASCII hex code 1a). Using f.readlines() gives the same result. Essentially, as far as Python is concerned, the file ends at the first SUB character, and the last value assigned to line is the text of that line up to that character.
Is there a way to read beyond such a character and/or to issue a warning when encountering one?
Recommended answer
On Windows, 0x1a (Ctrl-Z) is treated as an end-of-file marker when a file is opened in text mode. You'll need to open the file in binary mode to get past it:
f = open(filename, 'rb')
The downside is that binary mode performs no newline translation, so you have to deal with the line endings yourself, for example by splitting the data:
lines = f.read().split('\r\n') # assuming Windows line endings
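To cover the second half of the question (issuing a warning when a SUB character turns up), the binary-mode approach can be extended along the following lines. This is only a minimal sketch: the function name read_lines_past_sub is made up for illustration, and it assumes the file is small enough to read into memory in one go. It uses splitlines() rather than split('\r\n') so that it does not depend on Windows line endings.

```python
import warnings

SUB = b'\x1a'  # ASCII SUB (Ctrl-Z), treated as EOF in Windows text mode


def read_lines_past_sub(path):
    """Return all lines of `path` as byte strings, warning about SUB bytes.

    Hypothetical helper, not part of the original answer.
    """
    # Binary mode: 0x1a is read like any other byte.
    with open(path, 'rb') as f:
        data = f.read()

    # splitlines() breaks on \r\n, \n and \r and drops the line endings.
    lines = data.splitlines()

    for number, line in enumerate(lines, 1):
        if SUB in line:
            warnings.warn('SUB character found on line %d' % number)
    return lines
```

Calling read_lines_past_sub(filename) returns every line in the file, including those after the first SUB character, and emits one warning for each line that contains one. The lines come back as raw byte strings, so on Python 3 they would still need to be decoded before being used as text.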