Reading lines beyond SUB in Python


Question

Newbie question. In Python 2.7.2, I have a problem reading text files that seem to accidentally contain some control characters. Specifically, the loop

for line in f:

stops without any warning or error as soon as it reaches a line containing the SUB character (ASCII hex code 1a). Using f.readlines() gives the same result. Essentially, as far as Python is concerned, the file ends at the first SUB character, and the last value assigned to line is the text of that line up to the SUB character.

Is there a way to read beyond such a character and/or to issue a warning when encountering one?

Recommended answer

On Windows, 0x1a (Ctrl-Z) is treated as an end-of-file marker when a file is read in text mode. You'll need to open the file in binary mode to get past it:

f = open(filename, 'rb')

The downside is that binary mode performs no newline translation, so you have to deal with the line endings yourself, for example by splitting the lines manually:

lines = f.read().split('\r\n')  # assuming Windows line endings
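
To cover the second half of the question (warning when a SUB character turns up), the binary-mode approach can be wrapped in a small helper. This is only a sketch: the function name read_lines_past_sub and the use of the warnings module are illustrative choices, not part of the original answer. It assumes the file is small enough to read into memory at once.

```python
import warnings

SUB = b'\x1a'  # the SUB control character (Ctrl-Z), ASCII hex code 1a


def read_lines_past_sub(path):
    """Return all lines of the file as byte strings, warning about lines containing SUB.

    Hypothetical helper for illustration; assumes the file fits in memory.
    """
    # Binary mode: no end-of-file handling for 0x1a and no newline translation.
    with open(path, 'rb') as f:
        data = f.read()

    # splitlines() splits on '\r\n', '\n' and bare '\r', and drops the line endings.
    lines = data.splitlines()

    for number, line in enumerate(lines, 1):
        if SUB in line:
            warnings.warn('SUB character found on line %d' % number)
    return lines
```

The returned lines are byte strings (plain str objects in Python 2.7), so the SUB characters are still present in them. If they are not wanted, line.replace(SUB, b'') removes them.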
